
RaceLink Developer Guide

Checklists for the recurring "I want to add X" tasks. Each checklist walks you through every file that needs an update so a feature addition doesn't accidentally land half-implemented (the sendGroupControl ghost-method incident, where a renamed method hid behind a broad except for over a year, is the cautionary tale here).

For the why and the wire format, see:

Adding a new scene-action kind

Scene actions are the building blocks of a scene (e.g. wled_preset, startblock, delay, sync, offset_group). The kind name is the canonical identifier across the validator, runner, and editor.

Files to touch (in dependency order):

  1. Constant in racelink/services/scenes_service.py:
    KIND_MY_NEW_KIND = "my_new_kind"
    ALL_KINDS = (..., KIND_MY_NEW_KIND)
    
  2. Validator in the same file: add a _canonical_my_new_kind_action helper if the action has a non-trivial shape, and dispatch to it from _canonical_action. If your kind requires a target, validate it via the existing _canonical_target (group / device).
  3. Editor schema in get_action_kinds_metadata(): declare the kind with its vars (UI inputs), supports_flags_override, etc. The WebUI consumes this to render the action body.
  4. Dispatch plan in racelink/services/dispatch_planner.py: add a branch in plan_action_dispatch (or extend _plan_effect) that produces one WireOp per wire packet the action would emit. Each op carries sender (the symbolic adapter key — e.g. "send_control"), payload (kwargs ready to spread into the named sender), and body_bytes sized via the canonical builder in racelink/protocol/packets.py. This is the single source of truth — the runner and the cost estimator both consume the resulting plan, so a kind is "done" once its planner branch is correct.
  5. Runner adapter in racelink/services/scene_runner_service.py: if your kind needs a new symbolic sender (uncommon — most kinds re-use send_control / send_wled_preset / send_offset / send_sync), add the mapping in _dispatch_op. Otherwise just register the per-kind shim:
    def _run_my_new_kind(self, index, action, started):
        return self._plan_and_execute(KIND_MY_NEW_KIND, index, action, started)
    
    The cost estimator picks up the new kind automatically — no changes there.
  6. Capability mapping in racelink/static/scenes.js (requiredCapForKind): if the new kind requires a device capability (WLED / STARTBLOCK / etc.), return the cap string. Without this entry the editor will not filter target dropdowns for the new kind — and you'll re-introduce the silent-success bug class C5 closed.
  7. Frontend rendering in scenes.js:
    • Add KIND_MY_NEW_KIND (or just the string literal) to SCENE_KIND_LABELS and SCENE_KINDS_ORDER.
    • If the kind has parameters, add them to the editor schema in step 3; the generic buildVarsRow will render them. Custom widgets (e.g. the offset_group config panel) need their own buildXyz function.
    • defaultActionForKind(kind): return the seed shape for the editor's "+ Add" button.
  8. Tests in tests/test_dispatch_planner.py (pin the planner output for the new kind), tests/test_dispatch_parity.py (runner and estimator agree on packet count and per-op sender), tests/test_scenes_service.py (validator round-trip, edge cases) and tests/test_scene_runner_service.py (dispatch happy path + transport-missing degraded path). If the kind has cost characteristics worth pinning, also add a test in tests/test_scene_cost_estimator.py.
  9. Plan-file note if the addition is significant: append a note to the active plan in the maintainer's internal engineering ledger so the rationale stays linked to the change.
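The reason the planner is the single source of truth can be sketched in a few lines: the runner and the cost estimator both walk the same list of WireOp objects, so they cannot disagree about packet count or sender. This is a minimal, hypothetical model, not the real dispatch_planner.py. Only WireOp's three fields (sender, payload, body_bytes) and the "send_control" sender key come from the text above. The payload keys, the `{"groups": [...]}` target shape, the two-byte body builder, and the run_plan / estimate_cost helpers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

KIND_MY_NEW_KIND = "my_new_kind"


@dataclass(frozen=True)
class WireOp:
    sender: str              # symbolic adapter key, e.g. "send_control"
    payload: Dict[str, Any]  # kwargs spread into the named sender
    body_bytes: int          # size of the body the canonical builder produces


def build_my_new_kind_body(group_id: int, value: int) -> bytes:
    # Hypothetical stand-in for a builder in racelink/protocol/packets.py.
    return bytes([group_id & 0xFF, value & 0xFF])


def plan_action_dispatch(kind: str, action: Dict[str, Any]) -> List[WireOp]:
    if kind == KIND_MY_NEW_KIND:
        ops = []
        # Assumed target shape: a list of group ids, one wire packet per group.
        for group_id in action["target"]["groups"]:
            payload = {"group_id": group_id, "value": action["value"]}
            body = build_my_new_kind_body(**payload)
            ops.append(WireOp("send_control", payload, len(body)))
        return ops
    raise ValueError(f"no planner branch for kind {kind!r}")


def run_plan(ops: List[WireOp], senders: Dict[str, Callable[..., bool]]) -> bool:
    # Runner side: resolve each symbolic sender and spread the payload into it.
    results = [senders[op.sender](**op.payload) for op in ops]
    return all(results)


def estimate_cost(ops: List[WireOp]) -> Dict[str, int]:
    # Estimator side: reads the very same ops, so packet counts cannot drift.
    return {"packets": len(ops), "body_bytes": sum(op.body_bytes for op in ops)}
```

A parity test in the spirit of tests/test_dispatch_parity.py then only has to assert that the number of sender calls the runner makes equals estimate_cost(ops)["packets"].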

Checklist:

[ ] KIND_* constant in scenes_service.py
[ ] _canonical_*_action validator (if non-trivial shape)
[ ] get_action_kinds_metadata entry
[ ] dispatch_planner.py branch (the single source of truth — runner + estimator both consume it)
[ ] scene_runner_service.py: _dispatch_op mapping if a new sender is needed; per-kind shim that delegates to _plan_and_execute
[ ] scenes.js requiredCapForKind entry (if cap-gated)
[ ] scenes.js SCENE_KIND_LABELS + SCENE_KINDS_ORDER
[ ] scenes.js defaultActionForKind seed
[ ] tests/test_dispatch_planner.py — pin the planner output for the new kind
[ ] tests/test_dispatch_parity.py — runner + estimator agree on packet count & per-op sender
[ ] tests for validator + runner-side adapter (degraded paths, etc.)
[ ] plan-file note (if significant)
[ ] manual smoke: editor renders the kind, save+load round-trips, run produces the expected wire trace

Adding a new wire opcode

Adding an opcode means changing the wire format — coordinate across all three repos (Host, Gateway, WLED). The tests/test_proto_header_drift.py test will fail otherwise.

The catalog headers ride with racelink_proto.h. Two WLED-neutral catalog headers — racelink_headless.h (scene catalog) and racelink_indicators.h (indicator catalog) — are distributed alongside racelink_proto.h and must stay byte-identical across all four component repos. Drift in any of them counts as a wire-format break the same way racelink_proto.h drift does, since the symbolic ids carried by OPC_HEADLESS and OPC_INDICATE are looked up against the receiver's local copy of the catalog. Both files should be added to the drift-test equivalence list as the Host repo grows them.
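The core of a header drift check is a byte-for-byte comparison of each shared header across checkouts. A minimal sketch follows, assuming sibling checkouts at the relative paths named in the "Mirror in Gateway + WLED firmware repos" step. The helper names, and the choice to skip checkouts that are not present, are illustrative only. tests/test_proto_header_drift.py may well be written differently (a CI test might prefer to fail on a missing copy).

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def header_digests(copies: Iterable[Path]) -> Dict[Path, str]:
    """SHA-256 of every copy present on disk; absent checkouts are skipped."""
    return {p: hashlib.sha256(p.read_bytes()).hexdigest() for p in copies if p.is_file()}


def drifted_copies(reference: Path, copies: Iterable[Path]) -> List[str]:
    """Paths whose bytes differ from `reference` (line endings included)."""
    ref = hashlib.sha256(reference.read_bytes()).hexdigest()
    return sorted(str(p) for p, digest in header_digests(copies).items() if digest != ref)
```

Example call (the Host-side path is an assumption): `drifted_copies(Path("racelink_proto.h"), [Path("../RaceLink_Gateway/src/racelink_proto.h"), Path("../RaceLink_WLED/racelink_proto.h")])`. An empty list means the copies are in sync. Because the check hashes raw bytes rather than text, a CRLF conversion by an editor or Git setting also counts as drift.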

Files to touch:

  1. C header racelink_proto.h:
    • Add the value to the LP enum (OPC_*).
    • Document the body layout and response policy in a comment block above the matching struct (see OPC_OFFSET for the reference style).
    • Add the matching static const uint8_t MAX_P_* for any variable-length body, plus a static_assert(MAX_P_* <= BODY_MAX).
    • Add a PacketRule entry in RULES[] (direction + response policy + max body length).
  2. Mirror in Gateway + WLED firmware repos: copy the updated racelink_proto.h byte-identically to ../RaceLink_Gateway/src/racelink_proto.h and ../RaceLink_WLED/racelink_proto.h. Verify with pytest tests/test_proto_header_drift.py.
  3. Auto-generated Python mirror racelink/racelink_proto_auto.py: re-run python gen_racelink_proto_py.py to regenerate. Don't hand-edit the generated file.
  4. Body builder in racelink/protocol/packets.py: add build_my_new_opc_body(...). Return the body bytes (without the Header7); the framing code wraps it.
  5. Reply parser in racelink/protocol/codec.py: if the opcode has a reply (RESP_ACK or RESP_SPECIFIC), add the parse path. The returned dict is the event shape that listeners see.
  6. Per-opcode rule in racelink/protocol/rules.py: if you didn't include this in step 1's regen, add it manually.
  7. Transport entry-point in racelink/transport/gateway_serial.py: add send_my_new_opc(...) that calls _send_m2n with the matching LP.make_type(LP.DIR_M2N, LP.OPC_MY_NEW) and the body from step 4.
  8. Service wrapper if the opcode needs orchestration (retries, reply collection, post-ACK state mutation): add a method to the appropriate service in racelink/services/, typically gateway_service.py (high-level dispatch) or a dedicated service if the surface is large enough.
  9. Tests in tests/test_protocol.py for the body builder + parser, and in the matching service test file for the orchestration.
  10. Documentation in wire-protocol.md: add the opcode to the table and a body-layout subsection. The header is the source of truth, but the doc is what people read.
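A round-trip test pins a body builder and its parser against each other. A minimal sketch, assuming a hypothetical opcode whose body is one u8 group id plus one u16 duration in little-endian order. Take the real layout, and the real MAX_P_* value, from the comment block and constant in racelink_proto.h. The parser also assumes the reply echoes the request layout, which will not hold for every opcode.

```python
import struct
from typing import Dict

# Hypothetical layout: u8 group_id, u16 duration_ms, little-endian, no padding.
_MY_NEW_OPC_FMT = "<BH"
MAX_P_MY_NEW = struct.calcsize(_MY_NEW_OPC_FMT)  # mirrors a MAX_P_* constant (3 here)


def build_my_new_opc_body(group_id: int, duration_ms: int) -> bytes:
    """Return the body only; the framing code prepends the Header7."""
    if not 0 <= group_id <= 0xFF:
        raise ValueError(f"group_id out of range: {group_id}")
    if not 0 <= duration_ms <= 0xFFFF:
        raise ValueError(f"duration_ms out of range: {duration_ms}")
    return struct.pack(_MY_NEW_OPC_FMT, group_id, duration_ms)


def parse_my_new_opc_reply(body: bytes) -> Dict[str, int]:
    """Decode a reply body into the dict shape listeners would see."""
    if len(body) != MAX_P_MY_NEW:
        raise ValueError(f"expected {MAX_P_MY_NEW} body bytes, got {len(body)}")
    group_id, duration_ms = struct.unpack(_MY_NEW_OPC_FMT, body)
    return {"group_id": group_id, "duration_ms": duration_ms}
```

Raising ValueError on out-of-range input (rather than letting struct.error escape) matches the service-layer convention of raising ValueError for bad input.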

Checklist:

[ ] racelink_proto.h: OPC_* + PacketRule + struct/comment
[ ] Mirror to Gateway + WLED repo (byte-identical)
[ ] Re-run gen_racelink_proto_py.py
[ ] build_*_body in protocol/packets.py
[ ] reply parse path in protocol/codec.py (if reply expected)
[ ] transport.send_* in transport/gateway_serial.py
[ ] service wrapper (orchestration, retries, post-ACK)
[ ] tests/test_protocol.py round-trip
[ ] tests/test_<service>.py orchestration
[ ] tests/test_proto_header_drift.py passes (no manual change needed; just run it)
[ ] PROTOCOL.md: opcode table + body layout
[ ] firmware-side handlers in Gateway + WLED

Adding a new Headless scene to the catalog

The Headless-Mode scene catalog lives in racelink_headless.h::SCENE_CATALOG[] and is consumed by every receiver — adding a row means flashing every node that needs to display it. Older firmware silently drops unknown scene ids via findSceneById() == nullptr, so a partial roll-out is safe (mixed-firmware fleets just see the new scene only on up-to-date nodes).

Files to touch:

  1. Catalog header racelink_headless.h:
    • Append a new SCENE_* value to the HeadlessSceneId enum. Append-only: never reuse or renumber existing ids; they are wire-stable.
    • Append a row to SCENE_CATALOG[] carrying the visual spec (fxMode, speed, intensity, color1) plus the offset formula if the scene staggers across groups (SCENE_FLAG_USE_OFFSET + offsetMode, offsetBase, offsetStep).
  2. Mirror to all three component repos byte-identically: Host, Gateway, WLED. Drift here is as serious as racelink_proto.h drift (the receiver looks the id up in its local table).
  3. Firmware-side expansion in usermods/racelink_wled/racelink_wled.cpp::applyLocalScene: usually no code change is needed, because the catalog row drives the segment writes generically. Only add a code path when the scene needs non-standard semantics (e.g. SCENE_RESTORE_BOOT_COLOR uses a per-device boot snapshot that doesn't live in the catalog row).
  4. Headless-master broadcast: no change needed. A single-click on the Headless Master cycles through the catalog by index, so a new row is picked up automatically.
  5. Doc in RaceLink_WLED/operator-setup.md §"Headless Mode" → Scenes table: add a row.
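The append-only rule for HeadlessSceneId can be enforced mechanically at review time by comparing the enum before and after a change. A hypothetical guard follows. Extracting the (name, value) pairs from racelink_headless.h is left out, and the function name is an assumption, not an existing project helper. It is deliberately stricter than the wire requires, because it also flags renames.

```python
from typing import Sequence, Tuple

EnumEntries = Sequence[Tuple[str, int]]  # (name, wire value) in declaration order


def check_append_only(old: EnumEntries, new: EnumEntries) -> None:
    """Raise ValueError unless `new` is `old` plus freshly numbered entries at the end."""
    if list(new[: len(old)]) != list(old):
        raise ValueError("existing ids were renumbered, renamed, reordered or removed")
    used = {value for _, value in old}
    for name, value in new[len(old):]:
        if value in used:
            raise ValueError(f"{name} reuses wire id {value}")
        used.add(value)
```

The same guard applies unchanged to IndicatorType in racelink_indicators.h, which follows the same append-only discipline.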

Persisting Headless-Master state across reboots

The Headless Master keeps a small amount of per-master state in cfg.json so a power-cycle (or battery swap) does not lose the pairing context. All fields live under RaceLink.overrides; the operator-visible reference is RaceLink_WLED/headless-mode.md §"Persistence".

Cardinal rules for changing or adding a persistence field:

  1. One save path per concern. A new persistent field should either (a) trigger configNeedsWrite = true synchronously (rare operator action, "save now is the right UX") or (b) plug into the debounce pump. Mixing is a bug.
  2. Debounce pairing-burst writes. Anything that can mutate tens of times per minute during normal use (most notably the Headless Slaves registry mutations) MUST funnel through markHeadlessPersistDirty() → serviceHeadlessPersist(now) instead of setting configNeedsWrite directly. The 5-second debounce window (HEADLESS_PERSIST_DEBOUNCE_MS) collapses a 40-slave pairing burst into a single save. The LittleFS partition has ~120 000 saves of headroom; without the debounce, a heavy-event-day operator burns through that budget in months.
  3. exitHeadlessMode() writes synchronously and wipes everything. The operator's exit gesture must survive an immediate battery pull, so the function clears headlessPersistDirty, mutates the relevant overrides (counter → 0, registry → empty, current.groupId → 0, headlessPersistedActive → false), and sets configNeedsWrite = true in one synchronous pass. Runtime-override paths (Gateway takeover, autosync detection) leave the registry intact: they are involuntary demotions, and a later manual re-promotion benefits from the preserved data.
  4. Proactive use of the registry. On promotion (whether by 5-click or by auto-resume), enterHeadlessMode() calls startHeadlessReassign() which arms the cursor in the RaceLinkHeadless::ReassignState struct; the loop pump serviceHeadlessReassign(now) sweeps the registry with one OPC_SET_GROUP per HEADLESS_REASSIGN_INTERVAL_MS (currently 500 ms — tuned to give the addressed slave time to CAD + ACK before the next master TX). A 40-slave sweep takes ~20 s; the operator sees discrete IND_PAIRING_TX flashes per slave plus IND_PAIR_CONFIRMED on the receiving end. If the TX queue is busy (scheduleSend returns false — typically the post-promotion scene/SYNC broadcast still in flight), the sweep retries the same slot on the next interval (deferReassignRetry) instead of advancing — never silently drops a slave.
  5. Scene rebroadcast after pairing. After a successful SET_GROUP (proactive boot-burst OR individual reactive pairing), headlessAssignGroupTo() and the end of serviceHeadlessReassign() both call scheduleSceneRebroadcast() which arms RaceLinkHeadless::SceneRebroadcastState with a 1 s debounce. Successive arms within the window collapse to one OPC_HEADLESS packet; the loop pump serviceSceneRebroadcast(now) fires it once the deadline elapses. No-op when the master has no current scene yet (currentSceneIdx == 0xFF).
  6. Group-id discipline. HEADLESS_MASTER_GROUP_ID = 1 is the master's own group while active; HEADLESS_FIRST_GROUP_ID = 2 is the first id ever handed to a slave; 0 is the unconfigured pool; 255 is the broadcast pseudo-group. Use the header helper RaceLinkHeadless::reserveNextGroupId(counter) to pull the next free id — it clamps and exhausts correctly. Passing groupId = 0 to buildSetGroupPacket() is a bug.
  7. Master self-sync invariant. The master's own strip.timebase must equal -activePhaseOffsetMs for its own segment effects to render in the same logical-time frame the slaves derive from incoming OPC_SYNC packets. Without this invariant the master drifts on offset scenes while slaves stay synchronised with each other. Re-asserted at every headlessBroadcastSync() and once at enterHeadlessMode().
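Rule 2's debounce pump is easier to reason about as a tiny state machine. This is a host-side Python model, not firmware code. The method names loosely mirror markHeadlessPersistDirty() / serviceHeadlessPersist(now), and the clear() analogue stands in for exitHeadlessMode() dropping the flag. One assumption is worth stating: every mark pushes the deadline out (a trailing debounce), so a burst of marks yields exactly one save once the burst goes quiet. The real PersistState helpers may anchor the window differently.

```python
from typing import Callable

HEADLESS_PERSIST_DEBOUNCE_MS = 5000  # the 5-second window described above


class PersistPump:
    """Host-side model of markHeadlessPersistDirty() -> serviceHeadlessPersist(now)."""

    def __init__(self, save: Callable[[], None]) -> None:
        self._save = save
        self.dirty = False
        self.last_mark_ms = 0

    def mark_dirty(self, now_ms: int) -> None:
        # Assumed trailing debounce: each mark restarts the quiet-period timer.
        self.dirty = True
        self.last_mark_ms = now_ms

    def clear(self) -> None:
        # Analogue of exitHeadlessMode() clearing the flag before its own synchronous save.
        self.dirty = False

    def service(self, now_ms: int) -> bool:
        """Call once per loop pass; returns True when a save was issued.

        Firmware must do this subtraction in uint32 arithmetic so it survives
        millis() wrap; Python ints do not wrap, so the model skips that detail.
        """
        if self.dirty and now_ms - self.last_mark_ms >= HEADLESS_PERSIST_DEBOUNCE_MS:
            self.dirty = False
            self._save()
            return True
        return False
```

Marking 40 times at 500 ms intervals (the pace of the reassign sweep in rule 4) and servicing on every tick produces a single save, 5 s after the last mark.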

Where the headless state lives

All WLED-neutral headless state structs + helper functions live in racelink_headless.h under the RaceLinkHeadless:: namespace — reusable byte-identically by external Gateway-side software (e.g. FPVGate). Notable members:

| Header export | Purpose |
|---|---|
| HeadlessSlaveRec + findSlaveIdx / upsertSlave / clearSlaveTable | persistent slave registry, pure data ops on a caller-owned array |
| PersistState + markPersistDirty / persistDebounceElapsed / persistConsumed | debounced flash-write pump state machine |
| SceneRebroadcastState + scheduleSceneRebroadcast / sceneRebroadcastReady / sceneRebroadcastConsumed | post-pairing rebroadcast scheduler |
| ReassignState + startReassign / pickReassignTarget / reassignSweepCompleted / confirmReassignSent / deferReassignRetry / abortReassign | re-bind cursor state machine |
| shouldFirePairingBlip(lastAtMs, now, throttleMs) | indicator-throttle decision |
| reserveNextGroupId(counter) | counter clamp + bump + exhaustion check |
| HEADLESS_* constants | timing, interval, throttle, debounce parameters |

The WLED-coupled side (enterHeadlessMode / exitHeadlessMode / headlessBroadcastSync / headlessBroadcastCurrentScene / headlessSendTx / headlessAssignGroupTo / serviceHeadless probe state machine) stays in racelink_wled.{h,cpp} because it touches strip.timebase, bri, applyLocalIndicator, configNeedsWrite, and the segment write API. The loop-pump methods on UsermodRaceLink (serviceHeadlessPersist, serviceSceneRebroadcast, serviceHeadlessReassign) are thin wrappers — they consult the header decision helpers and execute the WLED-side action.
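The group-id discipline from rule 6 (0 = unconfigured pool, 1 = master, 2 = first slave id, 255 = broadcast) can be modelled as a pure function, much like reserveNextGroupId(counter). This is a hypothetical Python model. It assumes the counter holds the last id handed to a slave (0 before the first one) and returns None on exhaustion; the C helper's exact counter semantics and sentinel may differ. BROADCAST_GROUP_ID is an illustrative name.

```python
from typing import Optional, Tuple

HEADLESS_MASTER_GROUP_ID = 1
HEADLESS_FIRST_GROUP_ID = 2
BROADCAST_GROUP_ID = 255  # hypothetical name for the broadcast pseudo-group


def reserve_next_group_id(counter: int) -> Tuple[Optional[int], int]:
    """Return (group_id, new_counter); group_id is None once ids are exhausted.

    Assumed counter meaning: the last id handed to a slave (0 = none yet).
    """
    # Clamp: never hand out 0 (unconfigured pool) or 1 (the master's own group).
    candidate = max(counter + 1, HEADLESS_FIRST_GROUP_ID)
    if candidate >= BROADCAST_GROUP_ID:
        return None, counter  # exhausted; callers must not fall back to 0 or 255
    return candidate, candidate
```

Returning an explicit "no id" value instead of 0 keeps callers from feeding groupId = 0 into buildSetGroupPacket(), which rule 6 calls out as a bug.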

Time-critical TX via scheduleSend(rl, buf, len, jitterMaxMs=0)

RaceLinkTransport::scheduleSend() in racelink_transport_core.h is the single TX-queue entry point shared byte-identically by Gateway, Host and WLED. Its jitterMaxMs parameter has three modes:

| jitterMaxMs | lbtEnable | Behavior |
|---|---|---|
| == 0 | (any) | Time-critical bypass: fire immediately, no random delay, no CAD scan. Use for low-frequency broadcasts where the in-packet timestamp must reflect the actual TX moment within single-digit ms. |
| > 0 | true | LBT-polite: 50..300 ms random pre-delay (capped) + CAD scan, retries with backoff if busy. |
| > 0 | false | Caller-controlled jitter, no CAD. Gateway-style: sole TXer, no spectrum-sharing needed, but a small skew helps host-driven burst timing. |
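When writing host-side tests around scheduleSend() call sites, the three-mode table above can be encoded as a small decision function. The sketch below is a hypothetical Python restatement of the table, not code from racelink_transport_core.h. It deliberately models only which mode applies and whether a CAD scan runs. The actual delay values and backoff behaviour are left to the C implementation.

```python
from enum import Enum


class TxMode(Enum):
    BYPASS = "time-critical bypass"      # fire now: no random delay, no CAD
    LBT_POLITE = "LBT-polite"            # random pre-delay + CAD, backoff if busy
    CALLER_JITTER = "caller-controlled"  # jitter only, no CAD


def tx_mode(jitter_max_ms: int, lbt_enable: bool) -> TxMode:
    if jitter_max_ms < 0:
        raise ValueError("jitterMaxMs cannot be negative")
    if jitter_max_ms == 0:
        return TxMode.BYPASS  # lbtEnable is ignored in this mode
    return TxMode.LBT_POLITE if lbt_enable else TxMode.CALLER_JITTER


def uses_cad(mode: TxMode) -> bool:
    # Only the LBT-polite path scans the channel before transmitting.
    return mode is TxMode.LBT_POLITE
```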

Canonical use cases for the jitterMaxMs=0 bypass:

  • OPC_SYNC keepalive: the slaves' drift-correction quality is dominated by the precision of the ts24 timestamp in the SYNC body vs. the actual TX moment. With LBT's 50..300 ms random delay between caller's millis() sample and the actual TX, slaves' lastSyncTbErrMs inflates to ~250 ms. With the bypass, the same metric stays in the Gateway-baseline range (~15 ms). Trade-off: skips collision avoidance, occasional loss tolerable (next SYNC re-anchors).
  • Stream fragments: existing Gateway behaviour. Fragments must go out back-to-back so the receiver's de-fragmenter does not time out, and inter-packet jitter would break the stream.

The trade-off is collision avoidance: with jitterMaxMs=0 the TX fires the moment the radio leaves Standby, regardless of whether another node is mid-transmission on the channel. Reserve this path for sends where either occasional loss is acceptable (SYNC retries every 30 s anyway) or the sender knows it's the only TXer on its side (Gateway).

Migrated 2026-05-19: the bypass branch lives directly in scheduleSend() itself, replacing the older Gateway-side rl_queueTxNoCad() toggle workaround that flipped lbtEnable to false around the call. The previous WLED-side scheduleSendNoLbt() parallel function (introduced briefly) was also removed in the same unification pass. Cross-repo invariant: any change to scheduleSend() must be replayed byte-identically into the Gateway and Host copies of racelink_transport_core.h so tests/test_proto_header_drift.py stays green.

Adding a new Indicator to the catalog

The status-indicator catalog lives in racelink_indicators.h::INDICATOR_CATALOG[]. Same drift-discipline as the scene catalog. Existing receivers silently drop unknown indicator ids — forward-compatible.

Files to touch:

  1. Catalog header racelink_indicators.h:
    • Append a new IND_* value to IndicatorType. Append-only.
    • Append a row to INDICATOR_CATALOG[]. Animated only: fxMode must be BREATH (3), STROBE (23), or another animated mode. STATIC (0) violates the project's animation rule for indicators. Avoid pure RGB / W for the same reason.
  2. Mirror to all three component repos byte-identically.
  3. Trigger: either local (applyLocalIndicator(IND_*, dur) in firmware) or wire (Host / Gateway emits OPC_INDICATE(type=IND_*, durationSec=...)). The duration is per-trigger, not in the catalog row, so the same indicator can run for 3 s in one context and 30 s in another.
  4. Sub-second triggers: for indicators that need finer than 1 s resolution (e.g. IND_PAIRING_TX, which fires per SET_GROUP send with a 1500 ms display window), call the millisecond variant applyLocalIndicatorMs(IND_*, durationMs) instead. Same semantics otherwise; the only difference is the deadline math.
  5. High-frequency triggers must throttle. A trigger that fires more often than once per ~200 ms should gate itself with a lastTriggerAtMs timestamp so consecutive triggers do not re-extend the indicator deadline into a sustained overlay. See headlessSendTx() in racelink_wled.cpp for the canonical pattern (used to drive IND_PAIRING_TX on the Headless Master for SET_GROUP sends only).
  6. Doc in RaceLink_WLED/operator-setup.md §"Indicators" → Catalog table: add a row. Update the trigger column with the new locally-fired site (if any), or note that the indicator is "wire-only" if no local code path fires it.
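The lastTriggerAtMs throttle has one subtle point worth pinning in a test: millis() is a uint32 that wraps after about 49.7 days, so the elapsed time must be computed with wrap-safe unsigned subtraction. Below is a hypothetical Python model of a shouldFirePairingBlip-style gate. It uses None to mean "never fired", which is an assumption about how the C helper treats its first call. The 200 ms default is the ~200 ms threshold mentioned above.

```python
from typing import Optional

UINT32_MASK = 0xFFFFFFFF
THROTTLE_MS = 200  # the ~200 ms threshold from the throttling rule; tune per indicator


def should_fire(last_at_ms: Optional[int], now_ms: int, throttle_ms: int = THROTTLE_MS) -> bool:
    """Decide whether a high-frequency trigger may re-fire its indicator.

    Subtraction is masked to 32 bits, mimicking uint32 arithmetic, so the gate
    keeps working when millis() wraps from 0xFFFFFFFF back to 0.
    """
    if last_at_ms is None:
        return True
    elapsed = (now_ms - last_at_ms) & UINT32_MASK
    return elapsed >= throttle_ms
```

Trigger sites keep their own last_at_ms and update it only when should_fire() returns True. Updating it on every call would let a steady stream of triggers suppress the indicator indefinitely.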

Adding a new service

A service is a stateless or small-stateful module under racelink/services/ that owns one coherent piece of host logic.

Files to touch:

  1. Module at racelink/services/my_service.py:
    • Module docstring (5–15 lines): purpose, public API, dependencies, threading expectations. Use gateway_service.py as the template.
    • Module logger: logger = logging.getLogger(__name__).
    • Class MyService with __init__(self, controller, gateway_service) (or whatever dependencies it needs).
    • Public methods that return useful values (bool for send-style operations, dicts for query operations, raise ValueError for bad input).
  2. Service init in racelink/services/__init__.py: re-export the class.
  3. Wire-up in controller.py::__init__:
    self.my_service = MyService(self, self.gateway_service)
    
  4. Web routes in racelink/web/api.py if the service is operator-facing: route handler that validates input via request_helpers.require_int (or similar), calls the service, returns the response. Match the existing try / except RequestParseError → 400 and try / except Exception → 500 with type+traceback log patterns.
  5. Tests at tests/test_my_service.py:
    • Unit tests with a fake controller / fake transport.
    • Coverage for the boolean return contract (transport-missing returns False; happy path returns True).
    • Coverage for any error paths.
  6. Architecture doc at architecture.md: add a row to the Service Layer table.
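A trimmed, hypothetical my_service.py follows, showing the module-level pieces listed above: the docstring sections, the module logger, bool for send-style methods, a dict for queries, and ValueError for bad input. The controller.transport attribute, the _require_transport body, and the transport's send_my_new_opc signature are assumptions made for illustration. Only the method name _require_transport and the "return False when the transport is missing" contract come from this guide.

```python
"""Hypothetical racelink/services/my_service.py, trimmed for illustration.

Purpose: send one hypothetical opcode to a group and report a send counter.
Public API: MyService.send_my_new_opc(), MyService.status().
Dependencies: controller (for the transport), gateway_service.
Threading: called from web request threads; the only shared mutable state is
a counter guarded by a lock.
"""
import logging
import threading
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


class MyService:
    def __init__(self, controller: Any, gateway_service: Any) -> None:
        self._controller = controller
        self._gateway_service = gateway_service
        self._lock = threading.Lock()
        self._sent = 0

    def _require_transport(self, op_name: str) -> Optional[Any]:
        # Assumed lookup; the real helper may resolve the transport differently.
        transport = getattr(self._controller, "transport", None)
        if transport is None:
            logger.warning("%s: transport not ready", op_name)
        return transport

    def send_my_new_opc(self, group_id: int) -> bool:
        if not isinstance(group_id, int) or not 0 <= group_id <= 255:
            raise ValueError(f"group_id must be 0..255, got {group_id!r}")
        transport = self._require_transport("sendMyNewOpc")
        if transport is None:
            return False  # nothing went out
        transport.send_my_new_opc(group_id=group_id)
        with self._lock:
            self._sent += 1
        return True  # the transport accepted the frame for queueing

    def status(self) -> Dict[str, int]:
        with self._lock:
            return {"sent": self._sent}
```

The matching tests only need a fake controller with a `transport` attribute and a fake transport that records calls. Those are enough to cover the transport-missing, happy-path, and ValueError branches.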

Checklist:

[ ] racelink/services/my_service.py with module docstring + logger
[ ] services/__init__.py re-export
[ ] controller.py wiring
[ ] web/api.py route(s) (if operator-facing)
[ ] tests/test_my_service.py
[ ] ARCHITECTURE.md service-table entry

Adding a new task-manager-driven workflow

Long-running ops (multi-second, multi-stage) live in racelink/web/tasks.py so the web request returns immediately and the UI can subscribe to SSE task events for progress.

Files to touch:

  1. Service method that does the work (likely a new service per "Adding a new service" above, or a method on an existing one). The method must accept a task_manager parameter and call task_manager.update(meta={"stage": "...", "message": "...", ...}) at every stage transition. The meta shape is free-form, but the existing operator-facing UI expects:
    • stage: short uppercase tag (e.g. HOST_WIFI_ON, UPLOAD_FW).
    • message: one-line operator-readable description.
    • index / total: for per-device fan-outs.
    • addr: current MAC if applicable.
  2. Web route in web/api.py: validate input, then ctx.tasks.start("my_task_name", target_fn, meta={...}) where target_fn is a closure that calls the service method with the task manager. Return {"ok": True, "task": ctx.tasks.snapshot()}.
  3. Frontend handler in racelink/static/racelink.js::updateTask: add a branch for name === "my_task_name" that updates the UI from the meta. Long-running ops with their own dialog (FW update is the reference) keep the dialog open and render the progress in-dialog; simpler ops can rely on the masterbar's taskDetail span.
  4. Tests: at minimum verify the route returns immediately ({"ok": True, "task": {...running...}}); deeper integration tests can exercise the meta-update path with a mock task manager.
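Why the route returns immediately is easiest to see with a toy task manager that runs target_fn on its own named thread. MiniTaskManager is a hypothetical stand-in for ctx.tasks, not the real racelink/web/tasks.py. Only the start(name, target_fn, meta=...), update(meta=...) and snapshot() call shapes, the meta keys, and the {"ok": True, "task": ...} response come from the steps above. The "STARTING" stage tag is made up.

```python
import logging
import threading
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


class MiniTaskManager:
    """Hypothetical stand-in for ctx.tasks: one task at a time, worker on its own thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._task: Dict[str, Any] = {"name": None, "state": "idle", "meta": {}}
        self.thread: Optional[threading.Thread] = None

    def start(self, name: str, target_fn: Callable[..., Any], meta: Optional[dict] = None) -> None:
        with self._lock:
            if self._task["state"] == "running":
                raise RuntimeError("another task is already running")
            self._task = {"name": name, "state": "running", "meta": dict(meta or {})}

        def worker() -> None:
            try:
                final = {"state": "done", "result": target_fn(self)}
            except Exception as exc:
                logger.exception("task %s failed", name)  # log, never swallow silently
                final = {"state": "error", "error": f"{type(exc).__name__}: {exc}"}
            with self._lock:
                self._task.update(final)

        self.thread = threading.Thread(target=worker, name=f"rl-task-{name}", daemon=True)
        self.thread.start()

    def update(self, meta: dict) -> None:
        with self._lock:
            self._task["meta"].update(meta)

    def snapshot(self) -> Dict[str, Any]:
        with self._lock:
            return {**self._task, "meta": dict(self._task["meta"])}


def start_my_task(tasks: MiniTaskManager, macs: list, service_method: Callable[..., Any]) -> dict:
    """Route-handler body: start the worker and return without waiting for it."""
    def target_fn(task_manager: MiniTaskManager) -> Any:
        return service_method(macs, task_manager)

    tasks.start("my_task_name", target_fn, meta={"stage": "STARTING", "total": len(macs)})
    return {"ok": True, "task": tasks.snapshot()}
```

A test can hold the worker on a threading.Event. While the event is unset, it can assert that the response already reports "running". After setting the event and joining the thread, it can check the final "done" snapshot.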

Making the workflow cancellable

The TaskManager exposes a cooperative-cancel API: a worker thread polls task_manager.is_cancel_requested() at safe-to-stop points, and the web layer flips the flag via POST /api/task/cancel. The pattern is shipped end-to-end for the firmware-update and presets-download flows (see racelink/services/ota_workflow_service.py); follow the same shape for new workflows that run longer than 5 s or touch operator-affecting state (host Wi-Fi, multi-device fan-outs).

  1. Pick the cancel granularity. Two flavours work:
    • After-current-unit (preferred for multi-device fan-outs): check the flag at loop entry only. The currently-running unit finishes cleanly and the remaining units are skipped, so a half-completed unit is never produced. Used by run_firmware_update: once the per-device flash + verify + reconnect has started, it always runs to completion.
    • Mid-step (only when half-state is benign): check the flag before every long sleep / network call. Used by download_presets for the AP-connect step, where cancelling before the HTTP GET costs the operator nothing.
  2. Always run cleanup in finally. The cancel flag must not short-circuit Wi-Fi restore, device-state reset, or any other "we changed external state, now put it back" step. The reference pattern (run_firmware_update):
    try:
        _ensure_wifi_ready(...)
        for idx, addr in enumerate(macs, start=1):
            if task_manager.is_cancel_requested():
                result["cancelled"] = True
                result["cancelled_after"] = idx - 1
                break
            # per-device work
    finally:
        self._restore_host_wifi(result, ...)   # ALWAYS runs
    
  3. Extend the result shape. Add cancelled: bool and (for fan-outs) cancelled_after: int|None so the dialog's summary panel can render an honest "stopped after device N of M" line. Existing consumers ignore unknown keys, so this is additive.
  4. Frontend: cancel button + summary phase. See the next section ("Modal-locked dialogs"). The button calls gateway.cancelTask() from the Pinia store; the dialog flips to a summary phase when the task lands in done/error.
  5. Tests. Mirror tests/test_ota_workflow_service.py::FwUpdateCancelTests: a _RecordingTaskManager with a programmable cancel-after-N counter, and three tests pinning before-first, after-first, and after-all (no-op) behaviour. Verify result["cancelled"] and that the finally cleanup ran.
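The cancel-after-N test double and the after-current-unit loop fit together in a few lines. This is a standalone approximation of the shape described above, not the code in tests/test_ota_workflow_service.py. RecordingTaskManager, run_fan_out, flash_one and restore are hypothetical names. The double counts is_cancel_requested() calls and starts answering True once cancel_after checks have passed.

```python
from typing import Callable, List, Optional


class RecordingTaskManager:
    """Hypothetical test double: requests cancel once `cancel_after` checks have passed."""

    def __init__(self, cancel_after: Optional[int] = None) -> None:
        self.cancel_after = cancel_after
        self.checks = 0
        self.metas: List[dict] = []

    def is_cancel_requested(self) -> bool:
        self.checks += 1
        return self.cancel_after is not None and self.checks > self.cancel_after

    def update(self, meta: dict) -> None:
        self.metas.append(dict(meta))


def run_fan_out(macs: List[str], task_manager: RecordingTaskManager,
                flash_one: Callable[[str], None], restore: Callable[[dict], None]) -> dict:
    """After-current-unit cancel: the flag is read only at loop entry."""
    result = {"cancelled": False, "cancelled_after": None, "ok": []}
    try:
        for idx, addr in enumerate(macs, start=1):
            if task_manager.is_cancel_requested():
                result["cancelled"] = True
                result["cancelled_after"] = idx - 1
                break
            task_manager.update(meta={"stage": "UPLOAD_FW", "index": idx,
                                      "total": len(macs), "addr": addr})
            flash_one(addr)  # a started unit always runs to completion
            result["ok"].append(addr)
    finally:
        restore(result)  # cleanup runs on success, cancel, or exception
    return result
```

With cancel_after=0 nothing is flashed, with 1 only the first device is, and with len(macs) the run completes without a cancel. These are the before-first, after-first and after-all cases. A flash_one that raises still triggers restore().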

Modal-locked dialogs

Long-running operations that change host state the operator cannot recover on their own (host Wi-Fi switched to a device AP, multi-device flash in flight, …) need to keep their dialog visible until the work either finishes naturally or is cancelled with a summary. The infrastructure is shipped; three call sites use it today: firmware-update, presets-download, discovery.

Components

| Layer | Where | What it does |
|---|---|---|
| Dialog prop | frontend/src/components/ui/dialog/DialogContent.vue | lockClose: boolean. When true: @interactOutside.prevent, @escapeKeyDown.prevent, X button hidden. |
| Composable | frontend/src/composables/useTaskNavigationGuard.ts | Wraps useBeforeUnloadGuard + onBeforeRouteLeave. Native confirm with caller's reason string while the task runs. |
| Store action | frontend/src/stores/gateway.ts (cancelTask) | POST /api/task/cancel + optimistic local cancel_requested = true. |
| Store computeds | frontend/src/stores/gateway.ts (fwBusy, presetsBusy, discoverBusy) | Per-task-name busy flags. Add one for each new long-running workflow. |

Wiring a new long-op dialog

  1. Add a busy computed to the gateway store keyed on your task.name:
    const myBusy = computed(
      () => task.value.name === 'my_task_name' && task.value.state === 'running',
    )
    
  2. Lock the dialog while busy:
    <DialogContent :lock-close="gateway.myBusy">
    
  3. Install the navigation guard in <script setup>:
    useTaskNavigationGuard(() => gateway.myBusy, {
      reason: 'A <verb> is currently running. Leaving now will lose status visibility — continue anyway?',
    })
    
  4. Cancel button: visible only while myBusy, disabled when task.cancel_requested (server has accepted the cancel; we're waiting for the worker to wind down). Label flips to "Cancelling…".
  5. Summary phase: a phase ref with 'config' | 'progress' | 'summary'. A watcher on task.state flips to 'summary' when the running task lands in done or error. The summary panel reads task.result and renders success / failure / skipped counts plus any workflow-level side-effect status (e.g. hostWifi.restored). Only the summary phase exposes a real Close button.
  6. Defensive close-on-Close: the in-dialog Close button must be :disabled="myBusy" so the operator cannot dismiss the dialog even via the Close button while the task is running. The lockClose prop guards outside-click / Esc; the disabled Close button guards the obvious other path.

When to use the full pattern vs. the lighter variant

  • Full pattern (lock + Cancel button + summary): long-running workflows that change reversible-but-operator-affecting state. Firmware update, presets download, future "bulk reflash group" or multi-device migration. Anything that touches host Wi-Fi qualifies automatically because Wi-Fi-restore failure strands the operator.
  • Lighter variant (lock only, no Cancel): short ops (<30 s) with no Wi-Fi / no half-state risk where Cancel is overkill but outside-click dismiss is still annoying. Discovery sweep is the reference. The footer Close button is :disabled="<busy>" so the operator either waits for natural completion or accepts the navigation-guard prompt to leave.

Modifying threading-sensitive code

Anything that touches the gateway, the device repository, or the SSE layer crosses a thread boundary. Before submitting:

  • Read architecture.md §Threading Model. Confirm which thread your code runs on.
  • Confirm the lock contract: if you're mutating shared state, use the existing locks (state_repository.lock, _pending_config_lock, _pending_expect_lock, _tx_lock). If you're adding a new shared field, add a matching lock — and add a regression test in tests/test_state_concurrency.py pinning the contract.
  • Never hold state_repository.lock across RF I/O — see the locking-rule note in ARCHITECTURE.md. The reference pattern is _apply_device_meta_updates in api.py.
  • Name your threads (name="rl-<purpose>"). This is a project convention; new threads without a name pollute threading.enumerate() output and make py-spy traces illegible.
  • Prefer a ThreadPoolExecutor over ad-hoc daemon threads whenever you can bound concurrency (see gateway_service._auto_restore_executor for the reference). One-shot daemon threads remain acceptable for truly singleton tasks (the RX reader, the reconnect worker).
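Two of these rules are easy to show side by side: keep the shared lock for bookkeeping only and release it around RF I/O, and use a bounded, named pool. The sketch below is loosely modelled on the snapshot / send / commit shape this guide attributes to _apply_device_meta_updates. It is not that function's code. DeviceRepo, apply_meta_updates and restore_executor are hypothetical. ThreadPoolExecutor's thread_name_prefix is standard library and yields worker names like "rl-auto-restore_0".

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class DeviceRepo:
    """Hypothetical stand-in for the state repository: a dict guarded by one lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.devices: Dict[str, dict] = {}


def apply_meta_updates(repo: DeviceRepo, updates: Dict[str, dict],
                       send: Callable[[str, dict], bool]) -> Dict[str, bool]:
    # 1. Decide what to send while holding the lock; snapshot the keys only.
    with repo.lock:
        targets = [mac for mac in updates if mac in repo.devices]
    # 2. RF I/O with the lock released, so other threads never wait on the radio.
    results = {mac: send(mac, updates[mac]) for mac in targets}
    # 3. Re-acquire briefly to commit only what the transport accepted.
    with repo.lock:
        for mac, ok in results.items():
            if ok and mac in repo.devices:
                repo.devices[mac].update(updates[mac])
    return results


# Bounded pool whose worker threads follow the rl-<purpose> naming convention.
restore_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="rl-auto-restore")
```

A regression test can assert `not repo.lock.locked()` inside the fake send callback, which pins the "never hold the lock across RF I/O" contract.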

Common patterns

Adding a request_helpers.require_int-style validator

Cross-cut input validation lives in racelink/web/request_helpers.py. The pattern: a helper raises RequestParseError (a ValueError subclass) on bad input; the route catches it once and translates to a 400. Adding a new helper:

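import re

# Module-level pattern next to the helper (assumed new; reuse one if it already exists).
_MAC12_RE = re.compile(r"^[0-9A-F]{12}$")

def require_mac(body, key, *, label=None):
    name = label or key
    raw = require_str(body, key, label=name)  # existing helper; raises RequestParseError on bad input
    raw = raw.strip().upper()
    if not _MAC12_RE.match(raw):
        raise RequestParseError(f"{name} must be a 12-char hex MAC")
    return raw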

Then add a test in tests/test_web_request_helpers.py matching the existing RequireIntTests style.
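A standalone sketch of such a test class follows. RequestParseError and require_str are stand-ins so the block runs in isolation; in the repo you would import them (and require_mac) from racelink.web.request_helpers instead. The stand-in require_str assumes the real one rejects missing and non-string values, which should be checked against the actual helper. The test names are illustrative, not the existing RequireIntTests methods.

```python
import re
import unittest


class RequestParseError(ValueError):
    """Stand-in for racelink.web.request_helpers.RequestParseError."""


def require_str(body, key, *, label=None):
    # Stand-in; assumed to reject missing and non-string values.
    value = body.get(key) if isinstance(body, dict) else None
    if not isinstance(value, str):
        raise RequestParseError(f"{label or key} must be a string")
    return value


_MAC12_RE = re.compile(r"^[0-9A-F]{12}$")


def require_mac(body, key, *, label=None):
    name = label or key
    raw = require_str(body, key, label=name).strip().upper()
    if not _MAC12_RE.match(raw):
        raise RequestParseError(f"{name} must be a 12-char hex MAC")
    return raw


class RequireMacTests(unittest.TestCase):
    def test_normalises_case_and_whitespace(self):
        self.assertEqual(require_mac({"mac": " aabbccddeeff "}, "mac"), "AABBCCDDEEFF")

    def test_rejects_separators(self):
        with self.assertRaises(RequestParseError):
            require_mac({"mac": "AA:BB:CC:DD:EE:FF"}, "mac")

    def test_rejects_missing_key(self):
        with self.assertRaises(RequestParseError):
            require_mac({}, "mac")

    def test_label_appears_in_message(self):
        with self.assertRaisesRegex(RequestParseError, "^Device MAC "):
            require_mac({"mac": "xyz"}, "mac", label="Device MAC")

    def test_is_a_value_error(self):
        self.assertTrue(issubclass(RequestParseError, ValueError))
```

Because RequestParseError subclasses ValueError, the route's single `except RequestParseError` clause is what turns the helper's failures into a 400.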

Adding a # swallow-ok: annotation

The exception-hygiene test (tests/test_exception_hygiene.py) requires every except Exception block to either log, re-raise, or carry a # swallow-ok: <reason> comment. The reason should be substantive — "best-effort fallback; caller proceeds with safe default" is the bare minimum, but a one-line why is better.

If you're tempted to swallow at an RF/persistence boundary, prefer a logger.warning(..., exc_info=True) over a silent pass. A previous sweep reviewed every broad except in the project; aim to match that quality in new code.
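The hygiene rule can be approximated with the standard ast module, which helps when checking a file before running the full suite. The sketch below is a rough, hypothetical approximation, not tests/test_exception_hygiene.py. It treats a handler as compliant if it contains a raise, a call on one of a few assumed logger names, or a `# swallow-ok:` comment anywhere within the handler's lines.

```python
import ast
from typing import List

LOG_RECEIVERS = {"logger", "log", "logging"}  # assumed logger variable names


def _is_broad(handler: ast.ExceptHandler) -> bool:
    return isinstance(handler.type, ast.Name) and handler.type.id == "Exception"


def _logs_or_raises(handler: ast.ExceptHandler) -> bool:
    for node in ast.walk(handler):
        if isinstance(node, ast.Raise):
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in LOG_RECEIVERS):
            return True
    return False


def unhygienic_handlers(source: str) -> List[int]:
    """Line numbers of `except Exception` blocks that neither log, re-raise,
    nor carry a `# swallow-ok:` comment inside the handler."""
    lines = source.splitlines()
    bad = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and _is_broad(node):
            span = lines[node.lineno - 1 : node.end_lineno]
            if not _logs_or_raises(node) and not any("# swallow-ok:" in ln for ln in span):
                bad.append(node.lineno)
    return sorted(bad)
```

In this approximation, the three acceptable shapes are a `logger.warning(..., exc_info=True)` call, a bare `raise`, or a `# swallow-ok: <reason>` comment on the except line or inside its body. A bare `pass` with no comment is reported.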

Returning a boolean from a send_* method

Every send_* method on control_service returns bool. True means "the transport accepted the frame for queueing"; False means "transport not ready / no target / nothing went out". A project-wide review traced silent-success bugs back to methods that returned None instead. New send-style methods follow the contract:

def send_my_new_opc(self, ...) -> bool:
    transport = self._require_transport("sendMyNewOpc")
    if transport is None:
        return False
    transport.send_my_new_opc(...)
    return True

Regenerating WLED metadata after a firmware bump

Three RaceLink modules under racelink/domain/ are fully auto-generated from the WLED checkout by gen_wled_metadata.py. They must never be hand-edited; the file headers say so and git blame will land on the generator script, not a human commit.

| Generated file | Source in WLED checkout | What it carries |
|---|---|---|
| wled_effects.py | wled00/FX.h (effect IDs) + wled00/FX.cpp (_data_FX_MODE_*[] strings) | Per-effect slot metadata: which sliders/toggles/colors/palette an effect uses, plus custom labels ("Bg", "Duty cycle", …). |
| wled_palettes.py | wled00/FX_fcn.cpp (JSON_palette_names[]) | Palette id → display name. |
| wled_palette_color_rules.py | wled00/data/index.js (updateSelectedPalette()) | The palette-conditional color slot rule: which built-in * Color… palettes (ids 2..5 in stock WLED) force-show extra color pickers regardless of effect metadata. |

The generator parses each source file with regexes pinned to the upstream shape; if WLED ever reshapes one of them (e.g. moves updateSelectedPalette or changes its if (s > 1 && s < 6) guard), the generator raises RuntimeError with a pointer to the file/function it failed on, rather than silently producing wrong output. The rule extraction is also unit-tested in tests/test_wled_effect_metadata.py under ParsePaletteColorRuleTests.
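The "fail loudly on reshaped sources" approach can be illustrated with the palette rule, the simplest of the three extractions. The sketch below is hypothetical. parse_palette_color_rule is not the generator's actual function, and it assumes a `function updateSelectedPalette(` declaration followed by an `if (s > A && s < B)` guard. It converts the exclusive bounds into the inclusive id range (2..5 for stock WLED). For simplicity it searches from the function header to the end of the file rather than stopping at the function's closing brace.

```python
import re
from typing import Tuple

_FUNC_RE = re.compile(r"function\s+updateSelectedPalette\s*\(")
_GUARD_RE = re.compile(r"if\s*\(\s*s\s*>\s*(\d+)\s*&&\s*s\s*<\s*(\d+)\s*\)")


def parse_palette_color_rule(index_js: str) -> Tuple[int, int]:
    """Return the inclusive palette-id range guarded by `if (s > A && s < B)`."""
    func = _FUNC_RE.search(index_js)
    if func is None:
        raise RuntimeError("wled00/data/index.js: updateSelectedPalette() not found")
    guard = _GUARD_RE.search(index_js, func.end())
    if guard is None:
        raise RuntimeError("wled00/data/index.js: updateSelectedPalette() guard "
                           "'if (s > A && s < B)' not found; update the regex")
    lo, hi = int(guard.group(1)) + 1, int(guard.group(2)) - 1
    if lo > hi:
        raise RuntimeError(f"palette guard yields an empty range ({lo}..{hi})")
    return lo, hi
```

Raising RuntimeError instead of returning a default is the important design choice. A rewritten guard such as `s >= 2 && s <= 5` stops the run instead of silently producing an empty or wrong rule.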

When to regenerate

  • You bumped the bundled WLED firmware (the checkout under ../WLED LoRa/WLED).
  • A WLED contributor added/renamed/removed an effect (changes FX.h + FX.cpp).
  • A WLED contributor added/renamed/removed a built-in palette (changes FX_fcn.cpp).
  • A WLED contributor reshaped updateSelectedPalette() (changes the JS rule).

How

  1. The generator's default WLED path points at a maintainer-local checkout, so always pass --wled <path> pointing at the root of your RaceLink_WLED checkout (the directory containing wled00/).
  2. Run the generator:
    py gen_wled_metadata.py --wled <path>
    
    It prints one line per output file, e.g.:
    Wrote racelink\domain\wled_effects.py (188 effects)
    Wrote racelink\domain\wled_palettes.py (72 palettes)
    Wrote racelink\domain\wled_palette_color_rules.py (palette-color rule: ...)
    
    If any source-file shape check fails the script aborts with RuntimeError; read the message, update the relevant regex in gen_wled_metadata.py, and rerun.
  3. Run the parser tests:
    py -m pytest tests/test_wled_effect_metadata.py -q
    
    The ParsePaletteColorRuleTests::test_generated_module_matches_stock_thresholds pin fails if the new firmware ships different palette thresholds. In that case, update the pin to match the new values and note the change in the WebUI smoke checklist below.
  4. (Optional but recommended) Smoke-test the RL-preset editor in a browser: open it, pick effect "Traffic Light", walk the palette dropdown, and confirm that color-slot visibility still matches WLED's own webui (* Color 1 → 1+Bg, * Color Gradient → 1+Bg+3, etc.).

How the generated data reaches the UI

WLED source ──► gen_wled_metadata.py ──► racelink/domain/wled_*.py
    ──► racelink.domain.specials.serialize_rl_preset_editor_schema()
    ──► GET /racelink/api/rl-presets/schema
    ──► racelink.static.racelink.js :: ensureRlPresetUiSchema()
    ──► buildRlPresetForm(), which consumes options[].slots and
        schema.paletteColorRules to drive the editor

No JS-side hardcoding remains: paletteForcesSlot reads the rule from the schema (with a small literal fallback for the case where an old backend hasn't shipped the field yet, intentionally matching the stock values so behaviour is preserved during a rolling upgrade).
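The rule-with-fallback lookup is sketched here in Python for illustration; the real paletteForcesSlot lives in racelink.js. The {"min": 2, "max": 5} field shape and the helper name are hypothetical. Only the idea of falling back to stock values when the backend omits paletteColorRules comes from the text above:

```python
from typing import Any, Mapping

# Hypothetical literal fallback mirroring stock WLED (palette ids 2..5).
_STOCK_RULE = {"min": 2, "max": 5}


def palette_forces_color_slots(schema: Mapping[str, Any], palette_id: int) -> bool:
    """True if the palette force-shows extra color pickers.

    Prefers the backend-shipped rule; an older backend that does not send
    paletteColorRules yet falls back to the stock values, so behaviour is
    unchanged during a rolling upgrade.
    """
    rule = schema.get("paletteColorRules") or _STOCK_RULE
    return rule["min"] <= palette_id <= rule["max"]
```

Keeping the fallback equal to the stock values means an old backend and a new one produce the same UI on stock firmware.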

The deterministic-effects catalogue in wled_deterministic.py is the only WLED-derived module that is not auto-extracted — it encodes a hand-audited subset of FX.cpp per the workflow below.

WLED OTA gate matrix

../reference/wled-ota-gates.md covers the four gates that WLED's /update handler enforces (same-subnet, settings-PIN, OTA-lock, release-name) and the five firmware-side options for shipping with same-subnet=false. The recommendation is unchanged: ship the racelink_wled usermod override (Option 1) on new firmware images and keep the host-side auto-unlock (OTAService._wled_attempt_unlock, Option 5) as the safety net.

Host-side per-device cleanup contract

racelink/services/ota_workflow_service.py has three load-bearing semantics that are easy to break when refactoring the per-device loop: the AP-Enable retry shape (1.5 s × 2), the conditional AP-Close (only on the error-after-AP-open path), and the two-track per-device error surface (dev_res["error"] + device_messages[addr_key]). Their canonical wording lives in the module docstring at the top of ota_workflow_service.py so it travels with the code: a refactor author needs to see the cleanup contract while editing that file, not later via a doc cross-reference.

If you're adding a step between AP-Enable and the success-path dev_res["ok"] = True that opens any other long-lived host state (a held lock, an external connection, etc.), the same try/except/finally pattern must clean it up. The reference implementation does this for the host's nmcli connection via _restore_host_wifi in the outer finally.
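A minimal sketch of that shape, with hypothetical names throughout (open_ap, close_ap, flash, and restore_host_wifi stand in for the real steps and for _restore_host_wifi; the retry logic is omitted). The AP is closed only when an error happens after it was opened, each failure lands on both error tracks, and host state is restored in the outer finally however the loop exits:

```python
from typing import Callable, Dict, Iterable, Tuple

Step = Callable[[str], None]  # hypothetical per-device step taking addr_key


def run_batch(
    addr_keys: Iterable[str],
    open_ap: Step,
    close_ap: Step,
    flash: Step,
    restore_host_wifi: Callable[[], None],
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    results: Dict[str, dict] = {}
    device_messages: Dict[str, str] = {}
    try:
        for addr_key in addr_keys:
            dev_res: dict = {"ok": False}
            results[addr_key] = dev_res
            ap_open = False
            try:
                open_ap(addr_key)
                ap_open = True
                # Anything long-lived opened here must be released in a finally.
                flash(addr_key)
                dev_res["ok"] = True
            except Exception as exc:  # per-device failure: record it, continue with next device
                # Two-track error surface: per-device result + operator-facing message.
                dev_res["error"] = str(exc)
                device_messages[addr_key] = f"update failed: {exc}"
                if ap_open:
                    close_ap(addr_key)  # AP-Close only on the error-after-AP-open path
    finally:
        restore_host_wifi()  # outer finally: runs on success, failure, or abort
    return results, device_messages
```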

task_manager.snapshot() adds a top-level elapsed_s field (max(0, (ended_ts or now) - started_ts)) so the WebUI's live timer can anchor on the server-computed value instead of Date.now() / 1000 - started_ts, which would otherwise expose host-vs-browser clock skew. Any new long-running task gets the field for free; no per-workflow opt-in.
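The formula is small enough to show directly. A sketch, assuming started_ts / ended_ts are epoch seconds in a plain dict; with_elapsed is a hypothetical helper name, not the task_manager API:

```python
import time
from typing import Any, Dict, Optional


def with_elapsed(snapshot: Dict[str, Any], now: Optional[float] = None) -> Dict[str, Any]:
    """Return a copy of snapshot with a server-computed elapsed_s field."""
    now = time.time() if now is None else now
    # Running tasks have no ended_ts yet, so measure up to "now".
    end = snapshot.get("ended_ts") or now
    # Clamp so the timer never shows a negative value.
    return {**snapshot, "elapsed_s": max(0, end - snapshot["started_ts"])}
```

Because both timestamps come from the host clock, the browser only has to display elapsed_s, never subtract its own clock from started_ts.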

Updating the WLED-deterministic effects list

The RL-preset editor marks "deterministic" WLED effects with a leading * and sorts them to the top of the dropdown so operators picking offset-mode-safe effects see them first. Deterministic = the effect's pixel output depends only on synced inputs (strip.now + segment params), so two nodes with synchronised strip.timebase render identically. The audited set is in racelink/domain/wled_deterministic.py (currently 19 effects); the source-of-truth catalogue lives in the WLED fork at usermods/racelink_wled/docs/effects-deterministic.md (the same content is also available in this consolidation at reference/deterministic-effects.md).
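The marking and ordering can be sketched as follows. The (id, name) tuple input, the helper name, and the "* " label format are hypothetical; the real backend builds this from wled_effects.py and WLED_DETERMINISTIC_EFFECT_IDS:

```python
from typing import FrozenSet, Iterable, List, Tuple


def label_and_sort_effects(
    effects: Iterable[Tuple[int, str]],
    deterministic_ids: FrozenSet[int],
) -> List[Tuple[int, str]]:
    """Prefix deterministic effects with '*' and move them to the top.

    sorted() is stable, so each group keeps its incoming order.
    """
    labelled = [
        # Hypothetical label format: "* " + original effect name.
        (fx_id, f"* {name}" if fx_id in deterministic_ids else name)
        for fx_id, name in effects
    ]
    # False sorts before True, so deterministic effects come first.
    return sorted(labelled, key=lambda item: item[0] not in deterministic_ids)
```

Doing the sort on the backend is what lets the frontend pick up a changed ID set without any JS edit.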

When to update: a WLED release adds/changes an effect, or the catalogue doc grows a new "✓" entry.

How:

  1. Read the analysis doc, especially §"How to verify a new / unlisted effect". Apply its 5-step grep checklist to the effect's body in wled00/FX.cpp.
  2. If the effect passes: add its numeric ID to WLED_DETERMINISTIC_EFFECT_IDS in wled_deterministic.py, with an inline comment naming the effect and its FX.cpp anchor.
  3. Update the pin test in tests/test_wled_effect_metadata.py::WledDeterministicTaggingTests::test_deterministic_id_set_matches_analysis: add the same ID and bump the len() assertion.
  4. py -m pytest tests/test_wled_effect_metadata.py -q should still pass.
  5. The frontend picks the change up automatically (no JS / CSS edit needed; backend ships the flag + the sort).

When removing: same flow in reverse — a WLED patch that introduces RNG / beat*-without-timebase / per-frame SEGENV.step accumulation into a previously-deterministic effect demotes it. Drop the ID from both wled_deterministic.py and the pin test; update the catalogue's table to move the effect from "✓" to "⚠ Looks deterministic but is not" with the new failure mode.

The full step-by-step workflow (including the rationale, the deterministic criteria, and the failure modes) lives in the module docstring of wled_deterministic.py itself — anyone editing the file sees it immediately.

Smoke-testing your change

Before opening a PR:

  1. py -m pytest -q — full suite must pass.
  2. node --check racelink/static/racelink.js and node --check racelink/static/scenes.js if you touched JS.
  3. py -m pytest tests/test_no_german_in_ui.py — confirms no accidental German strings in operator-facing UI.
  4. py -m pytest tests/test_proto_header_drift.py — if you touched racelink_proto.h.
  5. py -m pytest tests/test_exception_hygiene.py — confirms every except Exception you added is either logged or annotated.
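If you run the checklist often, a small driver saves retyping. This is a hypothetical helper, not part of the repo. It uses sys.executable instead of the Windows py launcher so it also works on Linux/macOS, and it adds the JS and proto-drift steps only when you say you touched those files:

```python
import subprocess
import sys
from typing import List


def smoke_commands(touched_js: bool = False, touched_proto: bool = False) -> List[List[str]]:
    """Build the pre-PR checklist as argv lists, in checklist order."""
    pytest = [sys.executable, "-m", "pytest"]
    cmds = [pytest + ["-q"]]
    if touched_js:
        cmds.append(["node", "--check", "racelink/static/racelink.js"])
        cmds.append(["node", "--check", "racelink/static/scenes.js"])
    cmds.append(pytest + ["tests/test_no_german_in_ui.py"])
    if touched_proto:
        cmds.append(pytest + ["tests/test_proto_header_drift.py"])
    cmds.append(pytest + ["tests/test_exception_hygiene.py"])
    return cmds


def run_smoke(touched_js: bool = False, touched_proto: bool = False) -> None:
    for cmd in smoke_commands(touched_js, touched_proto):
        print("$", " ".join(cmd))
        # check=True raises CalledProcessError at the first failing step.
        subprocess.run(cmd, check=True)
```

Call run_smoke(touched_js=True) from the repository root after a JS change; it stops at the first failing command.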

For features the test suite can't fully cover (frontend behaviour, RF-level interactions), add a manual smoke checklist to your PR description. The internal engineering ledger contains good examples — every shipped batch ends with a list of "open the app, click X, confirm Y" steps.