RaceLink Architecture¶
Repository Scope¶
RaceLink_Host now contains only the host-owned parts of the system:
- core runtime wiring
- transport, protocol, state, and service layers
- the shared RaceLink WebUI
- standalone Flask hosting
RotorHazard-specific adapter code is no longer part of this repository. That adapter belongs in the separate RaceLink_RH-plugin repository.
Stable Host Entry Points¶
External adapters should depend on the host through these stable entry points:
- racelink.app.create_runtime(...)
- racelink.web.register_racelink_web(...)
This keeps plugin repositories from reaching deeply into host internals.
Package Layout¶
- racelink/app.py: Runtime container and host-owned runtime factory.
- racelink/core/: Cross-cutting contracts and null source/sink defaults.
- racelink/domain/: Device models, metadata, and specials helpers.
- racelink/protocol/: Protocol constants, rule helpers, and packet support.
- racelink/transport/: Serial gateway transport and framing.
- racelink/state/: Runtime repositories and persistence helpers.
- racelink/services/: Host business workflows.
- racelink/web/: Shared RaceLink WebUI registration, API, SSE, DTOs, and task state.
- racelink/integrations/standalone/: Canonical standalone Flask bootstrap using the same host runtime and WebUI.
- pages/ and static/: Shared RaceLink WebUI assets that remain in the host repository.
WebUI Hosting Model¶
There is one RaceLink WebUI.
- In standalone mode, the standalone Flask app mounts that UI through register_racelink_web(...).
- In RotorHazard mode, the external adapter plugin is expected to mount that same UI through the same host-owned registration entry.
- The packaged standalone user entrypoint is racelink-standalone, which boots the host-owned standalone integration under racelink.integrations.standalone.
pages/ and static/ stay in the host repository so both hosting modes use the same UI implementation.
Layer Boundaries¶
- domain stays framework-agnostic.
- protocol and transport do not depend on web-hosting concerns.
- state owns repositories and persistence.
- services implement host workflows and should not depend on external adapters.
- web adapts HTTP and SSE traffic to host services.
- integrations/standalone depends inward on host modules and does not define separate UI behavior.
Service Layer¶
The host's business logic is split across the modules below. Each
service has a 5–15 line module docstring (racelink/services/*.py)
that names its public API, dependencies, and threading expectations.
This table is the at-a-glance index; open the file for the contract.
| Module | Owns | Called from |
|---|---|---|
| gateway_service | Pending-request registry, TX/RX listener wiring, reconnect lifecycle, auto-restore worker pool, high-level dispatch (send_config / send_sync / send_stream / send_and_wait_for_reply) | Everything that talks to the gateway |
| control_service | OPC_PRESET / OPC_CONTROL builders, return-value contract (bool for every send_*), per-group cache update | Web routes, scene runner, RotorHazard adapter |
| config_service | Post-ACK application of OPC_CONFIG changes (state mutation after the gateway confirms) | RX-thread ACK handler via controller._apply_config_update |
| sync_service | Thin wrapper for OPC_SYNC broadcast | Scene runner |
| discovery_service | OPC_DEVICES broadcast + reply collection | Web /api/discover, task manager |
| status_service | OPC_STATUS poll + reply reconciliation | Web /api/status, task manager |
| stream_service | OPC_STREAM payload submission | Startblock service |
| startblock_service | Startblock-program payload assembly + dispatch | Web /api/specials/*, scene runner |
| specials_service | Per-capability device option metadata | Web editor schema, options dialog |
| presets_service | WLED presets.json file store + minimal parser | OTA workflow, web /api/presets/* |
| rl_presets_service | RaceLink-native preset store (CRUD, persistence) | Web, scene runner, scene cost estimator |
| ota_service | File staging + WLED HTTP transfer (low-level) | OTA workflow |
| ota_workflow_service | Multi-step firmware-update / presets-download orchestration | Web /api/fw/start, /api/presets/download |
| host_wifi_service | NetworkManager nmcli wrapper for OTA | OTA workflow |
| pending_requests | PendingRequestRegistry for unicast match-and-set on the RX path | gateway_service |
| scenes_service | Scene store (CRUD + canonical validator + legacy migration shim) | Web, scene runner |
| scene_runner_service | Sequential dispatcher for scenes, including offset_group container expansion | Web /api/scenes/<key>/run, RotorHazard quickset |
| scene_cost_estimator | Predicted wire cost (packets, bytes, airtime) for a scene before it runs | Web /api/scenes/<key>/estimate, web editor |
| offset_dispatch_optimizer | Wire-path planner for offset_group actions (formula vs explicit vs broadcast+overrides) | Scene runner, cost estimator |
controller.py is the historical "RaceLink_Host" class that all
services attach to. New work generally adds a service module rather
than extending the controller; the controller's role has shrunk to
"composition root + a handful of lifecycle methods + two
_pending_* state slots that bridge the TX↔RX threads".
Threading Model¶
The host is multithreaded by necessity: the serial RX reader can't block on web requests, the scene runner has to fire from its own thread so a Run doesn't block the SSE stream, and OTA workflows run in task-manager threads so the WebUI stays responsive during a multi-minute firmware roll-out.
Threads in a running host¶
| Thread name (rl-… prefix) | Owner | Lifetime |
|---|---|---|
| Main / web request threads | Flask / WSGI server | Per-request |
| rl-serial-rx-<port> | GatewaySerialTransport._reader | Lives for the duration of one transport session; replaced on reconnect |
| rl-task-<name> | TaskManager | Per task (discover / status / fwupdate / presets_download) |
| rl-reconnect | GatewayService.schedule_reconnect | Per reconnect attempt |
| rl-gateway-retry | controller._gateway_retry_timer | Per scheduled auto-retry; Timer subclass |
| rl-auto-restore-N | GatewayService._auto_restore_executor (ThreadPoolExecutor, max_workers=8) | Pool, threads reused; idle pool holds 0 active |
| Scene runner (anonymous) | SceneRunnerService.run | Per scene run |
| SSE generator threads | SSEBridge.gen() | One per connected SSE client |
| gevent monkey-patched workers (when running under gunicorn-gevent) | gevent | One per request hub |
Locks¶
| Lock | Module | Protects |
|---|---|---|
| state_repository.lock (RLock) | racelink/state/repository.py | Device + group repository mutations and iterations. Reentrant so a save-path that walks the device list can be called from another locked path without deadlocking. Critical rule: never held across RF I/O; see the "Locking Rule" section below. |
| _tx_lock (Lock) + _tx_done_cv (Condition) | transport/gateway_serial.py | A1+A2 fix. Serializes USB writes so concurrent senders cannot interleave bytes mid-frame; the Condition's predicate (_tx_in_flight) is the lost-wakeup-safe replacement for the previous Event.wait + Event.clear pair. |
| _pending_config_lock (Lock) | controller.py | A3 fix. Pairs stash_pending_config (TX path) with take_pending_config (RX path) atomically. Distinct from state_repository.lock so a slow ConfigService callback can't delay TX-side stashes. |
| _pending_expect_lock (Lock) | controller.py | A5 fix. Pairs set_pending_expect (TX path) with read_pending_expect + clear_pending_expect_if (RX path); the _if variant has compare-and-clear semantics, so a stale RX matcher cannot wipe a freshly-stamped TX expectation. |
| _clients_lock (gevent Semaphore by default, threading.Lock fallback) | web/sse.py | A4 fix. Snapshot-then-fan-out for broadcast so a slow client queue doesn't starve other broadcasters or new SSE registrations. |
| _auto_reassign_lock (Lock) | services/gateway_service.py | The auto-reassign-recently-seen cache + the in-flight futures list for the auto-restore executor. |
gevent.lock.Semaphore is used by web/sse.py only when the host
runs under gevent (gunicorn -k gevent). Standalone Flask falls back
to threading.Lock automatically. The fallback chain is in
racelink/web/sse.py.
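The _tx_lock / _tx_done_cv row above describes a Condition with a predicate as the lost-wakeup-safe replacement for an Event.wait + Event.clear pair. The following is a minimal sketch of that general pattern, not the code in transport/gateway_serial.py. The class name TxBarrierSketch, the _done_count counter, and the write callback are assumptions made for illustration.

```python
import threading
from typing import Callable


class TxBarrierSketch:
    """Hypothetical stand-in for the _tx_lock / _tx_done_cv pair."""

    def __init__(self, write: Callable[[bytes], None]) -> None:
        self._write = write  # assumption: writes one complete frame
        self._tx_lock = threading.Lock()
        self._tx_done_cv = threading.Condition(self._tx_lock)
        self._tx_in_flight = False
        self._done_count = 0  # assumption: lets a sender wait for its own TX-DONE

    def send(self, frame: bytes, timeout: float = 1.0) -> bool:
        """Write one frame and wait for the matching TX-DONE."""
        with self._tx_done_cv:  # acquires _tx_lock
            # Wait until no earlier frame is in flight.
            if not self._tx_done_cv.wait_for(
                lambda: not self._tx_in_flight, timeout
            ):
                return False
            # The flag is set under the lock *before* the write, so a TX-DONE
            # that arrives early cannot be lost: the RX side needs the same
            # lock and can only flip the flag once we are waiting.
            self._tx_in_flight = True
            target = self._done_count + 1
            self._write(frame)
            done = self._tx_done_cv.wait_for(
                lambda: self._done_count >= target, timeout
            )
            if not done:
                # Give up on this frame and let queued senders proceed.
                self._tx_in_flight = False
                self._tx_done_cv.notify_all()
            return done

    def on_tx_done(self) -> None:
        """Called from the RX reader thread when TX-DONE is observed."""
        with self._tx_done_cv:
            self._tx_in_flight = False
            self._done_count += 1
            self._tx_done_cv.notify_all()
```

Because Condition.wait_for checks the predicate before it blocks, the sender never sleeps on a completion that has already happened. With a bare Event, a clear() that runs after the RX thread's set() discards that completion.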
Atomicity guarantees¶
| Operation | Atomic with respect to | Notes |
|---|---|---|
| _send_m2n USB write | Concurrent senders | Per A1: the full body of _send_m2n runs under _tx_lock. Listener fan-out (_emit_tx) happens outside the lock so a slow TX listener cannot stall subsequent senders. |
| Gateway TX-DONE acknowledgement | The matching _tx_in_flight = True flip | Per A2: the RX reader's _tx_lock acquisition guarantees it observes the flag set by the TX thread that wrote the matching frame. |
| _pending_config stash + pop | Cross-thread mutations of the dict | Per A3. |
| _pending_expect set + clear | Cross-thread restamps + matches | Per A5: compare-and-clear via clear_pending_expect_if. |
| Device-repo iteration in the cache update + send_stream paths | Concurrent appends / removals | Per A6: a with state_repository.lock: block wraps the iteration; the inner work (or the snapshot built inside) is what runs lock-free. |
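The compare-and-clear guarantee for _pending_expect can be sketched as follows. This is an illustrative reconstruction under assumptions, not the controller's code: the Expectation dataclass and the PendingExpectSketch class are hypothetical, and the identity comparison (is) is one possible way to tell a fresh stamp from a stale one.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expectation:
    """Hypothetical shape of a pending expectation."""
    sender: str
    opcode: int


class PendingExpectSketch:
    """Hypothetical single-slot store guarded by its own lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._expect: Optional[Expectation] = None

    def set_pending_expect(self, expect: Expectation) -> None:
        # TX path: stamp (or restamp) the expectation.
        with self._lock:
            self._expect = expect

    def read_pending_expect(self) -> Optional[Expectation]:
        # RX path: look at the current expectation without clearing it.
        with self._lock:
            return self._expect

    def clear_pending_expect_if(self, seen: Optional[Expectation]) -> bool:
        # RX path: clear only if the slot still holds the object we matched.
        # Identity is used so an equal-valued restamp still counts as new.
        with self._lock:
            if self._expect is not None and self._expect is seen:
                self._expect = None
                return True
            return False
```

If the TX thread restamps the slot between the RX thread's read and its clear, the clear becomes a no-op and the new expectation survives.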
Shutdown¶
RaceLink_Host.shutdown() is the canonical teardown path. In order:
1. Cancel the gateway-retry timer (_cancel_gateway_retry).
2. Close the transport (transport.close() → joins the RX thread, closes the serial port).
3. Cancel the task manager.
4. Persist final state (save_to_db(scopes={NONE})).
5. Shut down the auto-restore executor (gateway_service.shutdown() → executor.shutdown(wait=False, cancel_futures=True)).
Daemon threads not explicitly shut down (SSE generators, etc.) are torn down by Python's exit. Steps 1–5 cover the threads that hold file descriptors or in-flight RF state.
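The teardown order can be illustrated with a small stand-in. This is a sketch under assumptions: HostSketch and its steps list are hypothetical, and the transport, task manager, and persistence steps are reduced to log entries. Only the Timer and ThreadPoolExecutor calls are real standard-library APIs (cancel_futures requires Python 3.9 or later).

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


class HostSketch:
    """Hypothetical host that records the order of its teardown steps."""

    def __init__(self) -> None:
        self.steps: List[str] = []
        # Stand-in for the gateway-retry timer.
        self._retry_timer = threading.Timer(60.0, lambda: None)
        self._retry_timer.daemon = True
        self._retry_timer.start()
        # Stand-in for the auto-restore worker pool.
        self._executor = ThreadPoolExecutor(
            max_workers=8, thread_name_prefix="rl-auto-restore"
        )

    def shutdown(self) -> None:
        # 1. Stop the timer first so it cannot trigger a reconnect mid-teardown.
        self._retry_timer.cancel()
        self.steps.append("cancel_gateway_retry")
        # 2. Close the transport (the real call joins the RX thread).
        self.steps.append("transport.close")
        # 3. Cancel the task manager.
        self.steps.append("task_manager.cancel")
        # 4. Persist final state.
        self.steps.append("save_to_db")
        # 5. Drop queued auto-restore work without waiting for it.
        self._executor.shutdown(wait=False, cancel_futures=True)
        self.steps.append("executor.shutdown")


host = HostSketch()
host.shutdown()
```

After shutdown() returns, the executor rejects new submissions with RuntimeError and the cancelled timer thread exits on its own.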
Audit trail¶
The detailed reasoning and regression tests behind every threading
fix are recorded in the project-wide audit plan in the source
repository, in its "Threading-fix outcome" sections. The plan lives
under .claude/plans/, which is internal to the source repo and not
part of this consolidation. New threading contributions should land
regression tests in tests/test_state_concurrency.py or
tests/test_transport_tx_barrier.py, following the same pattern.
Documentation map¶
User-facing and contributor-facing docs in docs/:
| File | Audience | Content |
|---|---|---|
| docs/OPERATOR_GUIDE.md | Operators setting up a race | Glossary, end-to-end workflow, safety rules |
| docs/DEVELOPER_GUIDE.md | Contributors adding a feature | Checklists for action kinds, opcodes, services |
| docs/UI_CONVENTIONS.md | Contributors writing WebUI | Button vocabulary, toast / confirm conventions |
| docs/PROTOCOL.md | Anyone reading wire traces | Wire format reference (M2N/N2M, opcodes, body layouts) |
| docs/standalone.md | Standalone-host operators | Install + run instructions |
| §"Repo Split History" (below) | Contributors crossing repos | Where Host / Gateway / WLED / RH-plugin code lives |
Current Notes¶
- controller.py remains a compatibility-oriented host controller, but it now only coordinates host runtime behavior.
- Standalone support continues to use the shared WebUI and host services.
- pages/ and static/ are intentionally retained here and are not plugin leftovers.
Gateway Ownership (Plan P3-5)¶
At most one process may hold the USB-serial connection to the RaceLink_Gateway dongle at a time. The host enforces this by opening the port with exclusive=True in racelink/transport/gateway_serial.py.
Ownership rules:
- Standalone mode (racelink-standalone): the host owns the gateway for the lifetime of the Flask app. run_standalone() calls onStartup({}) which triggers discoverPort({}).
- RotorHazard plugin mode: the plugin owns the gateway. RotorHazard itself does not open the dongle. When the plugin's initialize() runs, the Host's onStartup is wired to Evt.STARTUP; discoverPort then claims the port.
- Never run both simultaneously against the same dongle. The second process will see serial.SerialException from the exclusive lock and log it via _record_gateway_error; the UI banner (plan P1-1) surfaces this to the operator.
- Release on shutdown: RaceLink_Host.shutdown() (plan P1-2) calls transport.close() so the port is released before the process exits. The plugin registers this on Evt.SHUTDOWN where available.
If you ever need to share a gateway between processes (e.g. dev tooling + live host), serialize access at the process level -- there is no in-transport multiplexing today.
Transport Interface (post-redesign)¶
The Gateway firmware keeps the SX1262 in Continuous RX as its default state. After each TX the Core reverts to Continuous automatically; no Timed-RX window is opened for unicast request/response flows. This redesign removes the original cause of the "No ACK_OK for ..." timeout-despite-ACK bug: the Host used to block until the firmware's EV_RX_WINDOW_CLOSED event arrived, but that event can be delayed by ESP32 USB CDC buffering.
Host-side matching is therefore owned entirely by racelink/services/pending_requests.py and the two entry points in GatewayService:
| Call pattern | Helper | Completion signal |
|---|---|---|
| Unicast request → single ACK or specific reply | send_and_wait_for_reply | PendingRequestRegistry matches (sender, ack_of_or_opc) and sets the per-request event |
| Broadcast / group → N replies within a window | send_and_collect | Host wall clock (duration_s) with early-exit on expected count |
The old wait_rx_window helper remains for backwards compatibility but is deprecated. New code should not call it.
EV_RX_WINDOW_OPEN / EV_RX_WINDOW_CLOSED stay in the wire format (the Core header is frozen) but are debug-only from the Host's perspective.
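The two completion signals in the table can be sketched in a few lines. This is an illustration under assumptions, not the code in racelink/services/pending_requests.py: the class PendingRequestsSketch, the functions wait_for_reply_sketch and collect_sketch, and the use of a queue.Queue as the reply source are all hypothetical.

```python
import queue
import threading
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (sender, ack_of_or_opc)


class PendingRequestsSketch:
    """Hypothetical registry: one Event per outstanding unicast request."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Dict[Key, List[Any]] = {}  # key -> [Event, reply]

    def register(self, sender: str, ack_of_or_opc: int) -> Key:
        key = (sender, ack_of_or_opc)
        with self._lock:
            self._pending[key] = [threading.Event(), None]
        return key

    def match_and_set(self, sender: str, ack_of_or_opc: int, reply: Any) -> bool:
        """RX path: complete the matching request, if one is waiting."""
        with self._lock:
            entry = self._pending.get((sender, ack_of_or_opc))
            if entry is None:
                return False  # nobody is waiting; treat as unsolicited
            entry[1] = reply
        entry[0].set()
        return True

    def wait(self, key: Key, timeout: float) -> Optional[Any]:
        with self._lock:
            entry = self._pending[key]
        entry[0].wait(timeout)
        with self._lock:
            self._pending.pop(key, None)
            return entry[1]  # None on timeout


def wait_for_reply_sketch(registry, send: Callable[[], None],
                          sender: str, ack_of_or_opc: int,
                          timeout: float = 1.0) -> Optional[Any]:
    # Register before sending so a fast reply cannot race the registration.
    key = registry.register(sender, ack_of_or_opc)
    send()
    return registry.wait(key, timeout)


def collect_sketch(replies: "queue.Queue[Any]", duration_s: float,
                   expected: Optional[int] = None) -> List[Any]:
    """Collect replies until the host wall clock expires or enough arrived."""
    deadline = time.monotonic() + duration_s
    got: List[Any] = []
    while expected is None or len(got) < expected:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            got.append(replies.get(timeout=remaining))
        except queue.Empty:
            break
    return got
```

Neither helper depends on a firmware window event: the unicast path completes on the matched reply, and the broadcast path completes on the host's own deadline or on reaching the expected count.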
Locking Rule: Never hold state_repository.lock across RF I/O¶
The state-repository lock (state_repository.lock, surfaced as ctx.rl_lock in the web layer) is taken by:
- Web handlers that read/mutate device or group state.
- The gateway reader thread, inside GatewayService.handle_ack_event, on_transport_event (status/identify branches), and pending_* bookkeeping.
Both paths must acquire the same lock so a request thread and the reader thread see a consistent view of the device list. That is the whole point of a single state lock (plan P1-4).
Consequence: a handler that holds the state lock while waiting for a reply over RF will deadlock the reader. The reader thread stalls in handle_ack_event for the reply that just arrived -- and because it is stalled, it cannot pull the next USB frame out of pyserial's RX buffer. USB frames for subsequent devices queue up; the next send_and_wait_for_reply times out even though the ACK is sitting unread in the OS buffer. Symptoms:
- First unicast call in a bulk returns promptly.
- Every subsequent unicast call in the same bulk times out at exactly the wait budget (e.g. 8.000 s).
- Immediately after the timeout releases the lock, a flood of queued USB events drains into the log (TX_DONE, RX window OPEN, late ACK).
The rule, therefore, is:
Never call setNodeGroupId, sendConfig(..., wait_for_ack=True), sendRaceLink, sendGroupPreset, send_stream, discover_devices, or get_status while holding state_repository.lock / ctx.rl_lock.
In practice this means bulk loops must release and re-acquire the lock around each iteration's RF call. See _apply_device_meta_updates in racelink/web/api.py for the reference pattern (acquire → read/mutate in-memory → release → blocking RF → repeat).
A regression test (tests/test_web_handler_helpers.py::ApplyDeviceMetaUpdatesDoesNotHoldLockAcrossBlockingIO) exercises this rule by simulating a second thread that must acquire the lock mid-bulk.
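The acquire → mutate → release → blocking RF → repeat shape can be sketched as below. This is a simplified illustration, not the body of _apply_device_meta_updates: the function name apply_updates_sketch, the dict-based device store, and the send_rf callback are assumptions.

```python
import threading
from typing import Callable, Dict, Iterable, Tuple


def apply_updates_sketch(
    lock: "threading.RLock",
    devices: Dict[str, dict],
    updates: Iterable[Tuple[str, int]],
    send_rf: Callable[[str, int], bool],
) -> Dict[str, bool]:
    """Apply (device_id, group) updates without holding the lock across RF."""
    results: Dict[str, bool] = {}
    for dev_id, new_group in updates:
        with lock:
            # Locked section: in-memory reads and mutations only.
            dev = devices.get(dev_id)
            if dev is None:
                results[dev_id] = False
                continue
            dev["group"] = new_group
            payload = (dev_id, new_group)  # snapshot what the RF call needs
        # Lock released here: the reader thread can take it to process the
        # ACK while this thread blocks on the RF round trip.
        results[dev_id] = send_rf(*payload)
    return results
```

The lock is taken once per iteration rather than once around the whole loop, which is what lets a second thread acquire it mid-bulk.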
UI Scope Matrix¶
State mutations travel to the UI layer via two paths: the in-process RotorHazard UI (through on_persistence_changed → RotorHazardUIAdapter.apply_scoped_update) and the browser WebUI (through the SSE refresh channel mapped by racelink/domain/state_scope.sse_what_from_scopes). Both consume the same scope tokens so that a single save_to_db(scopes=...) call fans out consistently.
Authoritative scope tokens are defined in racelink/domain/state_scope.py:
| Token | When to use |
|---|---|
| FULL | Initial load (load_from_db) or migration boot -- rebuild everything. |
| NONE | Pure persistence, no visible change (e.g. "Save Configuration" button just flushes the combined key). |
| DEVICES | Device record changed that does not move it between groups (rename, specials struct rebuild). |
| DEVICE_MEMBERSHIP | Device moved to a different group -- affects group counts and any list embedded per group. |
| DEVICE_SPECIALS | A special config byte was written on a single device (startblock slot, etc.). No cross-UI effect on the RH panels. |
| GROUPS | Groups added / renamed / removed -- group-list-backed dropdowns must refresh. |
| PRESETS | WLED presets file or RL preset store reloaded -- preset-list-backed selects must refresh. |
RotorHazard adapter (custom_plugins/racelink_rh_plugin/plugin/ui.py) reacts as follows. Elements in the "Once" column are bootstrapped on first sync and then guarded by the _settings_panel_bootstrapped / _quickset_panel_bootstrapped flags; calling sync_rotorhazard_ui repeatedly therefore no longer produces RHUI Redefining ... log spam.
| RH UI element | Once (bootstrap) | GROUPS | DEVICES | DEVICE_MEMBERSHIP | DEVICE_SPECIALS | PRESETS |
|---|---|---|---|---|---|---|
| Panel rl_settings | ✓ | | | | | |
| Panel rl_quickset | ✓ | | | | | |
| Option rl_device_config | ✓ | | | | | |
| Option rl_groups_config | ✓ | | | | | |
| Option rl_assignToNewGroup | ✓ | | | | | |
| Quickbutton rl_btn_set_defaults | ✓ | | | | | |
| Quickbutton rl_btn_force_groups | ✓ | | | | | |
| Quickbutton rl_btn_get_devices | ✓ | | | | | |
| Quickbutton rl_run_autodetect | ✓ | | | | | |
| Option rl_quickset_brightness | ✓ | | | | | |
| Quickbutton run_quickset | ✓ | | | | | |
| Option rl_assignToGroup (dynamic) | ✓ | | | | | |
| Option rl_quickset_group (dynamic) | ✓ | | | | | |
| Option rl_quickset_preset (dynamic) | ✓ | | | | | |
| Default ActionEffect gcaction | ✓ | ✓ | | | | |
| Per-capability special ActionEffects | ✓ | ✓ | ✓ | ✓ | | |
SSE topics (racelink/domain/state_scope.sse_what_from_scopes) drive the browser WebUI:
| Token | SSE refresh.what payload | JS handler action |
|---|---|---|
| FULL | ["groups", "devices"] | loadGroups() + loadDevices() |
| NONE | [] | no-op |
| DEVICES | ["devices"] | loadDevices() |
| DEVICE_MEMBERSHIP | ["devices", "groups"] | both (membership affects per-group counts) |
| DEVICE_SPECIALS | ["devices"] | loadDevices() |
| GROUPS | ["groups"] | loadGroups() |
| PRESETS | ["presets"] | preset dropdown refresh |
Rule of thumb for new call sites. When you call save_to_db(args, scopes=...), pick the narrowest token set describing what actually changed. If you genuinely don't know, pass {FULL} -- but prefer to refactor so you do know. The RH adapter and SSE scope map are both designed around this precision, and the regression tests in tests/test_ui_scope_routing.py (plugin) and tests/test_state_scope.py (host) pin the mapping so an accidental FULL-regression surfaces in CI.
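To make the token-to-topic mapping concrete, here is a minimal sketch of a scope-to-SSE translation built from the table above. It is not the implementation in racelink/domain/state_scope.py: the Scope enum, the function name sse_what_sketch, and the choice to return a sorted, de-duplicated list are assumptions.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Scope(Enum):
    """Hypothetical stand-in for the real scope tokens."""
    FULL = "full"
    NONE = "none"
    DEVICES = "devices"
    DEVICE_MEMBERSHIP = "device_membership"
    DEVICE_SPECIALS = "device_specials"
    GROUPS = "groups"
    PRESETS = "presets"


# Topics per token, copied from the table above.
_SSE_TOPICS: Dict[Scope, Tuple[str, ...]] = {
    Scope.FULL: ("groups", "devices"),
    Scope.NONE: (),
    Scope.DEVICES: ("devices",),
    Scope.DEVICE_MEMBERSHIP: ("devices", "groups"),
    Scope.DEVICE_SPECIALS: ("devices",),
    Scope.GROUPS: ("groups",),
    Scope.PRESETS: ("presets",),
}


def sse_what_sketch(scopes: Iterable[Scope]) -> List[str]:
    """Union the topics of all tokens; sorted for a stable payload (assumption)."""
    topics = set()
    for scope in scopes:
        topics.update(_SSE_TOPICS[scope])
    return sorted(topics)
```

Passing a narrow set such as {Scope.DEVICE_SPECIALS} yields only the devices topic, while {Scope.FULL} yields both devices and groups, which shows why a needless FULL costs the browser an extra reload.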
Repo Split History¶
This section folds in the content of the former docs/repo_split_map.md
(retained in the source repository) for completeness.
Host-Owned Import Edge¶
These entry points stay in RaceLink_Host and are the supported
surface for external adapters:
- racelink.app:create_runtime
- racelink.web:register_racelink_web
- racelink.web:RaceLinkWebRuntime
Already moved out of Host¶
The following paths used to live in this repository and now belong in
the separate RaceLink_RH-plugin repository:
| Previous Host path | Target in plugin repo | Note |
|---|---|---|
| __init__.py | plugin repo root __init__.py | RotorHazard loader shim now belongs with the plugin |
| racelink/integrations/rotorhazard/__init__.py | racelink_rh_plugin/integrations/rotorhazard/__init__.py | Plugin package edge |
| racelink/integrations/rotorhazard/plugin.py | racelink_rh_plugin/integrations/rotorhazard/plugin.py | Adapter bootstrap for RH |
| racelink/integrations/rotorhazard/ui.py | racelink_rh_plugin/integrations/rotorhazard/ui.py | RotorHazard UI adapter |
| racelink/integrations/rotorhazard/actions.py | racelink_rh_plugin/integrations/rotorhazard/actions.py | RH action registration |
| racelink/integrations/rotorhazard/dataio.py | racelink_rh_plugin/integrations/rotorhazard/dataio.py | RH import/export adapter |
| racelink/integrations/rotorhazard/source.py | racelink_rh_plugin/integrations/rotorhazard/source.py | RH event source adapter |
Files that stay in Host¶
| Host path | Why it stays |
|---|---|
| racelink/app.py | Owns the host runtime factory and service wiring |
| racelink/web/** | Owns the shared RaceLink WebUI registration, API, SSE, task state |
| racelink/integrations/standalone/** | Hosts the standalone Flask mode |
| racelink/pages/** and racelink/static/** | Shared RaceLink WebUI assets for all hosting modes |
| controller.py | Host controller and runtime coordinator |