RaceLink Architecture¶
Repository Scope¶
RaceLink_Host now contains only the host-owned parts of the system:
- core runtime wiring
- transport, protocol, state, and service layers
- the shared RaceLink WebUI
- standalone Flask hosting
RotorHazard-specific adapter code is no longer part of this repository. That adapter belongs in the separate RaceLink_RH-plugin repository.
Stable Host Entry Points¶
External adapters should depend on the host through these stable entry points:
- racelink.app.create_runtime(...)
- racelink.web.register_racelink_web(...)
This keeps plugin repositories from reaching deeply into host internals.
Package Layout¶
- racelink/app.py: Runtime container (RaceLinkApp) and host-owned runtime factory (create_runtime). Wires every *_service instance onto the controller and into the RaceLinkApp.services registry.
- racelink/core/: Cross-cutting contracts (HostApiProtocol, NullSource, NullSink).
- racelink/domain/: Framework-agnostic types and catalogues. Sub-modules:
  - models (RL_Device, RL_DeviceGroup)
  - device_types (RL_Dev_Type, RL_DEV_TYPE_CAPS)
  - flags (the canonical six-flag bit layout used by both OPC_PRESET and OPC_CONTROL)
  - specials (RL_PRESET_EDITOR_SCHEMA, capability-specific Specials options)
  - capabilities (per-capability special-keys helper)
  - state_scope (notification scope tokens; see UI Scope Matrix)
  - node_config (operator-facing OPC_CONFIG command catalogue + configByte bit catalogue, served via /api/node-config/schema)
  - offset_formula (pure Python evaluator for offset_group base/step; pinned to the TS port via the shared parity fixture)
  - wled_effects, wled_palettes, wled_palette_color_rules, wled_deterministic (WLED metadata catalogues used by the RL-effect editor; large tables generated by gen_wled_metadata.py)
- racelink/protocol/: Wire-format constants and helpers: addressing, rules, codec, packets, plus the auto-generated racelink_proto_auto.py (mirrors racelink_proto.h; regenerated via gen_racelink_proto_py.py).
- racelink/transport/: Serial gateway transport (gateway_serial), wire framing (framing), event constants (gateway_events).
- racelink/state/: In-memory StateRepository (devices / groups / backup_devices / backup_groups + a re-entrant mutation lock), persistence (persistence.py: JSON dumps, legacy-shape parser), schema-version migrations (migrations.py).
- racelink/services/: Host business workflows; see "Service Layer" below.
- racelink/web/: blueprint.py (Flask Blueprint factory + register_racelink_web), api.py (50+ REST routes), sse.py (SSE bridge + MasterState mirror), dto.py (device/group serialization shared with the WebUI), request_helpers.py (DTO parsers + small validation helpers), tasks.py (long-running TaskManager for discover / status / OTA / presets-download, with cooperative request_cancel() / is_cancel_requested() polled by the worker threads; see developer guide §"Making the workflow cancellable").
- racelink/integrations/standalone/: Canonical standalone Flask bootstrap (bootstrap.py, webapp.py, config.py); used both as a library entry point and via the racelink-standalone console script.
- racelink/integrations/polling/: HTTP source / sink helpers that some integrations may consume for non-RotorHazard data flows.
- racelink/tools/: Setup utilities shipped with the package (e.g. setup_nmcli_polkit.py for Linux WiFi/OTA permissions).
- racelink/pages/ and racelink/static/: WebUI hosting surface. With the Vue migration:
  - racelink/pages/ is empty; the legacy Jinja templates were retired when the SPA shell took over.
  - racelink/static/dist/ ships the Vite-built SPA bundle (index.html + assets/); the bundle is committed so a plugin install does not require a Node toolchain.
  - racelink/static/ may otherwise contain non-SPA static files (e.g. favicons); the legacy racelink.css, racelink.js, scenes.js, and vendor/Sortable.min.js were retired together with the templates.
- frontend/: Vue 3 + Vite + Pinia SPA source; see "Frontend (Vue SPA)" below. Built artefacts land in racelink/static/dist/.
WebUI Hosting Model¶
There is one RaceLink WebUI. Since the Vue migration (PoC merge 2026-04-29) it ships as a Vite-built single-page app:
- In standalone mode, the standalone Flask app mounts the blueprint through register_racelink_web(...).
- In RotorHazard mode, the external adapter plugin is expected to mount that same blueprint through the same host-owned registration entry.
- The packaged standalone user entrypoint is racelink-standalone, which boots the host-owned standalone integration under racelink.integrations.standalone.
The blueprint serves the SPA shell at both /racelink/ (Devices
landing) and /racelink/scenes (Scenes editor lazy chunk) by
rendering racelink/static/dist/index.html through
render_template_string so the {{ rl_base_path }} /
{{ rl_static_path }} placeholders are substituted server-side. Vue
Router takes over once the bundle has booted; navigations between
the two pages do not reload the SPA shell. The committed
racelink/static/dist/ bundle means a plugin install does not need
Node — see frontend/README.md for the developer workflow.
Layer Boundaries¶
- domain stays framework-agnostic.
- protocol and transport do not depend on web-hosting concerns.
- state owns repositories and persistence.
- services implement host workflows and should not depend on external adapters.
- web adapts HTTP and SSE traffic to host services.
- integrations/standalone depends inward on host modules and does not define separate UI behavior.
Service Layer¶
The host's business logic is split across the modules below. Each
service has a 5–15 line module docstring (racelink/services/*.py)
that names its public API, dependencies, and threading expectations.
This table is the at-a-glance index; open the file for the contract.
| Module | Owns | Called from |
|---|---|---|
| gateway_service | Pending-matcher registry, TX/RX listener wiring, reconnect lifecycle, auto-restore worker pool, high-level dispatch (send_config / send_sync / send_stream / send_and_match / send_and_wait_with_retries) | Everything that talks to the gateway |
| control_service | OPC_PRESET / OPC_CONTROL builders, return-value contract (bool for every send_*), per-group cache update | Web routes, scene runner, RotorHazard adapter |
| config_service | Post-ACK application of OPC_CONFIG changes (state mutation after the gateway confirms) | RX-thread ACK handler via controller._apply_config_update |
| sync_service | Thin wrapper for OPC_SYNC broadcast | Scene runner |
| discovery_service | OPC_DEVICES broadcast + reply collection | Web /api/discover, task manager |
| status_service | OPC_STATUS poll + reply reconciliation | Web /api/status, task manager |
| stream_service | OPC_STREAM payload submission | Startblock service |
| startblock_service | Startblock-program payload assembly + dispatch | Web /api/specials/*, scene runner |
| specials_service | Per-capability device option metadata | Web editor schema, options dialog |
| presets_service | WLED presets.json file store + minimal parser | OTA workflow, web /api/presets/* |
| rl_presets_service | RaceLink-native preset store (CRUD, persistence) | Web, scene runner, scene cost estimator |
| ota_service | File staging + WLED HTTP transfer (low-level) | OTA workflow |
| ota_workflow_service | Multi-step firmware-update / presets-download orchestration | Web /api/fw/start, /api/presets/download |
| host_wifi_service | NetworkManager nmcli wrapper for OTA | OTA workflow |
| pending_requests | PendingMatcher + PendingMatcherRegistry: unified RX-event matcher covering single-sender unicast (1-reply), multi-sender N-reply collectors, and wildcard discovery via one data structure and one wait primitive | gateway_service |
| scenes_service | Scene store (CRUD + canonical validator + legacy migration shim) | Web, scene runner |
| scene_runner_service | Sequential dispatcher for scenes; every per-kind handler funnels through dispatch_planner.plan_action_dispatch and a small _dispatch_op adapter that maps symbolic sender names to control_service / sync_service / controller methods | Web /api/scenes/<key>/run, RotorHazard quickset |
| scene_cost_estimator | Predicted wire cost (packets, bytes, airtime) for a scene before it runs; iterates the same plan the runner consumes and sums body_bytes | Web /api/scenes/<key>/estimate, web editor |
| dispatch_planner | Single source of truth for "what packets would the runner emit": pure / side-effect-free function plan_action_dispatch(action, …) → ActionDispatchPlan{ops: List[WireOp], …}. Both the runner (which dispatches each op) and the cost estimator (which sums byte counts) consume the same plan. Parity is enforced by tests/test_dispatch_parity.py | Scene runner, cost estimator |
| offset_dispatch_optimizer | Wire-path strategy picker for the offset_group Phase-1 OPC_OFFSET sequence (formula vs explicit vs broadcast+overrides). Called from the dispatch planner; emits WireOp records with sender="send_offset" | Dispatch planner |
| rf_timing | Static LoRa airtime / time-on-air helpers consumed by the cost estimator and by per-send pacing decisions in the gateway service | Gateway service, scene cost estimator |
controller.py is the historical RaceLink_Host class that all
services attach to. Its current role is twofold:
- Composition root. The constructor instantiates every *_service and stores them on self.<name>_service. The public create_runtime factory (racelink/app.py) then mirrors them onto the RaceLinkApp.services dict and adds the process-wide singletons (PresetsService, RLPresetsService, SceneService, SceneRunnerService, HostWifiService, OTAService).
- Cross-thread bridge + legacy compatibility surface. The _pending_expect / _pending_config slots (each guarded by a dedicated threading.Lock) handle the TX→RX hand-off; a sizable set of sendXxx shim methods (e.g. sendRaceLink, sendGroupPreset, sendWledPreset, sendRlPreset, sendStartblockControl, sendWledResetOverrides) forward to the corresponding *_service.send_* so legacy plugin callers keep working. New code should call the services directly.
Dispatch Parity Contract¶
The scene runner and the cost estimator both compute "what packets
would this action emit". Pre-dispatch_planner they had two
implementations of that question, which drifted (see the in-module
comment in services/dispatch_planner.py for the historical bug).
The current contract:
- services/dispatch_planner.plan_action_dispatch(action, …) is pure / side-effect-free. It returns an ActionDispatchPlan(ops: List[WireOp], degraded, error, detail).
- services/scene_runner_service consumes plan.ops and dispatches each via the small _dispatch_op adapter that maps WireOp.sender (a symbolic string like "send_control" / "send_wled_preset" / "send_offset" / "send_sync" / "send_startblock") to a concrete method on control_service, sync_service, or controller.
- services/scene_cost_estimator.estimate_scene consumes the same plan.ops and sums WireOp.body_bytes against the LoRa-physics helpers in services.rf_timing.
- tests/test_dispatch_parity.py enumerates representative actions and asserts that the runner-side and estimator-side reads of the same plan match exactly.
The offset_group Phase-1 (OPC_OFFSET broadcast vs explicit vs
formula+overrides) strategy lives in
services/offset_dispatch_optimizer.plan_offset_setup and emits
WireOp records with sender="send_offset". The planner then
appends Phase-2 child ops in the same plan, so the estimator can
read both phases from one source.
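The shared-plan idea can be illustrated with a minimal, self-contained sketch. Only the names WireOp, ActionDispatchPlan, plan_action_dispatch, sender, and body_bytes come from the text above; the action dict layout, the byte sizes, and the run_plan / estimate_bytes helpers are hypothetical stand-ins, not the host's real code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class WireOp:
    sender: str       # symbolic sender name, e.g. "send_control"
    body_bytes: int   # payload size the estimator sums


@dataclass(frozen=True)
class ActionDispatchPlan:
    ops: List[WireOp] = field(default_factory=list)
    degraded: bool = False


def plan_action_dispatch(action: dict) -> ActionDispatchPlan:
    """Pure planner: same input, same plan, no I/O (hypothetical action shape)."""
    ops = [WireOp("send_control", 8) for _ in action.get("groups", [])]
    if action.get("sync"):
        ops.append(WireOp("send_sync", 4))
    return ActionDispatchPlan(ops=ops)


def run_plan(plan: ActionDispatchPlan,
             senders: Dict[str, Callable[[WireOp], None]]) -> int:
    """Runner side: dispatch each op through a sender lookup table."""
    for op in plan.ops:
        senders[op.sender](op)
    return len(plan.ops)


def estimate_bytes(plan: ActionDispatchPlan) -> int:
    """Estimator side: sum the body sizes of the very same ops."""
    return sum(op.body_bytes for op in plan.ops)


action = {"groups": [1, 2], "sync": True}
plan = plan_action_dispatch(action)
sent: List[WireOp] = []
senders = {"send_control": sent.append, "send_sync": sent.append}
dispatched = run_plan(plan, senders)
```

Because both consumers read the same plan object, the bytes the runner actually dispatched and the bytes the estimator predicted cannot drift apart in this sketch.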
Threading Model¶
The host is multithreaded by necessity: the serial RX reader can't block on web requests, the scene runner has to fire from its own thread so a Run doesn't block the SSE stream, and OTA workflows run in task-manager threads so the WebUI stays responsive during a multi-minute firmware roll-out.
Threads in a running host¶
| Thread name (rl-… prefix) | Owner | Lifetime |
|---|---|---|
| Main / web request threads | Flask / WSGI server | Per-request |
| rl-serial-rx-<port> | GatewaySerialTransport._reader | Lives for the duration of one transport session; replaced on reconnect |
| rl-task-<name> | TaskManager | Per task (discover / status / fwupdate / presets_download) |
| rl-reconnect | GatewayService.schedule_reconnect | Per reconnect attempt |
| rl-gateway-retry | controller._gateway_retry_timer | Per scheduled auto-retry; Timer subclass |
| rl-auto-restore-N | GatewayService._auto_restore_executor (ThreadPoolExecutor, max_workers=8) | Pool, threads reused; idle pool holds 0 active |
| Scene runner (anonymous) | SceneRunnerService.run | Per scene run |
| SSE generator threads | SSEBridge.gen() | One per connected SSE client |
| gevent monkey-patched workers (when running under gunicorn-gevent) | gevent | One per request hub |
Locks¶
| Lock | Module | Protects |
|---|---|---|
| state_repository.lock (RLock) | racelink/state/repository.py | Device + group repository mutations and iterations. Reentrant so a save-path that walks the device list can be called from another locked path without deadlocking. Critical rule: never held across RF I/O; see "Locking Rule" section below. |
| _tx_lock (Lock) + _tx_done_cv (Condition) | transport/gateway_serial.py | Concurrency fix. Serializes USB writes so concurrent senders cannot interleave bytes mid-frame; the Condition's predicate (_tx_in_flight) is the lost-wakeup-safe replacement for the previous Event.wait + Event.clear pair. |
| _pending_config_lock (Lock) | controller.py | Concurrency fix. Pairs stash_pending_config (TX path) with take_pending_config (RX path) atomically. Distinct from state_repository.lock so a slow ConfigService callback can't delay TX-side stashes. |
| _pending_expect_lock (Lock) | controller.py | Concurrency fix. Pairs set_pending_expect (TX path) with read_pending_expect + clear_pending_expect_if (RX path); the _if variant has compare-and-clear semantics so a stale RX matcher cannot wipe a freshly-stamped TX expectation. |
| _clients_lock (gevent Semaphore by default, threading.Lock fallback) | web/sse.py | Concurrency fix. Snapshot-then-fan-out for broadcast so a slow client queue doesn't starve other broadcasters or new SSE registrations. |
| _auto_reassign_lock (Lock) | services/gateway_service.py | The auto-reassign-recently-seen cache + the in-flight futures list for the auto-restore executor. |
gevent.lock.Semaphore is used by web/sse.py only when the host
runs under gevent (gunicorn -k gevent). Standalone Flask falls back
to threading.Lock automatically. The fallback chain is in
racelink/web/sse.py.
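The lost-wakeup point behind the _tx_done_cv entry can be sketched with the standard library alone. This is a minimal illustration, not the transport's code: the TxBarrier class and its method names are hypothetical, and only the pattern (a flag guarded by the lock and awaited through a Condition predicate) reflects the description above.

```python
import threading


class TxBarrier:
    """Hypothetical stand-in for the _tx_lock / _tx_done_cv pair."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._done_cv = threading.Condition(self._lock)
        self._in_flight = False

    def begin_send(self) -> None:
        # TX path: mark a frame as in flight while holding the lock.
        with self._lock:
            self._in_flight = True

    def mark_done(self) -> None:
        # RX path: flip the predicate under the same lock, then notify.
        with self._done_cv:
            self._in_flight = False
            self._done_cv.notify_all()

    def wait_done(self, timeout: float) -> bool:
        # wait_for re-checks the predicate, so a notification that fired
        # before the waiter arrived is not lost.
        with self._done_cv:
            return self._done_cv.wait_for(lambda: not self._in_flight, timeout)


barrier = TxBarrier()
barrier.begin_send()
barrier.mark_done()                      # completion arrives before anyone waits
early = barrier.wait_done(timeout=0.5)   # still observed via the predicate

barrier.begin_send()
timer = threading.Timer(0.05, barrier.mark_done)
timer.start()
late = barrier.wait_done(timeout=2.0)    # ordinary case: waiter arrives first
timer.join()
```

With an Event.wait + Event.clear pair, a clear() racing with set() can erase a completion the waiter has not yet seen; the predicate form has no separate clear step to race with.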
Atomicity guarantees¶
| Operation | Atomic with respect to | Notes |
|---|---|---|
| _send_m2n USB write | Concurrent senders | Full body of _send_m2n runs under _tx_lock. Listener fan-out (_emit_tx) happens outside the lock so a slow TX listener cannot stall subsequent senders. |
| Gateway TX-DONE acknowledgement | The matching _tx_in_flight = True flip | The RX reader's _tx_lock acquisition guarantees it observes the flag set by the TX thread that wrote the matching frame. |
| _pending_config stash + pop | Cross-thread mutations of the dict | |
| _pending_expect set + clear | Cross-thread restamps + matches | Compare-and-clear via clear_pending_expect_if. |
| Device-repo iteration in the cache update + send_stream paths | Concurrent appends / removals | with state_repository.lock: wraps the iteration; the inner work (or the snapshot built inside) is what runs lock-free. |
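Compare-and-clear, as used for the pending-expect slot, can be sketched as follows. The PendingExpectSlot class and its method names are hypothetical; the sketch only shows why clearing "if it is still the value I saw" protects a freshly stamped expectation from a stale reader.

```python
import threading
from typing import Optional


class PendingExpectSlot:
    """Hypothetical single-slot holder guarded by a dedicated lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Optional[object] = None

    def set(self, token: object) -> None:
        with self._lock:
            self._value = token

    def read(self) -> Optional[object]:
        with self._lock:
            return self._value

    def clear_if(self, expected: object) -> bool:
        # Clear only when the slot still holds the token the caller saw.
        with self._lock:
            if self._value is expected:
                self._value = None
                return True
            return False


slot = PendingExpectSlot()
first, second = object(), object()

slot.set(first)
seen_by_rx = slot.read()                  # RX side observes the first expectation
slot.set(second)                          # TX side restamps before RX finishes
stale_cleared = slot.clear_if(seen_by_rx)
survived = slot.read() is second          # the fresh expectation is untouched
fresh_cleared = slot.clear_if(second)
```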
Shutdown¶
RaceLink_Host.shutdown() is the canonical teardown path. In order:
1. Cancel the gateway-retry timer (_cancel_gateway_retry).
2. Close the transport (transport.close() → joins the RX thread, closes the serial port).
3. Cancel the task manager.
4. Persist final state (save_to_db(scopes={NONE})).
5. Shut down the auto-restore executor (gateway_service.shutdown() → executor.shutdown(wait=False, cancel_futures=True)).
Daemon threads not explicitly shut down (SSE generators, etc.) are torn down by Python's exit. Steps 1–5 cover the threads that hold file descriptors or in-flight RF state.
Audit trail¶
The detailed reasoning and regression tests behind each threading
fix live in the maintainer's internal engineering ledger and are
not part of this public consolidation. New threading contributions
should land regression tests in
tests/test_state_concurrency.py or
tests/test_transport_tx_barrier.py matching the existing pattern.
Documentation map¶
Other user-facing and contributor-facing docs in this repo:
| File | Audience | Content |
|---|---|---|
| webui-overview.md | Operators setting up a race | WebUI orientation; links to the device-setup / firmware / RL-preset / scene task guides |
| developer-guide.md | Contributors adding a feature | Checklists for action kinds, opcodes, services |
| ui-conventions.md | Contributors writing WebUI | Button vocabulary, toast / confirm conventions |
| reference/wire-protocol.md | Anyone reading wire traces | Wire format reference (M2N/N2M, opcodes, body layouts) |
| standalone-install.md | Standalone-host operators | Install + run instructions |
| §"Repo Split History" (below) | Contributors crossing repos | Where Host / Gateway / WLED / RH-plugin code lives |
Current Notes¶
- controller.py remains a compatibility-oriented host controller, but it now only coordinates host runtime behavior.
- Standalone support continues to use the shared WebUI and host services.
- pages/ and static/ are intentionally retained here and are not plugin leftovers.
Gateway Ownership¶
At most one process may hold the USB-serial connection to the RaceLink_Gateway dongle at a time. The host enforces this by opening the port with exclusive=True in racelink/transport/gateway_serial.py.
Ownership rules:
- Standalone mode (racelink-standalone): the host owns the gateway for the lifetime of the Flask app. run_standalone() calls onStartup({}) which triggers discoverPort({}).
- RotorHazard plugin mode: the plugin owns the gateway. RotorHazard itself does not open the dongle. When the plugin's initialize() runs, the Host's onStartup is wired to Evt.STARTUP; discoverPort then claims the port.
- Never run both simultaneously against the same dongle. The second process will see serial.SerialException from the exclusive lock and log it via _record_gateway_error; the UI banner surfaces this to the operator.
- Release on shutdown: RaceLink_Host.shutdown() calls transport.close() so the port is released before the process exits. The plugin registers this on Evt.SHUTDOWN where available.
If you ever need to share a gateway between processes (e.g. dev tooling + live host), serialize access at the process level -- there is no in-transport multiplexing today.
Transport Interface (post-redesign)¶
The Gateway firmware keeps the SX1262 in Continuous RX as its default state. After each TX the Core reverts to Continuous automatically; no Timed-RX window is opened for unicast request/response flows. The earlier window-based wait was the original cause of the "No ACK_OK for ..." timeout-despite-ACK bug: the Host used to block until the firmware's EV_RX_WINDOW_CLOSED event arrived, and that event can be delayed by ESP32 USB CDC buffering.
Host-side matching is owned entirely by racelink/services/pending_requests.py and the single primitive in GatewayService:
| Call pattern | Helper | Completion signal |
|---|---|---|
| Any RX-reply expectation | send_and_match(send_fn, matcher) | A PendingMatcher whose sender_filter / expected_opcode / expected_ack_of / discriminator_* fields define what counts as a match. Exits on expected_count reached (reason="count"), idle-after-first-match (reason="idle"), hard ceiling with at least one match ("max_timeout"), or hard ceiling with none ("no_reply"). |
| Unicast with retries | send_and_wait_with_retries(recv3, opcode7, send_fn, ...) | Thin retry wrapper that rebuilds a unicast PendingMatcher per attempt and short-circuits on the first successful reply. |
See Reply Matching (PendingMatcher) for the full data-flow, filter semantics, and migration history from the earlier split (send_and_wait_for_reply + send_and_collect).
The old wait_rx_window helper remains for backwards compatibility but is deprecated. New code should not call it.
EV_RX_WINDOW_OPEN / EV_RX_WINDOW_CLOSED stay in the wire format (the Core header is frozen) but are debug-only from the Host's perspective.
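The four exit reasons listed for send_and_match can be illustrated with a small wait loop over a queue. This is a sketch under stated assumptions: collect_replies, its parameters, and the queue-based delivery are hypothetical, and the real matcher also filters by sender and opcode, which is omitted here.

```python
import queue
import time
from typing import List, Tuple


def collect_replies(rx: "queue.Queue[str]", expected_count: int,
                    idle_s: float, max_s: float) -> Tuple[str, List[str]]:
    """Wait on rx and report why the wait ended (hypothetical helper)."""
    matches: List[str] = []
    deadline = time.monotonic() + max_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return ("max_timeout" if matches else "no_reply"), matches
        # Before the first match only the hard ceiling applies; afterwards
        # the shorter idle window applies as well.
        wait = min(remaining, idle_s) if matches else remaining
        try:
            matches.append(rx.get(timeout=wait))
        except queue.Empty:
            if matches and time.monotonic() < deadline:
                return "idle", matches
            continue
        if len(matches) >= expected_count:
            return "count", matches


def _queue_of(*items: str) -> "queue.Queue[str]":
    q: "queue.Queue[str]" = queue.Queue()
    for item in items:
        q.put(item)
    return q


by_count = collect_replies(_queue_of("a", "b"), expected_count=2, idle_s=0.05, max_s=0.5)
by_idle = collect_replies(_queue_of("a"), expected_count=3, idle_s=0.05, max_s=0.5)
by_ceiling = collect_replies(_queue_of("a"), expected_count=2, idle_s=1.0, max_s=0.1)
by_silence = collect_replies(_queue_of(), expected_count=1, idle_s=0.05, max_s=0.1)
```

The idle window is what lets an N-reply collector return promptly once the responders have gone quiet, instead of always waiting for the hard ceiling.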
Multi-Transport runtime¶
The host's multi-network runtime drives several USB-attached gateways from one process, each gateway carrying its own LoRa channel and its own subset of networks/devices. Internally the runtime is built around a transport list; every helper falls back to the singleton behaviour when only one transport is attached, so a single-gateway deployment runs byte-identically to the pre-multi-network host.
See the §"Migration history" appendix at the end of this file for the stage-by-stage rollout history (Stage 2 / Stage 3 / Stage 4 / Stage 5) — useful for archeology, not needed to read the current behaviour.
Transport list + per-network routing helpers¶
controller._transports is the primary store. The legacy
controller.transport is a property that reads/writes
_transports[0]. Three helpers route by network:
- controller.transport_for_network(network_id): resolves via RL_Network.gateway_mac ↔ transport.ident_mac. Falls back to the only transport when N=1 and the network has no gateway_mac yet (default-bind path for single-gateway deployments).
- controller.transport_for_device(addr): looks up the device, reads its network_id, delegates to transport_for_network. Used by every unicast send so a packet targeted at a device on network-A goes via the gateway bound to network-A.
- controller.transport_for_group(group_id): same shape, for group-targeted broadcasts (e.g. OPC_PRESET with recv3=FFFFFF).
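A minimal sketch of the network-to-transport resolution, including the single-gateway fallback, might look like this. The Transport and Network dataclasses are hypothetical stand-ins for the real transport and RL_Network objects, and returning None for an unknown or unmatched network is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Transport:
    ident_mac: str


@dataclass
class Network:
    network_id: str
    gateway_mac: Optional[str] = None


def transport_for_network(transports: List[Transport],
                          networks: Dict[str, Network],
                          network_id: str) -> Optional[Transport]:
    network = networks.get(network_id)
    if network is None:
        return None  # assumption: unknown networks resolve to nothing
    if network.gateway_mac is not None:
        for transport in transports:
            if transport.ident_mac == network.gateway_mac:
                return transport
        return None
    # Default-bind path: exactly one transport and no gateway_mac recorded yet.
    return transports[0] if len(transports) == 1 else None


gw_a, gw_b = Transport("AA:01"), Transport("BB:02")
networks = {
    "net-a": Network("net-a", gateway_mac="AA:01"),
    "net-b": Network("net-b", gateway_mac="BB:02"),
    "net-new": Network("net-new"),  # not bound to a gateway yet
}
routed = transport_for_network([gw_a, gw_b], networks, "net-b")
single_fallback = transport_for_network([gw_a], networks, "net-new")
ambiguous = transport_for_network([gw_a, gw_b], networks, "net-new")
```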
Each transport tags every event it emits with gateway_id =
self.ident_mac (in _emit / _emit_tx) so downstream matchers
+ pending-expect lookups can scope by source-transport without
the host having to thread the transport reference through every
event.
PendingMatcher gateway_id¶
The PendingMatcherRegistry is a per-gateway dict
({ident_mac → PendingMatcherRegistry}) so two devices on
different gateways that happen to share their last-3 MAC bytes
cannot collide in the fast-bucket lookup. The
PendingMatcher.gateway_id field is required at registration
time for concrete-sender matchers (unicast and multi-sender
N-reply); wildcard matchers (discovery, fleet-wide broadcast
collectors) may still omit it.
```python
# Registry refuses a unicast matcher without gateway_id.
m = PendingMatcher(
    sender_filter=frozenset({mac_last3}),
    expected_ack_of=OPC_SET_GROUP,
    gateway_id=routed_transport.ident_mac,  # required
)
registry.register(m)
```
send_and_match(...) and send_and_wait_with_retries(...)
accept an explicit transport= kwarg and auto-derive
gateway_id from routed_transport.ident_mac when the caller
doesn't pass it. Callers that don't pre-route get
transport_for_device(recv3) resolution from
send_and_wait_with_retries for free.
Per-network MasterStateMap¶
SSEBridge.masters is a MasterStateMap — one MasterState
per network id (default-network slot eager-created so
bridge.master stays available for legacy callers). Every
EV_STATE_CHANGED from a tagged transport routes to the slot
owned by that gateway's network via
network_repository.get_by_gateway_mac(ev["gateway_id"]).
The SSE master event broadcasts the unified
{networks: [...], default_network_id: "..."} shape; the
WebUI's gateway store reads networks[0] as the legacy
single-master for back-compat with the pre-Stage-4 frontend.
enumerate_all boot path¶
controller.discoverPort takes one of three paths, chosen by
_normalize_comms_pins(rl_comms_port) (which accepts a single
port, a comma-separated list, a native list, or a JSON-array
string):
- Single pin (one port): the legacy single-port GatewaySerialTransport.discover_and_open flow. It opens exactly that device, does no probe walk, and never calls enumerate_all.
- Unpinned (empty): GatewaySerialTransport.enumerate_all() probes every USB port and returns the list of (port, ident_mac) tuples for every responding RaceLink gateway. The controller then constructs one transport per hit, calls _attach_transport for each, and the bind service classifies them in sequence.
- Multi pin (≥2 ports): runs enumerate_all() and then filters the result to the pinned ports before the attach loop. Probing still yields each gateway's ident_mac, so the pinned transports bind to their networks the same way the unpinned path does; gateways not in the pin set are ignored.
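The input shapes accepted by the pin normalizer, and how the pin count selects a boot path, can be sketched as below. The function bodies are an illustration only: normalize_comms_pins and boot_path are hypothetical re-implementations written from the description above, not the controller's actual helpers.

```python
import json
from typing import Iterable, List, Union


def normalize_comms_pins(value: Union[None, str, Iterable[str]]) -> List[str]:
    """Turn any of the accepted pin shapes into a clean list of port names."""
    if value is None:
        return []
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return []
        if text.startswith("["):
            parsed = json.loads(text)  # JSON-array string
            return [str(p).strip() for p in parsed if str(p).strip()]
        return [p.strip() for p in text.split(",") if p.strip()]
    return [str(p).strip() for p in value if str(p).strip()]


def boot_path(pins: List[str]) -> str:
    """Map the number of pinned ports to one of the three paths."""
    if not pins:
        return "unpinned"
    return "single" if len(pins) == 1 else "multi"


single = normalize_comms_pins("/dev/ttyACM0")
csv = normalize_comms_pins("/dev/ttyACM0, /dev/ttyACM1")
as_json = normalize_comms_pins('["/dev/ttyACM0", "/dev/ttyACM1"]')
native = normalize_comms_pins(["/dev/ttyACM0"])
empty = normalize_comms_pins("")
```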
_attach_transport(transport) is the per-transport
orchestrator: transport.start(), append to _transports,
run _bind_transport_to_network, install hooks per-id, then
call gateway_bind_service.evaluate(transport). The bind
service's evaluate queries the gateway's NVS RF config via
GW_CMD_GET_RF_CONFIG and broadcasts the resulting
gateway_bound / gateway_conflict / gateway_unbound SSE
event.
Bind-state machine¶
racelink/services/gateway_bind_service.py owns the
per-ident_mac BindRecord map. States:
| State | Meaning |
|---|---|
| PENDING | Just attached, about to query RF config |
| BOUND | Gateway's NVS RF matches the bound network's rf_config |
| CONFLICT | Bound but the RF settings disagree; operator wizard chooses accept_gateway / accept_host |
| UNBOUND | No RL_Network carries this ident_mac and auto-bind didn't fire |
Resolve actions (operator → POST /api/gateways/{ident_mac}/resolve):
accept_gateway (adopt actual config), accept_host (schedule
RF migration via rf_migration_service), create_network,
rebind. Token-gated so a stale wizard answer can't override
a re-evaluated record.
RF migration engine¶
racelink/services/rf_migration_service.py::migrate_network_to
is the four-phase pipeline:
- Pre-check: partition the network's devices into push (need migration), skipped_already_target (last_known_rf_config matches), skipped_offline.
- Phase 1: Device push. Per device, set_node_rf_config(target, transport=...) via the old gateway. Each device reboots onto the new config.
- Phase 2: Gateway switch. Single set_gateway_rf_config(target, persist=True, transport=...). Gateway reboots; controller reconnect re-opens the USB device.
- Phase 3: Verification. Discovery on the new channel. Survivors get last_known_rf_config updated; the rest land in stranded (Channel-Scan recovery).
On success the engine calls
bind_service.re_evaluate(ident_mac) so the SSE
gateway_conflict flips to gateway_bound automatically.
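The pre-check partition can be sketched as a single pass over the device list. The Device dataclass, the RfConfig tuple shape, and the order of the two skip checks are assumptions made for the illustration; only the three bucket names come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

RfConfig = Tuple[int, int]  # hypothetical (channel, spreading_factor) pair


@dataclass
class Device:
    addr: str
    online: bool
    last_known_rf_config: Optional[RfConfig] = None


def precheck(devices: List[Device], target: RfConfig) -> Dict[str, List[str]]:
    """Sort devices into the three pre-check buckets."""
    buckets: Dict[str, List[str]] = {
        "push": [],
        "skipped_already_target": [],
        "skipped_offline": [],
    }
    for dev in devices:
        if dev.last_known_rf_config == target:
            buckets["skipped_already_target"].append(dev.addr)
        elif not dev.online:
            buckets["skipped_offline"].append(dev.addr)
        else:
            buckets["push"].append(dev.addr)
    return buckets


target = (5, 7)
fleet = [
    Device("aa0001", online=True, last_known_rf_config=(1, 7)),
    Device("aa0002", online=True, last_known_rf_config=(5, 7)),
    Device("aa0003", online=False, last_known_rf_config=(1, 7)),
]
plan = precheck(fleet, target)
```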
Channel-Scan service¶
racelink/services/channel_scan_service.py::scan_region
walks a region's channel table on one gateway. Per channel:
volatile-switch (no NVS), 800 ms settle, broadcast
OPC_DEVICES, dwell, partition responders into known
(repo hit; updates last_known_rf_config) vs unknown
(recorded with channel info). A try/finally restores the
gateway's pre-scan RF config on exit so a mid-scan exception
doesn't leave the gateway on the wrong channel.
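The restore-on-exit guarantee is an ordinary try/finally. The sketch below uses a hypothetical FakeGateway stub and a probe callback in place of the real volatile switch, settle delay, OPC_DEVICES broadcast, and dwell, none of which are modelled here.

```python
from typing import Callable, Dict, List


class FakeGateway:
    """Hypothetical gateway stub that only tracks its current channel."""

    def __init__(self, channel: int) -> None:
        self.channel = channel

    def switch_volatile(self, channel: int) -> None:
        self.channel = channel  # no NVS write in this sketch


def scan_region(gateway: FakeGateway, channels: List[int],
                probe: Callable[[int], List[str]]) -> Dict[int, List[str]]:
    original = gateway.channel
    found: Dict[int, List[str]] = {}
    try:
        for channel in channels:
            gateway.switch_volatile(channel)
            found[channel] = probe(channel)  # may raise mid-scan
    finally:
        # Runs on success and on failure alike.
        gateway.switch_volatile(original)
    return found


def failing_probe(channel: int) -> List[str]:
    raise RuntimeError("radio fault")


gw = FakeGateway(channel=7)
result = scan_region(gw, [1, 2], lambda ch: ["aa11bb"] if ch == 2 else [])
channel_after_scan = gw.channel

try:
    scan_region(gw, [3], failing_probe)
    raised = False
except RuntimeError:
    raised = True
channel_after_failure = gw.channel
```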
Cross-network fan-out (BroadcastTarget)¶
Broadcast routing is governed by an explicit BroadcastTarget:
a frozen tuple of network_ids the caller wants the broadcast
to reach. There is no implicit "current network" or "UI focus"
fallback: the call site either constructs a target explicitly or
the helper falls back to a deprecated all-attached default with a
warning logged for migration.
Why explicit: in a multi-network deployment a scene's
target network set is independent of UI focus (the operator may
view network A while a scene runs on B). The pre-BroadcastTarget
implicit "all attached" sync fan-out caused two real problems:
- Performance: GatewayService.send_sync looped sequentially over every attached transport, blocking on each transport's EV_TX_DONE. With N gateways the round-trip was ~N × 50 ms; in two-network setups operators saw sync round-trip jump from ~50 ms to ~104 ms.
- Scope leakage: a scene on network A still triggered OPC_SYNC on network B, which could fire any arm_on_sync effects pre-loaded on B's devices. Operators saw unrelated triggers fire on the "wrong" network.
The refactor splits scope (what networks to reach) from dispatch (how to send in parallel):
- BroadcastTarget: the explicit scope object. Factories: BroadcastTarget.from_ids(iterable) for caller-supplied sets, BroadcastTarget.single(network_id) for one-network sends, BroadcastTarget.all_attached(controller) for fleet-wide health probes that genuinely want every gateway.
- broadcast_fanout(transports, work_fn, ...): thread-pool helper. Spawns one daemon thread per transport, all kicked off in quick succession (Phase A: dispatch; each USB write lands within ~1 ms of the previous), then joins (Phase B: collect; total wall-clock bounded by the slowest transport's airtime, NOT N × airtime). Worker exceptions are captured per-transport and don't abort siblings.
- resolve_broadcast_transports(controller, target, ...): maps a BroadcastTarget to live transports via transport_for_network; networks without an attached transport are skipped (with a warning) so a temporarily-disconnected gateway doesn't crash the broadcast.
The threaded fan-out avoids splitting _send_m2n into separate
dispatch/await primitives — the existing _tx_outcome_cv /
_pending_send_outcome discipline that guards the RX-reader
thread is preserved. Two workers on different transports each
hold their own _tx_lock and never serialize on each other.
Sub-second LoRa airtime is the dominant wall-clock cost; the
~10 µs Thread.start() overhead per worker is negligible.
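The dispatch-then-collect shape of the fan-out can be sketched with plain threads. This is an illustration under assumptions: transports are represented by strings, fake_send stands in for a blocking send, and the result-tuple format is hypothetical; the real helper's signature has further parameters that are elided in the text above.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple


def broadcast_fanout(transports: List[str],
                     work_fn: Callable[[str], object]
                     ) -> Dict[str, Tuple[str, object]]:
    results: Dict[str, Tuple[str, object]] = {}
    results_lock = threading.Lock()

    def worker(transport: str) -> None:
        try:
            outcome: Tuple[str, object] = ("ok", work_fn(transport))
        except Exception as exc:  # captured so sibling workers keep running
            outcome = ("error", exc)
        with results_lock:
            results[transport] = outcome

    threads = [threading.Thread(target=worker, args=(t,), daemon=True)
               for t in transports]
    for thread in threads:  # Phase A: dispatch
        thread.start()
    for thread in threads:  # Phase B: collect
        thread.join()
    return results


def fake_send(transport: str) -> str:
    if transport == "gw-b":
        raise RuntimeError("port closed")
    time.sleep(0.2)  # stands in for LoRa airtime
    return "tx-done"


started = time.monotonic()
outcomes = broadcast_fanout(["gw-a", "gw-b", "gw-c"], fake_send)
elapsed = time.monotonic() - started
```

Two workers sleep for 0.2 s each, yet the call returns after roughly one sleep, which is the "bounded by the slowest transport" property in miniature.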
Service-layer surface:
| Method | Default scope (target=None) | Explicit target=BroadcastTarget(...) |
|---|---|---|
| GatewayService.send_sync (broadcast) | All attached transports + deprecation warning | Threaded fan-out across listed networks |
| GatewayService.send_sync (unicast) | Routed via transport_for_device(recv3); target ignored | (same) |
| ControlService.send_group_preset | Routed via transport_for_group(group_id); single transport | Threaded fan-out across listed networks |
| ControlService.send_offset(targetGroup=...) | Same group-routed single transport | Same threaded fan-out |
| ControlService.send_control(targetGroup=...) | Same group-routed single transport | Same threaded fan-out |
| Device-targeted sends | transport_for_device(addr); target ignored | (same) |
Scene-driven scope computation:
SceneRunnerService.run() calls
scene_network_ids(scene, controller)
at the start of every scene. The helper walks every action's
canonical target field and aggregates the set of networks the
scene touches:
- target.kind == "broadcast" → every persisted network (group 255 is fleet-wide by design).
- target.kind == "groups" → resolve each gid → group.network_id.
- target.kind == "device" → resolve mac → device.network_id.
- Actions without a target (sync, delay) contribute nothing; they inherit the union from the rest of the scene.
- offset_group containers descend into children.
The aggregated set is held on the runner for the duration of the
run (_current_sync_target) and passed to every sync-action
dispatch. This means:
- A scene that targets only group A on network 1 fires its sync only on network 1's gateway; network 2 (uninvolved) sees no traffic.
- A scene with target.kind == "broadcast" (group 255) targets every network, exactly as before.
- A sync-only scene (no resolvable scope) falls back to all-attached with the deprecation warning, the conservative default that preserves pre-refactor behaviour.
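The scope aggregation described above can be sketched as a recursive walk. The action dict layout (the "ids", "mac", and "children" keys) and the lookup dictionaries are hypothetical; only the three target kinds and the rule that containers descend into children come from the text.

```python
from typing import Dict, Iterable, List, Set


def scene_network_ids(actions: Iterable[dict],
                      group_networks: Dict[int, str],
                      device_networks: Dict[str, str],
                      all_networks: List[str]) -> Set[str]:
    found: Set[str] = set()
    for action in actions:
        target = action.get("target") or {}
        kind = target.get("kind")
        if kind == "broadcast":
            found.update(all_networks)
        elif kind == "groups":
            found.update(group_networks[g] for g in target.get("ids", [])
                         if g in group_networks)
        elif kind == "device":
            net = device_networks.get(target.get("mac", ""))
            if net is not None:
                found.add(net)
        # Container actions contribute through their children.
        found |= scene_network_ids(action.get("children", []),
                                   group_networks, device_networks, all_networks)
    return found


groups = {1: "net-1", 2: "net-2"}
devices = {"aa11bb": "net-2"}
everything = ["net-1", "net-2", "net-3"]

mixed_scene = [
    {"kind": "sync"},  # no target: contributes nothing
    {"target": {"kind": "groups", "ids": [1]}},
    {"kind": "offset_group",
     "children": [{"target": {"kind": "device", "mac": "aa11bb"}}]},
]
mixed = scene_network_ids(mixed_scene, groups, devices, everything)
fleet_wide = scene_network_ids([{"target": {"kind": "broadcast"}}],
                               groups, devices, everything)
sync_only = scene_network_ids([{"kind": "sync"}], groups, devices, everything)
```

An empty result, as in the sync-only case, is the situation in which the runner applies its all-attached fallback.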
Operator-pinned override: scene.network_scope
The scene-derived scope still has a gap: a scene with ONLY
broadcast-target actions auto-resolves to "every persisted
network" — the operator has no way to constrain that to a subset
without picking specific groups. The network_scope field on
each scene closes the gap:
"network_scope": {"mode": "auto"} // default
"network_scope": {"mode": "explicit", "network_ids": ["net-a", "net-b"]}
- Auto → scene_network_ids walks actions as described above.
- Explicit → scene_network_ids returns the persisted list filtered against the network repository (stale ids are dropped at runtime with an INFO log). The operator sets this via the Scope chip in the scene editor header: a dialog with an Auto/Explicit radio + multi-select checkbox.
When explicit, two extra invariants kick in:
- Per-action target filter cascade: the editor's SceneTargetPicker and MultiGroupPickerDialog receive the scope as a prop and restrict their group/device dropdowns to in-scope networks. Group id 0 (Unconfigured) always passes regardless; this is the same exception the boundary validator makes.
- Save-time cross-validator: validate_scene_scope_consistency runs at the web layer after scenes_service canonicalizes the payload. Two failure shapes (both SceneScopeViolation, HTTP 400):
  - unknown_network_id: scope references a network that isn't in the network repository.
  - scope_violation: an action's resolved target lies on a network not in the scope. Detail carries offending_action_index so the editor scrolls + highlights the offending row.
The check lives at the web layer (not in scenes_service)
because it needs the device + group repositories;
scenes_service deliberately stays repository-free.
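The two failure shapes can be illustrated with a small sketch. It assumes the per-action network sets have already been resolved from the device + group repositories; the function name `check_scope_consistency`, its signature, and the exception constructor are hypothetical and only mirror the error codes named above.

```python
class SceneScopeViolation(Exception):
    """Illustrative stand-in for the host's exception of the same name."""

    def __init__(self, code, **detail):
        super().__init__(code)
        self.code = code
        self.detail = detail


def check_scope_consistency(scope, known_network_ids, action_network_ids):
    """Raise SceneScopeViolation if an explicit scope is inconsistent.

    action_network_ids: one set of network ids per action, in scene order.
    Targets on group id 0 (Unconfigured) are assumed to resolve to an
    empty set, so they always pass.
    """
    if scope.get("mode") != "explicit":
        return  # auto mode derives its scope from the actions themselves
    pinned = set(scope.get("network_ids", []))
    for network_id in sorted(pinned):
        if network_id not in known_network_ids:
            raise SceneScopeViolation("unknown_network_id", network_id=network_id)
    for index, networks in enumerate(action_network_ids):
        if not set(networks) <= pinned:
            raise SceneScopeViolation("scope_violation", offending_action_index=index)


scope = {"mode": "explicit", "network_ids": ["net-a"]}
try:
    check_scope_consistency(scope, {"net-a", "net-b"}, [{"net-a"}, {"net-b"}])
    violation = None
except SceneScopeViolation as exc:
    violation = exc
```

In this example the second action resolves to `net-b`, which is outside the pinned scope, so `violation.code` is `scope_violation` with `offending_action_index` set to 1.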
Runtime degradation when scope resolves empty
A persisted explicit scope can resolve to () at runtime — every
listed network was deleted from the repository. The runner
guards against silently widening back to "all attached" via a
dedicated _broadcast_is_explicit_empty flag:
- Auto mode + empty scope → SYNC falls back to all-attached (deprecated path, still active for pre-feature back-compat).
- Explicit mode + empty scope → broadcasts are SKIPPED at the
dispatch site. The action is recorded as a degraded run with
error="scope_resolved_empty". This is the operator-resolved choice: silent widening would defeat the whole point of pinning the scope.
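The two branches can be summarised as a small decision function. This is a sketch under assumptions: the name `resolve_broadcast_scope` and the returned dict shape are invented for illustration, and the real runner tracks the explicit-empty case through its `_broadcast_is_explicit_empty` flag rather than a return value.

```python
import warnings


def resolve_broadcast_scope(mode, resolved_ids, attached_ids):
    """Decide where a broadcast goes once the scene scope has been resolved.

    mode: "auto" or "explicit" (mirrors scene.network_scope.mode).
    resolved_ids: network ids the scope resolved to (may be empty).
    attached_ids: every currently attached network.
    """
    if resolved_ids:
        return {"send_to": tuple(resolved_ids), "error": None}
    if mode == "explicit":
        # Never widen a pinned scope: skip and record a degraded run.
        return {"send_to": (), "error": "scope_resolved_empty"}
    # Auto mode keeps the deprecated back-compat fallback.
    warnings.warn(
        "empty auto scope: falling back to all attached networks",
        DeprecationWarning,
        stacklevel=2,
    )
    return {"send_to": tuple(attached_ids), "error": None}
```

Keeping the explicit branch ahead of the fallback means an operator-pinned scope can only ever narrow the fan-out, never widen it.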
Cost estimator integration
/api/scenes/<key>/estimate (and the draft variant) now return
two extra fields:
- `resolved_network_ids`: the same list `scene_network_ids` returns at runtime; powers the editor's "Fan-out: N gateways" pill.
- `network_scope_mode`: `"auto" | "explicit"`, mirrors the scene field; lets the editor pick the right chip variant.
The per-action packets / bytes / airtime_ms figures
deliberately do NOT multiply by fan-out width. The dominant
operator cost is LoRa airtime, and broadcast workers run in
parallel — wall-clock airtime is bounded by the slowest single
radio, not summed. The fan-out pill is the operator-visible
indicator that N gateways are involved.
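A short arithmetic sketch of why the figures are not multiplied by fan-out width. The gateway names and airtime numbers are made up for illustration, and the helper name is hypothetical.

```python
def wall_clock_airtime_ms(per_gateway_airtime_ms):
    """Parallel fan-out: wall-clock airtime is set by the slowest radio."""
    return max(per_gateway_airtime_ms.values(), default=0)


# Hypothetical per-gateway airtime for one broadcast, in milliseconds.
fanout = {"gw-1": 120, "gw-2": 95, "gw-3": 140}

parallel_ms = wall_clock_airtime_ms(fanout)  # 140
summed_ms = sum(fanout.values())             # 355
```

Reporting the summed figure would overstate the operator-visible cost by roughly the fan-out width, which is why the pill carries the gateway count separately.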
What's intentionally NOT in this iteration:
- No per-action override — a scene cannot today say "this particular sync fires only on network B even though the scene touches A+B+C". The scene-wide scope (auto-resolved or explicit) governs every broadcast in the scene.
- No operator-driven manual SYNC button — the UI doesn't currently expose a "fire SYNC now on networks [X, Y]" control. By design: broadcasts are scene-driven, not a direct operator action.
- Back-compat fallback still active: auto mode with empty
scope falls back to all-attached with a warning. After
every caller has migrated, the fallback should upgrade to
`ValueError` so callers can't accidentally ship a broadcast without explicit scope.
Per-group network migration¶
migrate_network_to pushes a whole network onto a new RF config;
this complementary API moves one or more groups (with all their
members) from one existing network onto another existing network. Network membership is a per-group
property (one network per group), so the operator-facing API and
the WebUI both operate exclusively at group granularity — per-device
migration is an internal helper, not a public surface. Key
differences from migrate_network_to:
- No gateway-RF switch. Both source + target gateways stay on their persisted RF configs; only the devices' physical RF settings (and their `network_id` metadata) flip.
- Source transport is the device's CURRENT network's gateway: `set_node_rf_config` reaches the device BEFORE it reboots onto the target config. The default `transport_for_device(mac)` resolution in `GatewayService.set_node_rf_config` handles this automatically.
- Per-device metadata flip (`network_id` + `last_known_rf_config`) happens INSIDE the state-repository lock, the same discipline `_apply_device_meta_updates` uses for group changes.
Service surface:
| Method | Visibility | Notes |
|---|---|---|
| `RfMigrationService.migrate_groups_to(target_network_id, group_ids, offline_mode)` | Public; route handler entry | Validates + deduplicates `group_ids`, fails fast on any unknown id, then runs one combined migration over the unioned member set. Flips `group.network_id` on every resolved group regardless of partial member failure. |
| `RfMigrationService.migrate_devices_to(target_network_id, macs, offline_mode)` | Internal helper | Called by `migrate_groups_to` with the unioned, deduplicated member mac list. Not exposed as a route; devices always travel with their group. |
API:
| Route | Body | Behaviour |
|---|---|---|
| `POST /api/groups/migrate-network` | `{group_ids: [int], target_network_id, offline_mode}` | Single TaskManager job for one OR many groups (single-group migration = a one-element list). Rejects with HTTP 400 and `detail.code = "offline_block"` when `block` mode finds offline members across the union of all requested groups. Unknown `group_ids` fail fast before any mutation; an empty `group_ids` list is HTTP 400; empty membership for a known group is allowed (the flip still happens so the operator can pre-stage). |
Offline modes (parallels the group-move pattern in
_apply_device_meta_updates):
- `block` (default): pre-check rejects with HTTP 400 + `detail.offline_macs` if any device is offline. The WebUI dialog then offers the two fallbacks below.
- `skip`: metadata-only flip for offline devices; no wire push. Channel-Scan workflow recovers them later. Online devices get the wire push as normal.
- `force`: attempts the wire push even for offline devices; metadata flips regardless. Failed pushes land in `result["stranded"]`, the same recovery path as `skip`.
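The three modes differ only in which devices receive a wire push. The sketch below plans that split; `plan_migration`, `OfflineBlock`, and the returned keys are hypothetical names for illustration and do not reflect the actual `RfMigrationService` internals.

```python
class OfflineBlock(Exception):
    """Illustrative error carrying the macs that blocked the migration."""

    def __init__(self, offline_macs):
        super().__init__("offline_block")
        self.offline_macs = list(offline_macs)


def plan_migration(macs, online_macs, offline_mode="block"):
    """Split the member list into wire pushes and metadata-only flips."""
    offline = [mac for mac in macs if mac not in online_macs]
    if offline_mode == "block":
        if offline:
            raise OfflineBlock(offline)  # caller maps this to HTTP 400
        return {"wire_push": list(macs), "metadata_only": []}
    if offline_mode == "skip":
        online = [mac for mac in macs if mac in online_macs]
        return {"wire_push": online, "metadata_only": offline}
    if offline_mode == "force":
        # Push to everyone; failed pushes would be reported as stranded.
        return {"wire_push": list(macs), "metadata_only": []}
    raise ValueError(f"unknown offline_mode: {offline_mode!r}")


members = ["aa", "bb", "cc"]
skip_plan = plan_migration(members, {"aa", "cc"}, offline_mode="skip")
```

With `bb` offline, `skip_plan` pushes to `aa` and `cc` and flips `bb` in metadata only; in every non-blocking mode the metadata flips for all members.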
Why metadata flips for offline devices (operator question
during design): this mirrors the existing group-move auto-restore
pattern (`_restore_known_device_group`). The host's view of
"which network this device belongs to" reflects operator INTENT;
the wire reconciliation catches up later. Stranded devices surface
in Channel Scan with their now-known-good last_known_rf_config,
so operator recovery is a one-click reassign.
Group flip atomicity: group.network_id updates after every
per-device migration attempt has finished, regardless of partial
failure. The operator's intent is "this group is now on network B" —
stranded members aren't a reason to roll back the group itself
(they'd then have a network_id of B but a group of A, which is
a worse boundary violation than the simpler "operator-intent
metadata + Channel Scan recovery" shape). The same atomicity holds
across a multi-group submission: each resolved group flips, even if
individual members across the union failed.
Why group-granular only (per-device migration was deleted
during this consolidation): network membership is a per-group
property in the data model. Letting an individual device migrate
while its group stayed on the source network would immediately
create a cross-network-membership state — the very boundary
violation validate_group_membership rejects elsewhere. Operating
exclusively at group granularity keeps the rule consistent
end-to-end without a separate "device drifted off its group's
network" recovery path.
Network-boundary enforcement¶
racelink/domain/network_boundary.py::validate_group_membership
runs at every bulk regroup and raises
NetworkBoundaryViolation (caught at the route layer →
HTTP 400) when:
- Selected devices span multiple `network_id`s (`devices_span_multiple_networks`), or
- The target group is on a different network than the selected devices (`group_network_mismatch`).
Target group id 0 (Unconfigured) short-circuits both checks
— it's the cross-network sink for "remove from group". The
WebUI mirrors the rule client-side in the
MultiGroupPickerDialog so the operator sees disabled
checkboxes for foreign-network groups before they round-trip
the validator.
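A compact sketch of the two checks and the group-0 short-circuit. The function name `check_group_membership` and its flat arguments are assumptions for the example; the real validator in `network_boundary.py` may take repository objects instead.

```python
class NetworkBoundaryViolation(Exception):
    """Illustrative stand-in; the route layer would map this to HTTP 400."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def check_group_membership(device_network_ids, target_group_id, target_group_network_id):
    """Reject a bulk regroup that would cross a network boundary."""
    if target_group_id == 0:
        return  # Unconfigured is the cross-network sink for "remove from group"
    networks = set(device_network_ids)
    if len(networks) > 1:
        raise NetworkBoundaryViolation("devices_span_multiple_networks")
    if networks and target_group_network_id not in networks:
        raise NetworkBoundaryViolation("group_network_mismatch")


def violation_code(*args):
    """Small helper for the example: return the error code, or None if valid."""
    try:
        check_group_membership(*args)
    except NetworkBoundaryViolation as exc:
        return exc.code
    return None
```

For example, `violation_code(["net-a", "net-b"], 3, "net-a")` yields `devices_span_multiple_networks`, while the same selection with target group 0 passes.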
Locking Rule: Never hold state_repository.lock across RF I/O¶
The state-repository lock (state_repository.lock, surfaced as ctx.rl_lock in the web layer) is taken by:
- Web handlers that read/mutate device or group state.
- The gateway reader thread, inside `GatewayService.handle_ack_event`, `on_transport_event` (status/identify branches), and `pending_*` bookkeeping.

Both paths must acquire the same lock so a request thread and the reader thread see a consistent view of the device list. That is the whole point of a single state lock.
Consequence: a handler that holds the state lock while waiting for a reply over RF will deadlock the reader. The reader thread stalls in handle_ack_event for the reply that just arrived -- and because it is stalled, it cannot pull the next USB frame out of pyserial's RX buffer. USB frames for subsequent devices queue up; the next send_and_match call times out even though the ACK is sitting unread in the OS buffer. Symptoms:
- First unicast call in a bulk returns promptly.
- Every subsequent unicast call in the same bulk times out at exactly the wait budget (e.g. 8.000 s).
- Immediately after the timeout releases the lock, a flood of queued USB events drains into the log (TX_DONE, RX window OPEN, late ACK).
The rule, therefore, is:
Never call `setNodeGroupId`, `sendConfig(..., wait_for_ack=True)`, `sendRaceLink`, `sendGroupPreset`, `send_stream`, `discover_devices`, or `get_status` while holding `state_repository.lock` / `ctx.rl_lock`.
In practice this means bulk loops must release and re-acquire the lock around each iteration's RF call. See _apply_device_meta_updates in racelink/web/api.py for the reference pattern (acquire → read/mutate in-memory → release → blocking RF → repeat).
A regression test (tests/test_web_handler_helpers.py::ApplyDeviceMetaUpdatesDoesNotHoldLockAcrossBlockingIO) exercises this rule by simulating a second thread that must acquire the lock mid-bulk.
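The acquire → mutate → release → RF → repeat shape looks roughly like the sketch below. It is not the code in `_apply_device_meta_updates`; the function name, the dict-based device store, and the `send_group_id` callback are placeholders standing in for the real repository and the blocking RF call.

```python
import threading


def apply_group_updates(lock, devices, updates, send_group_id):
    """Apply (mac, group_id) updates without holding the lock across RF I/O.

    lock: the shared state lock (an RLock in this sketch).
    send_group_id: placeholder for a blocking RF call that waits for an ACK.
    """
    results = {}
    for mac, group_id in updates:
        with lock:
            # In-memory mutation only; nothing here may block on the radio.
            devices[mac]["group_id"] = group_id
        # Lock is released here, so the reader thread can process the ACK.
        results[mac] = send_group_id(mac, group_id)
    return results


state_lock = threading.RLock()
device_table = {"aa": {"group_id": 0}, "bb": {"group_id": 0}}
lock_was_free = []


def fake_send(mac, group_id):
    # Simulate the reader thread needing the lock while the RF call is pending.
    def probe():
        acquired = state_lock.acquire(timeout=1)
        lock_was_free.append(acquired)
        if acquired:
            state_lock.release()

    reader = threading.Thread(target=probe)
    reader.start()
    reader.join()
    return True


outcome = apply_group_updates(state_lock, device_table, [("aa", 3), ("bb", 3)], fake_send)
```

If the `send_group_id` call were moved inside the `with lock:` block, the probe thread would time out on every iteration, which is the failure mode described above.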
UI Scope Matrix¶
State mutations travel to the UI layer via two paths: the in-process RotorHazard UI (through on_persistence_changed → RotorHazardUIAdapter.apply_scoped_update) and the browser WebUI (through the SSE refresh channel mapped by racelink/domain/state_scope.sse_what_from_scopes). Both consume the same scope tokens so that a single save_to_db(scopes=...) call fans out consistently.
Authoritative scope tokens are defined in racelink/domain/state_scope.py:
| Token | When to use |
|---|---|
| `FULL` | Initial load (`load_from_db`) or migration boot -- rebuild everything. |
| `NONE` | Pure persistence, no visible change (e.g. "Save Configuration" button just flushes the combined key). |
| `DEVICES` | Device record changed that does not move it between groups (rename, specials struct rebuild). |
| `DEVICE_MEMBERSHIP` | Device moved to a different group -- affects group counts and any list embedded per group. |
| `DEVICE_SPECIALS` | A special config byte was written on a single device (startblock slot, etc.). No cross-UI effect on the RH panels. |
| `GROUPS` | Groups added / renamed / removed -- group-list-backed dropdowns must refresh. |
| `PRESETS` | WLED presets file or RL preset store reloaded -- preset-list-backed selects must refresh. |
RotorHazard adapter (custom_plugins/racelink_rh_plugin/plugin/ui.py) reacts as follows. Elements in the "Once" column are bootstrapped on first sync and then guarded by the _settings_panel_bootstrapped / _quickset_panel_bootstrapped flags; calling sync_rotorhazard_ui repeatedly therefore no longer produces RHUI Redefining ... log spam.
| RH UI element | Once (bootstrap) | GROUPS | DEVICES | DEVICE_MEMBERSHIP | DEVICE_SPECIALS | PRESETS |
|---|---|---|---|---|---|---|
Panel rl_settings |
✓ | |||||
Panel rl_quickset |
✓ | |||||
Option rl_device_config |
✓ | |||||
Option rl_groups_config |
✓ | |||||
Option rl_assignToNewGroup |
✓ | |||||
Quickbutton rl_btn_set_defaults |
✓ | |||||
Quickbutton rl_btn_force_groups |
✓ | |||||
Quickbutton rl_btn_get_devices |
✓ | |||||
Quickbutton rl_run_autodetect |
✓ | |||||
Option rl_quickset_brightness |
✓ | |||||
Quickbutton run_quickset |
✓ | |||||
Option rl_assignToGroup (dynamic) |
✓ | |||||
Option rl_quickset_group (dynamic) |
✓ | |||||
Option rl_quickset_preset (dynamic) |
✓ | |||||
Default ActionEffect gcaction |
✓ | ✓ | ||||
Per-capability special ActionEffects |
✓ | ✓ | ✓ | ✓ |
SSE topics (racelink/domain/state_scope.sse_what_from_scopes) drive the browser WebUI:
| Token | SSE `refresh.what` payload | JS handler action |
|---|---|---|
| `FULL` | `["groups", "devices"]` | `loadGroups()` + `loadDevices()` |
| `NONE` | `[]` | no-op |
| `DEVICES` | `["devices"]` | `loadDevices()` |
| `DEVICE_MEMBERSHIP` | `["devices", "groups"]` | both (membership affects per-group counts) |
| `DEVICE_SPECIALS` | `["devices"]` | `loadDevices()` |
| `GROUPS` | `["groups"]` | `loadGroups()` |
| `RL_PRESETS` | `["rl_presets"]` | RL-preset dropdown refresh + Specials cascade |
| `WLED_PRESETS` | `["wled_presets"]` | WLED-preset list refresh + Specials cascade |
| `SCENES` | `["scenes"]` | scene list refresh |
Rule of thumb for new call sites. When you call save_to_db(args, scopes=...), pick the narrowest token set describing what actually changed. If you genuinely don't know, pass {FULL} -- but prefer to refactor so you do know. The RH adapter and SSE scope map are both designed around this precision, and the regression tests in tests/test_ui_scope_routing.py (plugin) and tests/test_state_scope.py (host) pin the mapping so an accidental FULL-regression surfaces in CI.
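The token-to-payload mapping in the SSE table can be expressed as a lookup plus an order-preserving merge. This is only a sketch built from the table: the string tokens and the `sse_what` helper are illustrative, and the real `sse_what_from_scopes` may represent tokens and ordering differently.

```python
# Token -> refresh.what entries, copied from the SSE table above.
SCOPE_TO_WHAT = {
    "FULL": ["groups", "devices"],
    "NONE": [],
    "DEVICES": ["devices"],
    "DEVICE_MEMBERSHIP": ["devices", "groups"],
    "DEVICE_SPECIALS": ["devices"],
    "GROUPS": ["groups"],
    "RL_PRESETS": ["rl_presets"],
    "WLED_PRESETS": ["wled_presets"],
    "SCENES": ["scenes"],
}


def sse_what(scopes):
    """Merge the payload entries for a set of scope tokens, without duplicates."""
    what = []
    # Walk the table in a fixed order so the output is deterministic.
    for token, entries in SCOPE_TO_WHAT.items():
        if token not in scopes:
            continue
        for entry in entries:
            if entry not in what:
                what.append(entry)
    return what


narrow = sse_what({"DEVICE_SPECIALS"})
combined = sse_what({"GROUPS", "SCENES"})
```

A narrow token such as `DEVICE_SPECIALS` triggers a single `devices` reload, whereas falling back to `FULL` makes the browser reload both groups and devices.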
Frontend (Vue 3 SPA)¶
The operator-facing WebUI is a Vue 3 + Vite + Pinia single-page app.
Source lives in frontend/; the Vite build emits to
racelink/static/dist/ (committed). The legacy
racelink/static/racelink.js + scenes.js (~5 500 LOC of vanilla
JS) was retired at PoC merge on 2026-04-29. frontend/README.md
is the developer-facing entry point;
frontend/POST_MIGRATION_CLEANUP.md is the open-tech-debt tracker.
Source layout¶
frontend/src/
├── App.vue — root layout: header, banners, modals, router-view
├── main.ts — Vue + Pinia + Router boot
├── router.ts — two routes; ScenesPage is lazy-loaded
├── api/
│ ├── client.ts — apiGet / apiPost / apiPut / apiDelete + base-path resolution
│ └── types.ts — hand-mirrored DTOs of racelink/web/dto.py + the API responses
├── stores/ — Pinia stores (one per server-side resource)
│ ├── gateway.ts — master / task / gateway snapshots from SSE
│ ├── devices.ts — /api/devices + filter + selection set
│ ├── groups.ts — /api/groups + selGroupId persisted in localStorage
│ ├── specials.ts — Specials schema + dialog state
│ ├── rl_presets.ts — RL presets + 14-field editor draft
│ ├── wled_presets.ts — WLED presets file registry
│ ├── scenes.ts — Scenes + draft + cost + run + tryDiscard guard
│ └── node_config.ts — OPC_CONFIG command catalogue + configByte bits
├── composables/
│ ├── useRaceLinkEvents.ts — VueUse useEventSource wrapper + named-event dispatch
│ ├── useToast.ts — singleton toast queue
│ ├── useConfirm.ts — promise-based confirm dialog (no browser popup)
│ ├── useUiBus.ts — header→page modal-open signals (counters)
│ ├── useConfigDisplay.ts — Devices Config column bit visibility
│ ├── useWledOtaSettings.ts — persisted WLED AP/OTA WiFi config
│ ├── useRlPresetVisibility.ts — A12 mode/palette slot visibility rules
│ └── useBeforeUnloadGuard.ts — sole intentional browser-popup exception
├── components/
│ ├── ui/ — shadcn-vue primitives (Button, Dialog/*)
│ ├── forms/ — schema-driven RlSpecialVarInput dispatcher + atoms
│ ├── modals/ — Discover, Re-sync, NewGroup, RL/WLED Presets, Specials, FW Update
│ └── scenes/ — Scenes-page editor (lazy chunk)
├── pages/
│ ├── DevicesPage.vue — `/`
│ └── ScenesPage.vue — `/scenes` (lazy)
└── styles/
├── tailwind.css — @theme tokens + compat aliases
└── racelink.css — surviving legacy CSS (see frontend tracker §4)
Conventions¶
- One Pinia store per server-side resource. The store mirrors the resource shape and exposes typed CRUD actions. Editor drafts live inside the same store as `draft: ref<Draft | null>` with `isDirty` derived from a JSON-stringify baseline (see `useRlPresetsStore`, `useScenesStore`).
- One EventSource for the whole app. Set up in `composables/useRaceLinkEvents.ts`, scoped to `App.vue` via `onScopeDispose`. Named events are bound directly via `addEventListener` on the underlying `EventSource` instance, not via VueUse's `data` ref: VueUse dedupes consecutive payloads that serialize identically (e.g. two `refresh: {"what":["devices"]}` events in quick succession), which silently dropped the second event under the legacy wiring.
- No browser-native popups. `window.alert` / `window.prompt` / `window.confirm` are forbidden in component code; use `useToast` for validation and `useConfirm` for confirmations. The single intentional exception is `useBeforeUnloadGuard` for Scenes-editor unsaved-changes warnings on F5 / tab close.
- Navigation uses `<router-link>` exclusively. The SPA shell (`App.vue`) never unmounts during navigation; this is what makes the 2026-04-29 SSE connection-pool stall structurally impossible.
- Bundle splitting. `DevicesPage` is statically imported (it's the landing route); `ScenesPage` is lazy-loaded. Initial bundle ≈ 121 kB gzip; Scenes lazy chunk ≈ 47 kB gzip.
Backend touchpoints¶
Every consumed REST endpoint and SSE topic is enumerated in
frontend/README.md § "Backend touchpoints". The contract on the
frontend side is:
- REST. All API calls go through `api/client.ts` so the base-path resolution stays in one place.
- DTOs. `api/types.ts` mirrors `racelink/web/dto.py`. Cleanup §13 in the frontend tracker captures the remaining hand-mirrored shapes; Phase 4 (Pydantic + TS codegen) is the long-term plan.
- SSE topics. `master`, `task`, `gateway`, `refresh` (with `what`), and `scene_progress`; see "UI Scope Matrix" above for the host-side topic mapping. The scene editor's per-row pip strip consumes `scene_progress`.
Test surface¶
Vitest unit tests cover pure logic, the action-shape adapters,
and the offset-formula evaluator parity. The TS evaluator
evaluateOffsetMs is pinned to domain/offset_formula.py via a
shared fixture (tests/fixtures/offset_formula_parity.json,
regenerated by tests/gen_offset_parity_fixture.py) read by both
test suites. Playwright E2E is deferred — see frontend/README.md
§ "On Playwright".
Repo Split History¶
This section folds in the content of the former docs/repo_split_map.md
(retained in the source repository) for completeness.
Host-Owned Import Edge¶
These entry points stay in RaceLink_Host and are the supported
surface for external adapters:
- `racelink.app:create_runtime`
- `racelink.web:register_racelink_web`
- `racelink.web:RaceLinkWebRuntime`
Already moved out of Host¶
The following paths used to live in this repository and now belong in
the separate RaceLink_RH-plugin repository:
| Previous Host path | Target in plugin repo | Note |
|---|---|---|
| `__init__.py` | plugin repo root `__init__.py` | RotorHazard loader shim now belongs with the plugin |
| `racelink/integrations/rotorhazard/__init__.py` | `racelink_rh_plugin/integrations/rotorhazard/__init__.py` | Plugin package edge |
| `racelink/integrations/rotorhazard/plugin.py` | `racelink_rh_plugin/integrations/rotorhazard/plugin.py` | Adapter bootstrap for RH |
| `racelink/integrations/rotorhazard/ui.py` | `racelink_rh_plugin/integrations/rotorhazard/ui.py` | RotorHazard UI adapter |
| `racelink/integrations/rotorhazard/actions.py` | `racelink_rh_plugin/integrations/rotorhazard/actions.py` | RH action registration |
| `racelink/integrations/rotorhazard/dataio.py` | `racelink_rh_plugin/integrations/rotorhazard/dataio.py` | RH import/export adapter |
| `racelink/integrations/rotorhazard/source.py` | `racelink_rh_plugin/integrations/rotorhazard/source.py` | RH event source adapter |
Files that stay in Host¶
| Host path | Why it stays |
|---|---|
| `racelink/app.py` | Owns the host runtime factory and service wiring |
| `racelink/web/**` | Owns the shared RaceLink WebUI registration, API, SSE, task state |
| `racelink/integrations/standalone/**` | Hosts the standalone Flask mode |
| `racelink/pages/**` (now empty) and `racelink/static/**` | Shared WebUI assets; `racelink/static/dist/` carries the committed Vite build of the Vue SPA |
| `frontend/**` | Vue 3 + Vite source for the SPA (built into `racelink/static/dist/`) |
| `controller.py` | Host controller and runtime coordinator |
Migration history¶
Stage-by-stage rollout history for the §"Multi-Transport runtime" features above — kept as an appendix so the body of that section reads as the current contract, not as a chronicle. Detailed commit-level breakdowns (SHAs, test counts, per-part commits) live in the maintainer-internal engineering ledger.
- Stage 0 / 1 / 1.5 (pre-2026-05-21): pre-sync helpers, the additive wire-protocol surface (`OPC_RF_CONFIG` / `OPC_GET_RF_CONFIG` / `GW_CMD_*_RF_CONFIG` / `EV_RF_CHANGED`), single-gateway onboarding. Single-gateway hosts shipped first; multi-gateway support was additive.
- Stage 2 (2026-05-21 release): host-side multi-transport runtime. Schema v2 adds `RL_Network` with `network_id` on every device + group; `controller._transports` becomes a list with the legacy `controller.transport` slot as a property over `_transports[0]`. New routing helpers `transport_for_network` / `transport_for_device` / `transport_for_group`; `PendingMatcher.gateway_id` filter field; `GatewaySerialTransport.enumerate_all()` boot path. Idempotent v1→v2 persistence migration; single-gateway UX byte-identical to the pre-Stage-2 host.
- Stage 3 (same release, seven Parts A-G): channels, policy, bind state machine, migration, fan-out. Part A: the shipped channel tables + the `validate_networks_separation` validator. Part B: hard network-boundary enforcement on bulk regroups. Part C: `PendingMatcher.gateway_id` becomes required for concrete-sender matchers. Part D: gateway-bind state machine (`pending` / `bound` / `conflict` / `unbound`). Part E: the four-phase RF migration engine. Part F: Channel-Scan service. Part G: cross-network fan-out (the implicit "all attached" default that this section's anchor still carries in its slug).
- Stage 4 (same release, three frontend blocks): Foundation (network store + DeviceTable column + sidebar filter), Wizards (`GatewayBindWizard` + `ChannelScanDialog`), Network Manager + Setup-Change Assistant + scene picker.
- Stage 5 (same release): documentation pass. The §"Topic → Document" tables in `STRUCTURE.md` were extended in this stage to cover the new docs.
- 2026-05-22 reconnect-hardening pass: six bench-test rounds on a Pi against two physical gateways. Surgical fixes to the per-transport detach path, `schedule_reconnect` graceful fallback, `enumerate_all(exclude_ports=...)`, `_attach_transport` idempotency, gateway labels in debug logs.
- 2026-05 unreleased: BroadcastTarget refactor. The cross-network fan-out section above is the post-refactor contract. Replaced the implicit Stage-3-Part-G "all attached" default with an explicit `BroadcastTarget` scope object, added the threaded `broadcast_fanout` helper (parallel airtime instead of summed), and introduced `scene_network_ids(scene, controller)` so the runner can scope `sync` actions to the networks the scene actually touches. A later iteration on the same branch added the operator-pinned `scene.network_scope` field with Auto / Explicit modes.