fix(esp32): use runtime node_id from NVS in outgoing packets#232
fix(esp32): use runtime node_id from NVS in outgoing packets#232melodykke wants to merge 2 commits intoruvnet:mainfrom
Conversation
- 为 Rust sensing server 增加空间布局配置入口,支持加载节点位置与语义区域定义,并扩展空间融合解释输出。 - 在 Docker 镜像中打包 config 目录,确保部署后可直接读取 spatial-layout.json。 - 修正 ESP32 显示界面中的节点编号来源,改为使用运行时 NVS 配置而非编译期常量。
ruvnet
left a comment
There was a problem hiding this comment.
Code Review — PR #232
Recommendation: HOLD (partially superseded, scope creep)
Summary
This PR from @melodykke has two distinct parts:
Part 1 — Firmware: NVS runtime node_id (the stated purpose)
Replaces CONFIG_CSI_NODE_ID with g_nvs_config.node_id across 4 firmware files:
csi_collector.c— packet serializationdisplay_ui.c— UI node labeledge_processing.c— compressed frame and vitals packetswasm_runtime.c— WASM output packets
Part 2 — Sensing server: Multi-node spatial fusion (~800 lines, unstated)
Adds an entire spatial fusion system to the sensing server:
SpatialLayoutConfigwith JSON node positions and zone definitions- Per-node
Esp32NodeStatewith independent signal processing pipelines - Zone-based presence/motion/vital scoring with cross-node consensus
FusionConsensusandFusionExplanationstructsconfig/spatial-layout.jsonspatial layout file- Docker config update
Security Assessment
- No security vulnerabilities found. The
extern nvs_config_t g_nvs_configpattern is safe — it is initialized before use inapp_main. - Path concern:
spatial-layout.jsonis loaded from disk via--spatial-configflag — no user-facing path traversal risk since it is a CLI argument, not a web endpoint. - The
adaptive_overriderefactor correctly decouples fromAppStateInner, improving testability.
Conflict Assessment (ADR-069 through ADR-078)
- Part 1 (NVS node_id) is already on main. Commit
8a84748a8("fix(firmware): use NVS node_id instead of Kconfig constant") already made these exact same changes. Merging would conflict. - Part 2 (spatial fusion) will conflict heavily with
sensing-server/src/main.rson main, which has had significant changes through ADR-069 (Cognitum pipeline), ADR-070+, and the recentv0.5.3cross-node fusion work.
Code Quality Notes (Part 2)
- Good: Per-node state isolation prevents cross-contamination of temporal features between ESP32 nodes.
- Good: Zone scoring with geometric weighting is a sound approach.
- Concern: ~800 lines of new code added to an already large
main.rs(3000+ lines). This should be extracted into separate modules (spatial.rs,fusion.rs). - Concern: The
fuse_esp32_updatefunction is doing too much — it combines filtering, scoring, consensus, and explanation generation. Should be decomposed.
Recommendation
Part 1 is already merged. Part 2 introduces valuable multi-node fusion but:
- Conflicts extensively with current main
- Needs to be rebased onto current main
- Should be split into a separate PR with proper module decomposition
- Should be reviewed against our existing cross-node fusion work (v0.5.3, ADR-069)
Verdict: HOLD — Ask contributor to rebase and split the sensing-server changes into a separate PR.
|
Thanks @melodykke — the NVS node_id fix is already on main (8a84748). The multi-node spatial fusion concept is interesting but conflicts extensively with v0.5.3-v0.5.5 changes (cross-node fusion, ADR-069 Seed pipeline, ADR-075 MinCut person counting). Could you rebase and split the spatial fusion into a separate PR targeting the sensing-server only? Happy to review a rebased version. |
Users on multi-node ESP32 deployments have been reporting for months
that their provisioned `node_id` reverts to the Kconfig default of `1`
in UDP frames and the `csi_collector` init log, despite boot showing:
nvs_config: NVS override: node_id=4
main: ESP32-S3 CSI Node (ADR-018) - Node ID: 4
csi_collector: CSI collection initialized (node_id=1, channel=11)
See #232, #375, #385, #386, #390. The root memory-corruption path for
the `g_nvs_config.node_id` byte has not been definitively isolated
(does not reproduce on my attached ESP32-S3 running current source
and the v0.6.0 release binary), but the UDP frame header can be made
tamper-proof regardless:
1. `csi_collector_init()` now captures `g_nvs_config.node_id` into a
module-local `static uint8_t s_node_id` at init time.
2. `csi_serialize_frame()` reads `buf[4]` from `s_node_id`, not from
the global - so any later corruption of `g_nvs_config` cannot
affect outgoing CSI frames.
3. All other consumers (`edge_processing.c` x3, `wasm_runtime.c`,
`display_ui.c`, `main.c swarm_bridge_init`) now go through a new
`csi_collector_get_node_id()` accessor instead of reading the
global directly.
4. A canary at end-of-init logs `WARN` if `g_nvs_config.node_id`
already diverges from the captured value - this will pinpoint
the corruption path if it happens on a user's device.
Hardware validation on attached ESP32-S3 (COM8):
- NVS loads node_id=2
- Boot log: `main: ... Node ID: 2`
- NEW log: `csi_collector: Captured node_id=2 at init (defensive
copy for #232/#375/#385/#390)`
- Init log: `csi_collector: CSI collection initialized (node_id=2)`
- UDP frame byte[4] = 2 (verified via socket sniffer, 15/15 packets)
This is defense in depth - it shields the UDP frame from whatever
upstream bug is clobbering the struct. When a user hits the original
bug, the canary WARN will help isolate the root cause.
Refs #232 #375 #385 #386 #390
Co-Authored-By: claude-flow <ruv@ruv.net>
The CSI callback reads g_nvs_config.filter_mac_set and filter_mac on every invocation (100-500 Hz). If wifi_init_sta() corrupts g_nvs_config (same root cause as the node_id clobber), the callback reads garbage from the struct, leading to Core 0 LoadProhibited panic after ~2400 callbacks (~70 seconds of operation). Extends the early-capture pattern from the node_id fix to also copy filter_mac_set and filter_mac into module-local statics before WiFi init runs. Adds canary logging to detect filter_mac corruption. Observed on device 80:b5:4e:c1:be:b8 via serial: CSI cb #2400 → Guru Meditation Error: Core 0 panic'ed (LoadProhibited) → TG0WDT_SYS_RST → reboot → crash again at ~2900 callbacks Refs ruvnet#232 ruvnet#375 ruvnet#385 ruvnet#386 ruvnet#390 Co-Authored-By: Ruflo & AQE
Summary
Fix an ESP32 firmware issue where outgoing packets could use the compile-time node ID instead of the runtime node ID loaded from NVS.
In multi-node setups, this could cause a provisioned board to still appear with the wrong
node_idon the receiver side.Root Cause
Some packet serialization paths were still using
CONFIG_CSI_NODE_IDinstead ofg_nvs_config.node_id.That meant NVS provisioning could succeed while transmitted packets still carried the compile-time default.
Changes
Use
g_nvs_config.node_idconsistently in outgoing packet paths.Validation
Tested with a real ESP32-S3 setup.
node_id=2via NVSnode_id=2correctlyImpact
This makes runtime node provisioning behave correctly for multi-node deployments.