Skip to content

fix(esp32): use runtime node_id from NVS in outgoing packets#232

Open
melodykke wants to merge 2 commits intoruvnet:mainfrom
melodykke:fix/runtime-node-id
Open

fix(esp32): use runtime node_id from NVS in outgoing packets#232
melodykke wants to merge 2 commits intoruvnet:mainfrom
melodykke:fix/runtime-node-id

Conversation

@melodykke
Copy link
Copy Markdown

Summary

Fix an ESP32 firmware issue where outgoing packets could use the compile-time node ID instead of the runtime node ID loaded from NVS.

In multi-node setups, this could cause a provisioned board to still appear with the wrong node_id on the receiver side.

Root Cause

Some packet serialization paths were still using CONFIG_CSI_NODE_ID instead of g_nvs_config.node_id.

That meant NVS provisioning could succeed while transmitted packets still carried the compile-time default.

Changes

Use g_nvs_config.node_id consistently in outgoing packet paths.

Validation

Tested with a real ESP32-S3 setup.

  • provisioned a board with node_id=2 via NVS
  • rebuilt and reflashed firmware
  • confirmed RuView then reported node_id=2 correctly

Impact

This makes runtime node provisioning behave correctly for multi-node deployments.

- 为 Rust sensing server 增加空间布局配置入口,支持加载节点位置与语义区域定义,并扩展空间融合解释输出。

- 在 Docker 镜像中打包 config 目录,确保部署后可直接读取 spatial-layout.json。

- 修正 ESP32 显示界面中的节点编号来源,改为使用运行时 NVS 配置而非编译期常量。
Copy link
Copy Markdown
Owner

@ruvnet ruvnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #232

Recommendation: HOLD (partially superseded, scope creep)

Summary

This PR from @melodykke has two distinct parts:

Part 1 — Firmware: NVS runtime node_id (the stated purpose)
Replaces CONFIG_CSI_NODE_ID with g_nvs_config.node_id across 4 firmware files:

  • csi_collector.c — packet serialization
  • display_ui.c — UI node label
  • edge_processing.c — compressed frame and vitals packets
  • wasm_runtime.c — WASM output packets

Part 2 — Sensing server: Multi-node spatial fusion (~800 lines, unstated)
Adds an entire spatial fusion system to the sensing server:

  • SpatialLayoutConfig with JSON node positions and zone definitions
  • Per-node Esp32NodeState with independent signal processing pipelines
  • Zone-based presence/motion/vital scoring with cross-node consensus
  • FusionConsensus and FusionExplanation structs
  • config/spatial-layout.json spatial layout file
  • Docker config update

Security Assessment

  • No security vulnerabilities found. The extern nvs_config_t g_nvs_config pattern is safe — it is initialized before use in app_main.
  • Path concern: spatial-layout.json is loaded from disk via --spatial-config flag — no user-facing path traversal risk since it is a CLI argument, not a web endpoint.
  • The adaptive_override refactor correctly decouples from AppStateInner, improving testability.

Conflict Assessment (ADR-069 through ADR-078)

  • Part 1 (NVS node_id) is already on main. Commit 8a84748a8 ("fix(firmware): use NVS node_id instead of Kconfig constant") already made these exact same changes. Merging would conflict.
  • Part 2 (spatial fusion) will conflict heavily with sensing-server/src/main.rs on main, which has had significant changes through ADR-069 (Cognitum pipeline), ADR-070+, and the recent v0.5.3 cross-node fusion work.

Code Quality Notes (Part 2)

  • Good: Per-node state isolation prevents cross-contamination of temporal features between ESP32 nodes.
  • Good: Zone scoring with geometric weighting is a sound approach.
  • Concern: ~800 lines of new code added to an already large main.rs (3000+ lines). This should be extracted into separate modules (spatial.rs, fusion.rs).
  • Concern: The fuse_esp32_update function is doing too much — it combines filtering, scoring, consensus, and explanation generation. Should be decomposed.

Recommendation

Part 1 is already merged. Part 2 introduces valuable multi-node fusion but:

  1. Conflicts extensively with current main
  2. Needs to be rebased onto current main
  3. Should be split into a separate PR with proper module decomposition
  4. Should be reviewed against our existing cross-node fusion work (v0.5.3, ADR-069)

Verdict: HOLD — Ask contributor to rebase and split the sensing-server changes into a separate PR.

@ruvnet
Copy link
Copy Markdown
Owner

ruvnet commented Apr 3, 2026

Thanks @melodykke — the NVS node_id fix is already on main (8a84748). The multi-node spatial fusion concept is interesting but conflicts extensively with v0.5.3-v0.5.5 changes (cross-node fusion, ADR-069 Seed pipeline, ADR-075 MinCut person counting). Could you rebase and split the spatial fusion into a separate PR targeting the sensing-server only? Happy to review a rebased version.

ruvnet added a commit that referenced this pull request Apr 15, 2026
Users on multi-node ESP32 deployments have been reporting for months
that their provisioned `node_id` reverts to the Kconfig default of `1`
in UDP frames and the `csi_collector` init log, despite boot showing:

    nvs_config: NVS override: node_id=4
    main: ESP32-S3 CSI Node (ADR-018) - Node ID: 4
    csi_collector: CSI collection initialized (node_id=1, channel=11)

See #232, #375, #385, #386, #390. The root memory-corruption path for
the `g_nvs_config.node_id` byte has not been definitively isolated
(does not reproduce on my attached ESP32-S3 running current source
and the v0.6.0 release binary), but the UDP frame header can be made
tamper-proof regardless:

1. `csi_collector_init()` now captures `g_nvs_config.node_id` into a
   module-local `static uint8_t s_node_id` at init time.
2. `csi_serialize_frame()` reads `buf[4]` from `s_node_id`, not from
   the global - so any later corruption of `g_nvs_config` cannot
   affect outgoing CSI frames.
3. All other consumers (`edge_processing.c` x3, `wasm_runtime.c`,
   `display_ui.c`, `main.c swarm_bridge_init`) now go through a new
   `csi_collector_get_node_id()` accessor instead of reading the
   global directly.
4. A canary at end-of-init logs `WARN` if `g_nvs_config.node_id`
   already diverges from the captured value - this will pinpoint
   the corruption path if it happens on a user's device.

Hardware validation on attached ESP32-S3 (COM8):
  - NVS loads node_id=2
  - Boot log: `main: ... Node ID: 2`
  - NEW log: `csi_collector: Captured node_id=2 at init (defensive
    copy for #232/#375/#385/#390)`
  - Init log: `csi_collector: CSI collection initialized (node_id=2)`
  - UDP frame byte[4] = 2 (verified via socket sniffer, 15/15 packets)

This is defense in depth - it shields the UDP frame from whatever
upstream bug is clobbering the struct. When a user hits the original
bug, the canary WARN will help isolate the root cause.

Refs #232 #375 #385 #386 #390

Co-Authored-By: claude-flow <ruv@ruv.net>
proffesor-for-testing added a commit to proffesor-for-testing/RuView that referenced this pull request Apr 16, 2026
The CSI callback reads g_nvs_config.filter_mac_set and filter_mac on
every invocation (100-500 Hz). If wifi_init_sta() corrupts g_nvs_config
(same root cause as the node_id clobber), the callback reads garbage
from the struct, leading to Core 0 LoadProhibited panic after ~2400
callbacks (~70 seconds of operation).

Extends the early-capture pattern from the node_id fix to also copy
filter_mac_set and filter_mac into module-local statics before WiFi
init runs. Adds canary logging to detect filter_mac corruption.

Observed on device 80:b5:4e:c1:be:b8 via serial:
  CSI cb #2400 → Guru Meditation Error: Core 0 panic'ed (LoadProhibited)
  → TG0WDT_SYS_RST → reboot → crash again at ~2900 callbacks

Refs ruvnet#232 ruvnet#375 ruvnet#385 ruvnet#386 ruvnet#390

Co-Authored-By: Ruflo & AQE
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants