feat - refactor nand flash as ring buffer#7216
Conversation
Greptile SummaryThis PR replaces the LittleFS file-based offline audio storage with a raw-sector ring buffer written directly to NAND flash sectors. Audio packets are batched into 16 KB aligned sectors, metadata is wear-leveled across 64 dedicated sectors using a generation counter, and the BLE storage service is updated to speak the new sequence-number-based protocol (
Confidence Score: 3/5The core ring-buffer mechanics are well-structured, but silent audio loss and a service starvation path on timeout remain unresolved. The previously-flagged silent discard of all pre-RTC-sync audio is unchanged. A timeout in any sd_ring_* blocking call leaves the in-flight flag set until the SD worker drains the request, during which all callers receive -EBUSY with no log visibility. The get_oldest_packet_name TOCTOU between sd_ring_get_info and sd_ring_read can produce spurious errors on the BLE file-list path. These issues sit on the core data-path of a newly rewritten storage layer. omi/firmware/omi/src/sd_card.c needs the most attention: the in-flight timeout handling, the get_oldest_packet_name TOCTOU, and the no-RTC discard path. Important Files Changed
Reviews (2): Last reviewed commit: "improv - the audio sync speed increases ..." | Re-trigger Greptile |
| done_pending = true; | ||
| remaining_packets = 0; |
There was a problem hiding this comment.
|
@greptile-apps review |
| static int get_oldest_packet_name(char *buf, size_t buf_size) | ||
| { | ||
| if (!filenames || !count || max_files <= 0) | ||
| if (!buf || buf_size == 0U) { | ||
| return -EINVAL; | ||
| if (!is_mounted) | ||
| return -ENODEV; | ||
| } | ||
|
|
||
| int cache_res = ensure_file_cache(); | ||
| if (cache_res < 0) { | ||
| return cache_res; | ||
| sd_ring_info_t info; | ||
| int ret = sd_ring_get_info(&info); | ||
| if (ret < 0) { | ||
| return ret; | ||
| } | ||
|
|
||
| int file_count = cached_file_list_count; | ||
| if (file_count > max_files) { | ||
| file_count = max_files; | ||
| if (info.read_seq == info.write_seq) { | ||
| buf[0] = '\0'; | ||
| return 0; | ||
| } | ||
|
|
||
| for (int i = 0; i < file_count; i++) { | ||
| strncpy(filenames[i], cached_file_names[i], MAX_FILENAME_LEN - 1); | ||
| filenames[i][MAX_FILENAME_LEN - 1] = '\0'; | ||
| if (sizes) { | ||
| sizes[i] = cached_file_sizes[i]; | ||
| } | ||
| uint8_t packet[RAW_AUDIO_PACKET_BYTES]; | ||
| uint32_t bytes_read = 0; | ||
| uint32_t packets_read = 0; | ||
| ret = sd_ring_read(info.read_seq, packet, sizeof(packet), &bytes_read, &packets_read); | ||
| if (ret < 0 || packets_read == 0U) { | ||
| return (ret < 0) ? ret : -ENOENT; | ||
| } | ||
|
|
||
| *count = file_count; | ||
| format_timestamp_name(sys_get_be32(packet), buf, buf_size); | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Stale
info.read_seq used in subsequent sd_ring_read call
get_oldest_packet_name snapshots info.read_seq via sd_ring_get_info, then issues a separate sd_ring_read(info.read_seq, ...). Between these two blocking worker round-trips, an external caller can issue CMD_RING_ADVANCE which updates ring_state.read_seq. If that happens, info.read_seq is now behind the current ring_state.read_seq, and read_packets_internal returns -ERANGE. The caller (get_audio_file_list_with_sizes) propagates this error, causing the file-list operation to fail silently from the BLE client's perspective. The fix is to either use a single atomic worker operation that returns both the oldest seq and its packet, or to handle -ERANGE by retrying with a fresh sd_ring_get_info.
| if (k_sem_take(&resp.sem, K_MSEC(5000)) != 0) { | ||
| return -ETIMEDOUT; | ||
| } | ||
|
|
||
| *info = resp.info; | ||
| return resp.res; |
There was a problem hiding this comment.
sd_ring_get_info leaves info_in_flight set after timeout, silently starving all callers
When k_sem_take times out, the function returns -ETIMEDOUT without clearing info_in_flight. info_in_flight is only cleared by the SD worker via release_resp_busy when it eventually processes the enqueued request. If the SD worker is stalled, every subsequent call to sd_ring_get_info will immediately return -EBUSY for the full duration — potentially freezing the entire storage service. The same pattern applies to sd_ring_read, sd_ring_advance, and sd_ring_clear. At minimum, the timeout branch should log a warning so operators can detect this condition.
## Summary App-side support for the ring-buffer NAND storage protocol introduced in firmware PR #7216 (TuEmb). Devices on fw >= 3.0.20 will use this path; fw 3.0.17–.19 keep the existing multi-file LittleFS path; fw < 3.0.17 stay on the legacy SD-card path. The 444-byte record format (`[ts:4 BE][audio:440]`) is identical to the multi-file protocol on main — only the control plane changes (read by u64 sequence, advance by sequence). The audio parser is reused unchanged. ## Commits (atomic) 1. Add `RingStatus` / `RingInfo` data classes + abstract method surface on `DeviceConnection` 2. Implement ring-buffer BLE protocol on `OmiDeviceConnection` (CMD_RING_INFO/READ/ADVANCE/CLEAR + NOTIFY_INFO/ACK/READ_BEGIN/DATA/DONE handling) 3. Add `RingStorageSyncImpl` mirroring `StorageSyncImpl` but talking ring (NOTIFY_DATA reassembly across BLE chunks, advance only after LocalWalSync confirms chunk landed) 4. Gate Phase-0 sync in `WalSyncs.syncAll()` by firmware version; route `_checkAndStartAutoSync` through `getRingStatus()` for ring-capable devices 5. Refactor — extract pure-data wire helpers into `RingProtocol` so they're testable without BLE mocking 6. 30 unit tests covering status / NOTIFY_INFO / NOTIFY_DONE / NOTIFY_READ_BEGIN parsers, CMD_RING_READ / CMD_RING_ADVANCE u64 BE encoding, packed audio payload framing, and the BLE chunk reassembler ## Data-safety invariant `CMD_RING_ADVANCE` is sent only after `NOTIFY_DONE` arrives with status=0 **and** every chunk has been registered with `LocalWalSync`. Any failure (cancel, BLE drop, flush error, non-zero DONE status) leaves the ring untouched — the next sync resumes from the same `read_seq`. This is the data-loss win versus the file-based protocol's delete-after-read semantics. ## RTC handling If `status.rtc_valid == 0` (device RTC not yet synced via `performSyncTime`), per-record timestamps are treated as unreliable and timerStart falls back to `now - estimated_duration`, matching the legacy code path. If `rtc_valid == 1`, the first record's BE u32 timestamp anchors timerStart. ## Out of scope - WiFi sync path — unchanged (no longer in user-facing use) - Backend / cloud upload — `LocalWalSync` is reused as-is; no API changes ## Test plan - [x] Local: `cd app && flutter test test/unit/ring_protocol_test.dart` — 30/30 pass - [x] Local: full `flutter analyze` — 0 new issues introduced (pre-existing warnings on main untouched) - [x] Dev node: connect a fw 3.0.20 device, verify auto-sync banner triggers on connect when ring has unread packets, verify download completes and ring advances - [x] Dev node: cancel sync mid-stream, verify ring is NOT advanced, verify next sync resumes from same `read_seq` - [ ] Dev node: connect a fw 3.0.17–.19 device (existing multi-file path), verify behavior unchanged - [ ] Dev node: connect a fw < 3.0.17 device (legacy SD card path), verify behavior unchanged Closes nothing yet (firmware PR #7216 is the dependency).
…viceConnection Adds data classes and abstract method surface for the ring-buffer storage protocol (firmware 3.0.20+, omi PR BasedHardware#7216). Same BLE service UUID as the existing multi-file protocol; subclasses fill in the new opcodes. The 444-byte record format ([ts:4 BE][audio:440]) is identical to the multi-file protocol on main, so the audio parser will be reused. Only the control-plane (read-by-sequence, advance-by-sequence) is new.
Mirrors StorageSyncImpl but talks the ring protocol from omi PR BasedHardware#7216: - refreshWalsFromDevice: 16B status read; emits ONE virtual Wal covering the entire unread region (the ring is a single logical stream, not a list of files). Skipped when <10s of audio, mirroring the file-based threshold. - _syncRing: snapshots ring state via getRingInfo, sends CMD_RING_READ, demuxes notifications by opcode (ACK/INFO/DATA/DONE/READ_BEGIN), and reassembles the unaligned NOTIFY_DATA byte stream into 444B records. Per record, strips the 4B BE timestamp and feeds the 440B audio payload through the same packed [size:1][frame:size] parser used by the multi-file path. - Chunks flush to disk + register with LocalWalSync as we go (60s chunks per sdcardChunkSizeSecs) so a mid-stream BLE drop still lands the early audio in the cloud. - CMD_RING_ADVANCE is sent ONLY after NOTIFY_DONE arrives with status=0 AND every chunk has been registered with LocalWalSync. Any failure (cancel, BLE drop, flush error, DONE with non-zero status) leaves the ring untouched — the next sync resumes from the same read_seq. This is the data-safety invariant. - rtc_valid handling: if the device RTC isn't synced yet, fall back to now-duration like the legacy code does. Otherwise the per-record timestamp from the first record anchors timerStart. - dropped_packets > 0 surfaces as a DebugLogManager warning; sync proceeds with whatever the firmware still has.
|
Closing as stale after confirmed stale-close triage. This PR was identified as high-confidence stale due to age plus one or more of: conflicts, draft/failing status, superseded work, or obsolete architecture. No merge was performed. If the work is still needed, please reopen with a fresh rebase or start from current main. |
|
Hey @TuEmb 👋 Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request. After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:
Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out. Thank you for being part of the Omi community! 💜 |
Core Idea
444 bytes:0..3: UTC timestamp,uint32 BE4..443: audio payload,440 bytesCMD_RING_ADVANCEpersists consumer progress.GATT
30295780-4301-EABD-2904-2849ADFEAE4330295781-4301-EABD-2904-2849ADFEAE43Write + Notify30295782-4301-EABD-2904-2849ADFEAE43Read + NotifySubscribe to notifications on the Control characteristic.
Status Char Read
Returns
16 bytes=4 x uint32 LE:0..3:used_bytes4..7:unread_packets8..11:free_bytes12..15:rtc_validCommands
Write commands to the Control characteristic.
0x10CMD_RING_INFO[0x10]0x11CMD_RING_READ[0x11][start_seq: u64 BE][0x11][start_seq: u64 BE][packet_count: u32 BE]packet_countis omitted or0, firmware streams everything available fromstart_seq0x12CMD_RING_ADVANCE[0x12][new_read_seq: u64 BE]0x13CMD_RING_CLEAR[0x13]0x03CMD_STOP_SYNC[0x03]Notifications
All notifications are sent on the Control characteristic.
0x01NOTIFY_ACK[0x01][status]0x02NOTIFY_INFO[0x02][read_seq: u64 BE][write_seq: u64 BE][capacity_packets: u32 BE][dropped_packets: u64 BE][packet_size: u16 BE]packet_size = 4440x05NOTIFY_READ_BEGIN[0x05][transfer_start_seq: u64 BE][packet_count: u32 BE]0x03NOTIFY_DATA[0x03][raw_bytes...]444-byterecords0x04NOTIFY_DONE[0x04][status][next_seq: u64 BE]next_seqinCMD_RING_ADVANCEStatus Codes
0: success6: invalid command9: storage not ready10: sequence out of rangeImportant Semantics
read_seq= oldest unread packetwrite_seq= next sequence after newest committed packetread_seqdropped_packetsstart_seqolder than currentread_seq, firmware returns10Recommended App Flow
CMD_RING_INFO.NOTIFY_INFO.CMD_RING_READ(read_seq).NOTIFY_READ_BEGIN.NOTIFY_DATAchunks and rebuild444-byterecords.NOTIFY_DONE.CMD_RING_ADVANCE(next_seq).read_seq == write_seq.Migration Note
Old flow was file-based:
New flow is ring-based: