Skip to content

Logging improvement for repl dev#679

Merged
yuwmao merged 5 commits into
eBay:masterfrom
xiaoxichen:logging
Apr 3, 2025
Merged

Logging improvement for repl dev#679
yuwmao merged 5 commits into
eBay:masterfrom
xiaoxichen:logging

Conversation

@xiaoxichen
Copy link
Copy Markdown
Collaborator

@xiaoxichen xiaoxichen commented Apr 2, 2025

The general idea is application (user of HS) can optionally pass in a trace_id, now it is a uint64. The trace_id will be included in all loggings relate with this rreq , and the trace_id will be pushed to follower through both raft channel and data channel.

With this , when application see an issue , with the trace_id, we can use a single search pattern to search across all replicas and get all related log for that request.

Note this is a breaking change as we change raft journal format.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 2, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 28.57143% with 100 lines in your changes missing coverage. Please review.

Project coverage is 65.99%. Comparing base (1a0cef8) to head (65fb0b9).
Report is 157 commits behind head on master.

Files with missing lines Patch % Lines
src/lib/replication/repl_dev/raft_repl_dev.cpp 25.92% 78 Missing and 2 partials ⚠️
...rc/lib/replication/repl_dev/raft_state_machine.cpp 21.42% 11 Missing ⚠️
src/lib/replication/log_store/repl_log_store.cpp 0.00% 3 Missing ⚠️
src/lib/replication/repl_dev/solo_repl_dev.cpp 0.00% 3 Missing ⚠️
src/lib/replication/repl_dev/common.cpp 33.33% 2 Missing ⚠️
src/lib/replication/repl_dev/raft_repl_dev.h 66.66% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #679      +/-   ##
==========================================
+ Coverage   56.51%   65.99%   +9.48%     
==========================================
  Files         108      109       +1     
  Lines       10300    11369    +1069     
  Branches     1402     1557     +155     
==========================================
+ Hits         5821     7503    +1682     
+ Misses       3894     3127     -767     
- Partials      585      739     +154     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@xiaoxichen xiaoxichen force-pushed the logging branch 8 times, most recently from 4afa717 to b8930ef Compare April 2, 2025 09:13
@xiaoxichen xiaoxichen changed the title [DNM] Logging Logging improvement for repl dev Apr 2, 2025
rdev name (e.g rdev1) is human friendly but not visible outside of the
Homestore, only group_id.

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
the traceID replicate to follower through log and data
channel(push_data). The target is we can use single
traceID to get logs across replicas for the request.

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
Also adjust some of the logging level.

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
@xiaoxichen
Copy link
Copy Markdown
Collaborator Author

sample log

test_raft_repl_dev_replica_0_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_0] [239099] [raft_repl_dev.cpp:331:async_alloc_write] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] repl_key [server=1203793269, term=3, dsn=16, hash=1203793254], header size [24] bytes, user_key size [8] bytes, data size [4096] bytes
test_raft_repl_dev_replica_0_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_0] [239099] [raft_repl_dev.cpp:415:push_data_to_all_followers] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Data Channel: Pushing data to follower 00010123-4567-89ab-cdef-0123456789ab, rreq=[repl_key=[server=1203793269, term=3, dsn=16, hash=1203793254], lsn=-1 state=[BLK_ALLOCATED | ] m_headersize=24 m_keysize=8 is_proposer=true local_blkid=[{blk#=163843 count=1 chunk=3},] remote_blkid=[]]
test_raft_repl_dev_replica_0_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_0] [239099] [raft_repl_dev.cpp:415:push_data_to_all_followers] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Data Channel: Pushing data to follower 00020123-4567-89ab-cdef-0123456789ab, rreq=[repl_key=[server=1203793269, term=3, dsn=16, hash=1203793254], lsn=-1 state=[BLK_ALLOCATED | ] m_headersize=24 m_keysize=8 is_proposer=true local_blkid=[{blk#=163843 count=1 chunk=3},] remote_blkid=[]]
test_raft_repl_dev_replica_0_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_0] [238970] [raft_repl_dev.cpp:431:operator()] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Data Channel: Data push completed for rreq=[dsn=16 term=3 lsn=-1 op=HS_DATA_LINKED local_blkid=[{blk#=163843 count=1 chunk=3},] state=[BLK_ALLOCATED | ]]
test_raft_repl_dev_replica_0_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_0] [239142] [raft_repl_dev.cpp:1000:handle_commit] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Raft channel: Commit rreq=[dsn=16 term=3 lsn=23 op=HS_DATA_LINKED local_blkid=[{blk#=163843 count=1 chunk=3},] state=[BLK_ALLOCATED | LOG_RECEIVED | LOG_FLUSHED]]
test_raft_repl_dev_replica_1_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_1] [239116] [raft_repl_dev.cpp:465:on_push_data_received] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Data Channel: PushData received: time diff=1 ms.
test_raft_repl_dev_replica_1_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_1] [239116] [raft_repl_dev.cpp:585:applier_create_req] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] in follower_create_req: rreq=repl_key=[server=1203793269, term=3, dsn=16, hash=1203793254], lsn=-1 state=[BLK_ALLOCATED | ] m_headersize=24 m_keysize=8 is_proposer=false local_blkid=[{blk#=212998 count=1 chunk=2},] remote_blkid=[], addr=0x6160000a5080
test_raft_repl_dev_replica_1_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_1] [239096] [raft_repl_dev.cpp:529:operator()] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Data Channel: Data write completed for rreq=[dsn=16 term=3 lsn=-1 op=HS_DATA_LINKED local_blkid=[{blk#=212998 count=1 chunk=2},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | ]], time_diff_data_log_us=0, data_write_latency_us=519, total_data_write_latency_us(rreq creation to write complete)=467, local_blkid.num_pieces=1
test_raft_repl_dev_replica_1_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_1] [239115] [raft_repl_dev.cpp:600:notify_after_data_written] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Data written and blkid mapped: rkey=[dsn=16 term=3 lsn=23 op=HS_DATA_LINKED local_blkid=[{blk#=212998 count=1 chunk=2},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | LOG_RECEIVED | ]]
test_raft_repl_dev_replica_1_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_1] [239136] [raft_repl_dev.cpp:1000:handle_commit] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Raft channel: Commit rreq=[dsn=16 term=3 lsn=23 op=HS_DATA_LINKED local_blkid=[{blk#=212998 count=1 chunk=2},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | LOG_RECEIVED | LOG_FLUSHED]]
test_raft_repl_dev_replica_2_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_2] [239125] [raft_repl_dev.cpp:465:on_push_data_received] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Data Channel: PushData received: time diff=2 ms.
test_raft_repl_dev_replica_2_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_2] [239125] [raft_repl_dev.cpp:585:applier_create_req] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] in follower_create_req: rreq=repl_key=[server=1203793269, term=3, dsn=16, hash=1203793254], lsn=-1 state=[BLK_ALLOCATED | ] m_headersize=24 m_keysize=8 is_proposer=false local_blkid=[{blk#=163847 count=1 chunk=2},] remote_blkid=[], addr=0x616000045f80
test_raft_repl_dev_replica_2_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_2] [239100] [raft_repl_dev.cpp:529:operator()] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Data Channel: Data write completed for rreq=[dsn=16 term=3 lsn=-1 op=HS_DATA_LINKED local_blkid=[{blk#=163847 count=1 chunk=2},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | ]], time_diff_data_log_us=0, data_write_latency_us=397, total_data_write_latency_us(rreq creation to write complete)=358, local_blkid.num_pieces=1
test_raft_repl_dev_replica_2_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_2] [239122] [raft_repl_dev.cpp:600:notify_after_data_written] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Data written and blkid mapped: rkey=[dsn=16 term=3 lsn=23 op=HS_DATA_LINKED local_blkid=[{blk#=163847 count=1 chunk=2},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | LOG_RECEIVED | ]]
test_raft_repl_dev_replica_2_log:[04/02/25 01:44:03-07:00] [D] [test_raft_repl_dev_replica_2] [239139] [raft_repl_dev.cpp:1000:handle_commit] [traceID=16] [rdev1:b7083ba1-2b69-4752-bb66-73320369d03c] Raft channel: Commit rreq=[dsn=16 term=3 lsn=23 op=HS_DATA_LINKED local_blkid=[{blk#=163847 count=1 chunk=2},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | LOG_RECEIVED | LOG_FLUSHED]]

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
@yuwmao yuwmao merged commit 3c10d48 into eBay:master Apr 3, 2025
21 checks passed
xiaoxichen pushed a commit to eBay/HomeObject that referenced this pull request Apr 3, 2025
Relevant changes in HomeStore: eBay/HomeStore#679
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants