SDSTOR-21438: Adjust commit_quorum reduction ordering #875
Codecov Report
❌ Patch coverage is

@@            Coverage Diff             @@
##           master     #875      +/-   ##
==========================================
- Coverage   56.51%   48.22%   -8.29%
==========================================
  Files         108      110       +2
  Lines       10300    12900    +2600
  Branches     1402     6204    +4802
==========================================
+ Hits         5821     6221     +400
+ Misses       3894     2566    -1328
- Partials      585     4113    +3528
if (m_my_repl_id != get_leader_id()) { return make_async_error<>(ReplServiceError::NOT_LEADER); }
// Check if leader itself is requested to move out.
if (m_my_repl_id == member_out.id) {
Would it be better to move the if (m_my_repl_id == member_out.id) check to the very beginning?
bool is_leader = (m_my_repl_id == get_leader_id());
// I am the one to be removed.
if (m_my_repl_id == member_out.id) {
    if (is_leader) { yield_leadership(); }
    return make_async_error<>(ReplServiceError::NOT_LEADER);
}
// Not leader: reduce the quorum if requested, then reject.
if (!is_leader) {
    if (commit_quorum >= 1) {
        // Reduce quorum size ....
        reset_quorum_size(commit_quorum, trace_id);
    }
    return make_async_error<>(ReplServiceError::NOT_LEADER);
}
// Now I am the leader.
When two members are down, the raft leader yields leadership after leadership_expiry (20x heartbeat). reset_quorum_size must therefore be called before the NOT_LEADER check, so that the node can maintain or reclaim leadership with election_quorum=1.
Previously, the TwoMemberDown UT called replace_member immediately, before leadership had a chance to expire. The test now sleeps 10s to simulate leadership expiry.
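To make the ordering argument concrete, here is a toy model, a minimal sketch under stated assumptions. ToyReplica, Result, run_election, replace_member_old/replace_member_new, call_with_retry and check_scenario are all hypothetical names; they are not the real replication service API, and a raft election is reduced to comparing the number of live voters against an integer election quorum. With two of three members down and leadership already expired, checking leadership first leaves the quorum at 2 forever, while reducing the quorum first lets the surviving node win the next election so that a retry can succeed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result codes for this sketch (not ReplServiceError).
enum class Result { OK, NOT_LEADER };

// Toy replica; every name here is hypothetical, not the real replication API.
struct ToyReplica {
    bool is_leader = false;        // leadership has already expired
    uint32_t election_quorum = 2;  // majority of a 3-member group

    // Stand-in for reset_quorum_size(): lowers the election quorum.
    void reset_quorum_size(uint32_t q) { election_quorum = q; }

    // Stand-in for one raft election round: win if enough voters are alive.
    void run_election(uint32_t live_voters) { is_leader = live_voters >= election_quorum; }

    // Old ordering: leadership is checked before the quorum is reduced.
    Result replace_member_old(uint32_t commit_quorum) {
        if (!is_leader) return Result::NOT_LEADER;
        if (commit_quorum >= 1) reset_quorum_size(commit_quorum);
        return Result::OK;
    }

    // New ordering: reduce the quorum first, then check leadership.
    Result replace_member_new(uint32_t commit_quorum) {
        if (commit_quorum >= 1) reset_quorum_size(commit_quorum);
        if (!is_leader) return Result::NOT_LEADER;
        return Result::OK;
    }
};

// Calls op up to `attempts` times, running one election between attempts.
template <typename Op>
Result call_with_retry(ToyReplica& r, Op op, int attempts, uint32_t live_voters) {
    Result res = Result::NOT_LEADER;
    for (int i = 0; i < attempts; ++i) {
        res = op(r);
        if (res == Result::OK) break;
        r.run_election(live_voters);
    }
    return res;
}

// Scenario: two of three members are down, so only this node can vote.
void check_scenario() {
    ToyReplica old_r, new_r;
    auto old_op = [](ToyReplica& r) { return r.replace_member_old(1); };
    auto new_op = [](ToyReplica& r) { return r.replace_member_new(1); };
    // Old ordering: the quorum never drops, so no election is ever won.
    assert(call_with_retry(old_r, old_op, 3, 1) == Result::NOT_LEADER);
    // New ordering: the first call fails but lowers the quorum; a retry succeeds.
    assert(call_with_retry(new_r, new_op, 3, 1) == Result::OK);
}
```

In this simplified model one retry is enough because an election completes between attempts; it does not model commit_quorum being reset to 0 on failure, so it says nothing about the retry count needed in the real code.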
Note that if a request fails after the NOT_LEADER check, it needs to be retried twice, because commit_quorum is reset to 0 before returning.