🐛 ensure mongodb secondaries always get votes:1 and priority:1 #2449
DarkIsDude wants to merge 2 commits
delthas left a comment:
LGTM, very clear writeup.
The "fix partial import" part LGTM: this is upstream and is how it was already running, so there is not much risk there, and it needs to be merged ASAP.
On the "secondary node voting rights" fix, I did not fully review it, but I am much less confident: it is not upstream (hence not "proven" by experience), it changes the whole flow of the secondary process, and it is not tested. Also, we did not see it in the field AFAIK, so there is no pressure. I would rather we move that improvement/fix to a separate PR, ideally upstreaming the change first, so that we get an acknowledgment from Bitnami and can take the time to fully test it and ensure it won't create issues before we ship it.
What does this PR do, and why do we need it?
It fixes a bug in the MongoDB sharded image bootstrap (`libmongodb.sh`) that can leave every replica set with only one voting member, turning that single node into a hard single point of failure: if it goes down, the shard (and therefore the datastore) goes read-only and Zenko is down.
A 30-second MongoDB primer (for non-Mongo readers)
Our data lives in replica sets: groups of MongoDB nodes (here, 3 per shard and 3 for the config server) that each hold a copy of the data.
Each member has two relevant settings:
- `votes` (0 or 1): can this member vote in an election? Majority is counted over the sum of votes.
- `priority` (0+): can this member be elected PRIMARY? `priority: 0` means "never become PRIMARY".
A healthy 3-member set has every member at `votes: 1, priority: 1`. It survives losing any one node: the remaining 2 votes are still a majority, and a surviving member can be elected PRIMARY.
A member that is only partially configured (`votes: 1, priority: 0`, or worse `votes: 0, priority: 0`) still holds data but cannot fully help keep the set alive.
The bug
When a SECONDARY first joins, the bootstrap script does this (`mongodb_configure_secondary`):
1. `rs.add(... votes: 0, priority: 0)`: add the node without voting power so it doesn't disturb the existing majority while it copies data (this is MongoDB's recommended safe procedure).
2. Confirm that the node now appears in the replica set (`mongodb_node_currently_in_cluster`).
3. Wait for the node to finish its initial sync (`SECONDARY` state).
4. Grant it `votes: 1, priority: 1`.
The confirmation in step 2 reads `rs.status()` and greps the output for the node.
The problem: with the broken image, `mongodb_execute` is a thin wrapper around `debug_execute`, which throws away stdout unless `BITNAMI_DEBUG=true` (it isn't, by default).
So the captured `result` is always empty, the grep always fails, and `mongodb_node_currently_in_cluster` always returns false. That makes step 2 (`mongodb_wait_confirmation`) time out, and the bootstrap aborts with an "Unable to confirm…" error.
The container exits, Kubernetes restarts the pod, and on the second boot the data directory already exists, so the bootstrap takes the "deploy with persisted data" path and skips replica set configuration entirely. The node is left frozen at `votes: 0, priority: 0`. Steps 3–4 never run, so the SECONDARY is never promoted.
Net result: only the bootstrap PRIMARY (`*-0`) ends up with a vote → 1 voter per replica set → single point of failure.
Root cause: an incomplete fork of the Bitnami image
Bitnami stopped publishing the `mongodb-sharded` image, so we vendored its scripts into the repo (ZENKO-5110, #2366). The vendoring happened in two commits, and they are not equal:

| Vendoring | Commit | `libmongodb.sh` | `mongodb_execute()` defs |
| --- | --- | --- | --- |
| 2.14 | `0ae8c8a3` | 1712 lines | 2 |
| 2.15 | `2f6c7e42` | 1669 lines | 1 |

Comparing the two, the only relevant difference is the last 43 lines of `libmongodb.sh`.
Upstream's `libmongodb.sh` is assembled by concatenating script fragments, and it deliberately defines `mongodb_execute` twice:
1. First as a thin wrapper (`debug_execute mongodb_execute_print_output "$@"`).
2. Again in the final fragment (which opens with its own `# Copyright …` header and `# shellcheck disable=SC2148`, the tell-tale "no shebang" marker of a separate concatenated file): the real version that calls `mongosh` directly and returns output.
In bash the last definition wins, so upstream's effective `mongodb_execute` is #2 (returns output), which is exactly what `mongodb_node_currently_in_cluster` needs.
The 2.15 re-vendoring (`2f6c7e42`) truncated the file at 1669 lines and dropped that final fragment. Only the output-discarding wrapper was left, silently reverting `mongodb_execute` and breaking the confirmation check. The 2.14 vendoring had copied the whole file, so 2.14.5 worked. We didn't add a fix in 2.14: we just vendored completely there, and lost it in 2.15.
It was easy to miss because dropping a duplicate function definition leaves valid bash that runs fine; the only signal was the line count (1712 vs 1669), and the failure only surfaces as a silent bootstrap race that is invisible until a node dies.
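The shadowing behaviour is easy to reproduce in isolation. The following is a minimal sketch, not the vendored code: `fake_rs_status`, `node_in_cluster` and the member name `mongo-1:27017` are hypothetical stand-ins, and `debug_execute` here is a simplified imitation of the Bitnami helper. It shows the same grep-based check failing while only the wrapper definition exists, then succeeding once a later definition of the same name returns output.

```shell
#!/usr/bin/env bash
# Illustration only: every function below is a stand-in, not the Bitnami code.

BITNAMI_DEBUG=false

# Simplified imitation of debug_execute: hides all output unless debugging is on.
debug_execute() {
    if [[ "$BITNAMI_DEBUG" == "true" ]]; then
        "$@"
    else
        "$@" >/dev/null 2>&1
    fi
}

# Hypothetical replacement for a real mongosh call returning rs.status() output.
fake_rs_status() {
    echo 'name: "mongo-1:27017"'
}

# Definition #1 (earlier fragment): output is swallowed by debug_execute.
mongodb_execute() {
    debug_execute fake_rs_status
}

# Hypothetical equivalent of the confirmation check: capture output, then grep.
node_in_cluster() {
    local result
    result="$(mongodb_execute)"
    grep -q "mongo-1:27017" <<<"$result"
}

if node_in_cluster; then first="found"; else first="not found"; fi

# Definition #2 (final fragment): same name, defined later, so it replaces #1.
mongodb_execute() {
    fake_rs_status
}

if node_in_cluster; then second="found"; else second="not found"; fi

echo "with only definition #1: $first"
echo "after definition #2: $second"
```

Truncating the file before definition #2 leaves a script that still parses and runs, which is why the regression produced no error at vendoring time.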
How we confirmed it (two clusters, same Mongo version)
A cluster on image base `2.14.5` was healthy (3 voters); a cluster on `2.15.1` was broken (1 voter). We confirmed the chain end-to-end from the live `rs.conf()` (broken cluster: secondaries at `votes: 0, priority: 0`; healthy cluster: `votes: 1`) and the pod boot logs (broken cluster fails at "Unable to confirm…"; healthy one gets past it). The diff between the two images' `libmongodb.sh` was exactly the 43-line fragment described above.
The fix
Two commits:
1. `🐛 add missing libmongo.sh fork`: restores the dropped 43-line fragment, so the file matches upstream again. `mongodb_execute` is once more the output-returning definition, `mongodb_node_currently_in_cluster` can read `rs.status()`, and the bootstrap no longer aborts before granting voting rights. This is the root-cause fix (a faithful re-sync with upstream).
2. `🐛 ensure secondary keeps votes and priority after restart`: makes the voting-rights grant idempotent and re-runnable, so a node can no longer be left stranded without votes:
   - `mongodb_secondary_node_has_voting_rights` checks whether the member already has `votes > 0 && priority > 0`;
   - `mongodb_configure_secondary` now grants voting rights whenever they are missing, even if the node is already in the cluster, instead of only on the freshly-added path. So if a previous attempt added the node at `votes`/`priority` 0 and then failed (or was restarted) before promotion, the next run finishes the job and converges it to `votes: 1, priority: 1`;
   - the error message for the voting step (previously `"did not get marked as secondary"`) is corrected to `"did not get granted voting rights"`.
After this change a fresh deployment reliably ends with all members at `votes: 1, priority: 1` (true HA), and a member that is already in the replica set but under-privileged is repaired rather than silently left non-voting.
Which issue does this PR fix?
Fixes ZENKO-5302.
Special notes for your reviewers:
A cluster that is already deployed with secondaries stuck at `votes: 0` still needs a one-time manual `rs.reconfig()` on each replica set (`shard-N` and `configsvr`) to set members 1 and 2 to `votes: 1, priority: 1`. The script change guarantees correct behaviour for new bootstraps and for any path where `mongodb_configure_secondary` runs against a member that lacks voting rights.
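For that manual repair, a hedged sketch of what could be run against the PRIMARY of each replica set is shown below. The helper `build_reconfig_js` is hypothetical and only prints a mongosh snippet; it assumes the affected members sit at positions 1 and 2 of `rs.conf().members`, which should be checked against the live configuration first. It emits one `rs.reconfig()` per member because, as far as I know, MongoDB 4.4 and later allow only one voting member to be added or removed per reconfiguration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a mongosh snippet that grants votes and priority
# to ONE member, identified by its position in rs.conf().members (assumption).
build_reconfig_js() {
    local member_index="$1"
    cat <<EOF
cfg = rs.conf();
cfg.members[${member_index}].votes = 1;
cfg.members[${member_index}].priority = 1;
rs.reconfig(cfg);
EOF
}

# One snippet per member, so each reconfig changes a single voting member.
for idx in 1 2; do
    build_reconfig_js "$idx"
    echo
done
```

Each printed snippet is meant to be pasted into a mongosh session connected to the PRIMARY of the replica set being repaired; connection and authentication details are deployment-specific and deliberately left out. Running `rs.conf()` again afterwards should show all three members at `votes: 1, priority: 1`.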