Skip to content

🐛 ensure mongodb secondaries always get votes:1 and priority:1#2449

Open
DarkIsDude wants to merge 2 commits into
development/2.15from
bugfix/ZENKO-5302/mongodb-secondary-voting-rights
Open

🐛 ensure mongodb secondaries always get votes:1 and priority:1#2449
DarkIsDude wants to merge 2 commits into
development/2.15from
bugfix/ZENKO-5302/mongodb-secondary-voting-rights

Conversation

@DarkIsDude

@DarkIsDude DarkIsDude commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

What does this PR do, and why do we need it?

It fixes a bug in the MongoDB sharded image bootstrap (libmongodb.sh) that can leave every replica set with only one voting member, turning that single node into a hard single point of failure: if it goes down, the shard (and therefore the datastore) goes read‑only and Zenko is down.


A 30‑second MongoDB primer (for non‑Mongo readers)

Our data lives in replica sets: groups of MongoDB nodes (here, 3 per shard and 3 for the config server) that each hold a copy of the data.

  • One node is the PRIMARY (takes writes); the others are SECONDARIES (copies).
  • To stay alive, a replica set must keep a majority of votes online and be able to elect a PRIMARY. Two settings per member matter:
    • votes (0 or 1): can this member vote in an election? Majority is counted over the sum of votes.
    • priority (0+): can this member be elected PRIMARY? priority: 0 means "never become PRIMARY".
  • A healthy 3‑node set has 3 members with votes: 1, priority: 1. It survives losing any one node: the remaining 2 votes are still a majority, and a surviving member can be elected PRIMARY.

A member that is only partially configured — votes: 1, priority: 0, or worse votes: 0, priority: 0 — still holds data but cannot help keep the set alive.


The bug

When a SECONDARY first joins, the bootstrap script does this (mongodb_configure_secondary):

  1. rs.add(... votes: 0, priority: 0) — add the node without voting power so it doesn't disturb the existing majority while it copies data (this is MongoDB's recommended safe procedure).
  2. Confirm the node is now listed in the replica set (mongodb_node_currently_in_cluster).
  3. Wait for it to finish its initial sync (reach SECONDARY state).
  4. Grant voting rights: reconfigure it to votes: 1, priority: 1.

The confirmation in step 2 reads rs.status() and greps the output for the node:

result=$(mongodb_execute ... <<< "rs.status().members")
grep -q "'$node:$port'" <<<"$result"

The problem: with the broken image, mongodb_execute is a thin wrapper around debug_execute, which throws away stdout unless BITNAMI_DEBUG=true (it isn't, by default):

debug_execute() { if is_boolean_yes "${BITNAMI_DEBUG:-false}"; then "$@"; else "$@" >/dev/null 2>&1; fi; }

So result is always empty, the grep always fails, and mongodb_node_currently_in_cluster always returns false. That makes step 2 (mongodb_wait_confirmation) time out and the bootstrap aborts with:

ERROR ==> Unable to confirm that <node> has been added to the replica set!

The container exits, Kubernetes restarts the pod, and on the second boot the data directory already exists, so the bootstrap takes the "deploy with persisted data" path and skips replica set configuration entirely. The node is left frozen at votes: 0, priority: 0. Steps 3–4 never run, so the SECONDARY is never promoted.

Net result: only the bootstrap PRIMARY (*-0) ends up with a vote → 1 voter per replica set → single point of failure.


Root cause: an incomplete fork of the Bitnami image

Bitnami stopped publishing the mongodb-sharded image, so we vendored its scripts into the repo (ZENKO-5110, #2366). The vendoring happened in two commits, and they are not equal:

Commit Branch libmongodb.sh mongodb_execute() defs Result
0ae8c8a3 development/2.14 1712 lines 2 ✅ works (→ 2.14.5)
2f6c7e42 development/2.15 1669 lines 1 ❌ broken (→ 2.15.1)

Compare: the only relevant difference is the last 43 lines of libmongodb.sh.

Upstream's libmongodb.sh is assembled by concatenating script fragments, and it deliberately defines mongodb_execute twice:

  1. Early in the file — the legacy wrapper that discards stdout (debug_execute mongodb_execute_print_output "$@").
  2. As a final appended fragment (it carries its own # Copyright … header and # shellcheck disable=SC2148 — the tell‑tale "no shebang" marker of a separate concatenated file) — the real version that calls mongosh directly and returns output.

In bash the last definition wins, so upstream's effective mongodb_execute is #2 (returns output) — which is exactly what mongodb_node_currently_in_cluster needs.

The 2.15 re‑vendoring (2f6c7e42) truncated the file at 1669 lines and dropped that final fragment. Only the output‑discarding wrapper was left, silently reverting mongodb_execute and breaking the confirmation check. The 2.14 vendoring had copied the whole file, so 2.14.5 worked. We didn't add a fix in 2.14 — we just vendored completely there, and lost it in 2.15.

It was easy to miss because dropping a duplicate function definition leaves valid bash that runs fine; the only signal was the line count (1712 vs 1669), and the failure only surfaces as a silent bootstrap race that is invisible until a node dies.

How we confirmed it (two clusters, same Mongo version)

A cluster on image base 2.14.5 was healthy (3 voters); a cluster on 2.15.1 was broken (1 voter). We confirmed the chain end‑to‑end from the live rs.conf() (broken cluster: secondaries at votes: 0, priority: 0; healthy cluster: votes: 1) and the pod boot logs (broken cluster fails at "Unable to confirm…"; healthy one gets past it). The diff between the two images' libmongodb.sh was exactly the 43‑line fragment above.


The fix

Two commits:

  1. 🐛 add missing libmongo.sh fork — restores the dropped 43‑line fragment, so the file matches upstream again. mongodb_execute is once more the output‑returning definition, mongodb_node_currently_in_cluster can read rs.status(), and the bootstrap no longer aborts before granting voting rights. This is the root‑cause fix (faithful re‑sync with upstream).

  2. 🐛 ensure secondary keeps votes and priority after restart — makes the voting‑rights grant idempotent and re‑runnable, so a node can no longer be left stranded without votes:

    • new helper mongodb_secondary_node_has_voting_rights checks whether the member already has votes > 0 && priority > 0;
    • mongodb_configure_secondary now grants voting rights whenever they are missing — even if the node is already in the cluster, instead of only on the freshly‑added path. So if a previous attempt added the node at votes/priority 0 and then failed (or was restarted) before promotion, the next run finishes the job and converges it to votes: 1, priority: 1;
    • the misleading error message ("did not get marked as secondary" printed for the voting step) is corrected to "did not get granted voting rights".

After this change a fresh deployment reliably ends with all members at votes: 1, priority: 1 (true HA), and a member that is already in the replica set but under‑privileged is repaired rather than silently left non‑voting.


Which issue does this PR fix?

Fixes ZENKO-5302.

Special notes for your reviewers:

  • Already‑broken clusters won't fully self‑heal from the image alone. On a pod restart the bootstrap takes the persisted‑data path, which skips replica‑set (re)configuration entirely, so an existing cluster already stranded at votes: 0 still needs a one‑time manual rs.reconfig() on each replica set (shard-N and configsvr) to set members 1 and 2 to votes: 1, priority: 1. The script change guarantees correct behaviour for new bootstraps and for any path where mongodb_configure_secondary runs against a member that lacks voting rights.
  • Should we keep the second fix ? We can use a dedicated PR for that as we already have this issue on older cluster (4.2.3 have it).

@bert-e

bert-e commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Hello darkisdude,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e

bert-e commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Incorrect fix version

The Fix Version/s in issue ZENKO-5302 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 2.15.2

Please check the Fix Version/s of ZENKO-5302, or the target
branch of this pull request.

@DarkIsDude DarkIsDude self-assigned this Jun 26, 2026
@DarkIsDude DarkIsDude force-pushed the bugfix/ZENKO-5302/mongodb-secondary-voting-rights branch from 026ff16 to 528728f Compare June 26, 2026 09:30
@DarkIsDude DarkIsDude requested review from a team, benzekrimaha and maeldonn June 26, 2026 09:52
@DarkIsDude DarkIsDude marked this pull request as ready for review June 26, 2026 09:53

@delthas delthas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, very clear writeup.

@bert-e

bert-e commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@francoisferrand francoisferrand left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "fix partial import" LGTM : this is upstream, this is how it was already running → not much risk there, and needs to be merged ASAP.

On the "secondary node voting rights" fix, I did not fully review but I am however much less confident: it is not upstream (hence not "proven" by experience), changes the whole flow of secondary process, and not tested... Also we did not see it in the field AFAIK, so no pressure : thus I would rather we move that improvement/fix to a separate PR, ideally upstreaming the change first, so we get an acknowledgment from bitnami, and can take the time to fully test this and ensure it won't create issue before we ship it.

@bert-e

bert-e commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following reviewers are expecting changes from the author, or must review again:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants