
🐛 Fix race condition in e2e code coverage collection#2644

Merged
openshift-merge-bot[bot] merged 2 commits into operator-framework:main from pedjak:worktree-fix-e2e-coverage-race on Apr 13, 2026

Conversation

Contributor

@pedjak pedjak commented Apr 13, 2026

Description

hack/test/e2e-coverage.sh has a race condition that causes intermittent
coverage data loss in CI. kubectl scale --replicas=0 is non-blocking —
it returns as soon as the API server accepts the change, not when pods
have terminated. The existing wait --for=condition=ready on the copy pod
was a no-op since it was already running. This meant kubectl cp could
execute before manager pods had terminated and flushed coverage data to
the PVC.

The fix replaces the no-op wait with kubectl wait --for=delete on the
manager pods, ensuring they have fully terminated and the Go coverage
runtime has written its data before copying.
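
A minimal sketch of the resulting sequence, assuming a hypothetical pod label selector (the actual selector and copy step in hack/test/e2e-coverage.sh may differ):

# Non-blocking: returns once the API server accepts the new replica count.
kubectl -n "$OPERATOR_CONTROLLER_NAMESPACE" scale --replicas=0 \
  deployment/"$OPERATOR_CONTROLLER_MANAGER_DEPLOYMENT_NAME"
# Block until the manager pods are actually gone; the Go coverage runtime
# flushes GOCOVERDIR data on process exit, so pod deletion implies the
# data has landed on the PVC. The -l selector is an assumed label; use
# the deployment's real pod labels.
kubectl -n "$OPERATOR_CONTROLLER_NAMESPACE" wait --for=delete pod \
  -l control-plane=operator-controller-controller-manager --timeout=120s
# Only now is it safe to kubectl cp the coverage files from the copy pod.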

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

kubectl scale --replicas=0 is non-blocking and returns as soon as
the API server accepts the change, not when pods have terminated.
The existing wait on the copy pod was a no-op since it was already
running. This meant kubectl cp could run before manager pods had
terminated and flushed coverage data to the PVC.

Wait for each deployment's .status.replicas to reach 0 before
copying, ensuring the Go coverage runtime has written its data.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI review requested due to automatic review settings April 13, 2026 11:06

netlify bot commented Apr 13, 2026

Deploy Preview for olmv1 ready!

🔨 Latest commit: df9cdbe
🔍 Latest deploy log: https://app.netlify.com/projects/olmv1/deploys/69dcd443a0e3240008074304
😎 Deploy Preview: https://deploy-preview-2644--olmv1.netlify.app

@openshift-ci openshift-ci bot requested review from bentito and joelanford April 13, 2026 11:06
@pedjak pedjak changed the title 🐛 Fix race condition in e2e coverage collection 🐛 Fix race condition in e2e code coverage collection Apr 13, 2026
Contributor

Copilot AI left a comment

Pull request overview

Note: Copilot was unable to run its full agentic suite in this review.

Fixes intermittent e2e coverage loss by ensuring manager deployments have fully scaled down (and thus flushed coverage data to the PVC) before kubectl cp runs.

Changes:

  • Replaces “wait for copy pod ready” with waits for both manager deployments’ replica counts to reach 0.
  • Adds explicit timeouts to the waits to avoid hanging indefinitely.


Comment on lines +26 to +27
kubectl -n "$OPERATOR_CONTROLLER_NAMESPACE" wait --for=jsonpath='{.status.replicas}'=0 deployment/"$OPERATOR_CONTROLLER_MANAGER_DEPLOYMENT_NAME" --timeout=60s
kubectl -n "$CATALOGD_NAMESPACE" wait --for=jsonpath='{.status.replicas}'=0 deployment/"$CATALOGD_MANAGER_DEPLOYMENT_NAME" --timeout=60s

Copilot AI Apr 13, 2026

DeploymentStatus.replicas is an optional (omitempty) field and may be absent when it is 0. In that case, kubectl wait --for=jsonpath='{.status.replicas}'=0 may never match (empty string != 0) and can time out intermittently. A more robust approach is to wait for the underlying pods to be deleted (e.g., kubectl wait --for=delete pod -l <selector> --timeout=...) or implement a small polling loop that treats empty .status.replicas as 0.
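
A rough sketch of the polling-loop alternative, treating an absent .status.replicas as zero (the loop bound and variable names are illustrative):

# .status.replicas is omitempty, so it disappears from the JSON at zero;
# poll and treat the empty string the same as "0".
for _ in $(seq 1 60); do
  replicas="$(kubectl -n "$OPERATOR_CONTROLLER_NAMESPACE" get \
    deployment/"$OPERATOR_CONTROLLER_MANAGER_DEPLOYMENT_NAME" \
    -o jsonpath='{.status.replicas}')"
  if [ -z "$replicas" ] || [ "$replicas" = "0" ]; then break; fi
  sleep 1
done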

Comment on lines +26 to +27

Copilot AI Apr 13, 2026

Hard-coding --timeout=60s risks CI flakes on slower clusters or during API server pressure (scale-down and termination can exceed 60s). Consider making the timeout configurable (env var with a sensible default) and/or increasing it to a more conservative value to reduce intermittent failures.
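
One way to do that, sketched with a hypothetical env var name and default:

# Let CI override the scale-down timeout; default to a conservative 180s.
SCALE_DOWN_TIMEOUT="${E2E_SCALE_DOWN_TIMEOUT:-180s}"
kubectl -n "$OPERATOR_CONTROLLER_NAMESPACE" wait \
  --for=jsonpath='{.status.replicas}'=0 \
  deployment/"$OPERATOR_CONTROLLER_MANAGER_DEPLOYMENT_NAME" \
  --timeout="$SCALE_DOWN_TIMEOUT"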

kubectl scale --replicas=0 is non-blocking and returns as soon as
the API server accepts the change, not when pods have terminated.
The existing wait on the copy pod was a no-op since it was already
running. This meant kubectl cp could run before manager pods had
terminated and flushed coverage data to the PVC.

Wait for manager pods to be deleted before copying, ensuring the
Go coverage runtime has written its data on process exit.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@pedjak pedjak force-pushed the worktree-fix-e2e-coverage-race branch from 878686f to df9cdbe Compare April 13, 2026 11:32

codecov bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.94%. Comparing base (dd57c28) to head (df9cdbe).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2644      +/-   ##
==========================================
+ Coverage   68.92%   68.94%   +0.02%     
==========================================
  Files         140      140              
  Lines        9905     9905              
==========================================
+ Hits         6827     6829       +2     
+ Misses       2566     2565       -1     
+ Partials      512      511       -1     
Flag Coverage Δ
e2e 37.53% <ø> (-0.28%) ⬇️
experimental-e2e 52.39% <ø> (ø)
unit 53.58% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.



@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 13, 2026

openshift-ci bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: camilamacedo86

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2026
@openshift-merge-bot openshift-merge-bot bot merged commit c641e2f into operator-framework:main Apr 13, 2026
26 checks passed
