Skip to content

Create a KPI dashboard for quality numbers#448

Merged
castler merged 6 commits into
eclipse-score:mainfrom
ahmed0mousa:ahmo_add_nightly_quality_kpi
May 22, 2026
Merged

Create a KPI dashboard for quality numbers#448
castler merged 6 commits into
eclipse-score:mainfrom
ahmed0mousa:ahmo_add_nightly_quality_kpi

Conversation

@ahmed0mousa
Copy link
Copy Markdown
Contributor

@ahmed0mousa ahmed0mousa commented May 18, 2026

Add a nightly CI pipeline that runs three quality jobs in parallel (coverage, CodeQL, and clang-tidy) and publishes all results to GitHub Pages under latest/quality/. A Jinja2-based dashboard aggregates the findings into a single page with KPI trend tracking across runs. The Sphinx documentation is extended with a dedicated quality reports page and a version switcher navbar, and on every push to main the docs automatically pull the latest nightly KPI numbers so they stay current without waiting for another nightly run.

Issue: SWP-262453

@ahmed0mousa ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch from a6a2efd to 7380bb8 Compare May 18, 2026 14:20
Copy link
Copy Markdown
Contributor

@castler castler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet fully through

env:
ANDROID_HOME: ""
ANDROID_SDK_ROOT: ""
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not have things different in the release workflow then in others. So either, we add this everywhere or nowhere.

Can you please also state in the commit message why this change is needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node 20 will be deprecated next month on GitHub Actions runners, I can add this to the commit message
https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

os.system(
f"{code_ql_path} database analyze -j=0 {database_location} --format=sarifv2.1.0 --output={output_base}/codeql.sarif")
os.system(
f"{code_ql_path} database analyze -j=0 {database_location} --format=csv --output={output_base}/codeql.csv")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep the CSV output for direct human readibility

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

Comment thread quality/static_analysis/codeql_lint.py Outdated
Comment on lines +98 to +106
- name: Set conclusion
id: set-conclusion
run: |
if [[ "${{ steps.run-coverage.outcome }}" == "success" ]]; then
echo "conclusion=success" >> $GITHUB_OUTPUT
else
echo "conclusion=failure" >> $GITHUB_OUTPUT
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed, can we try to remove this again please?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the coverage step has continue-on-error: true meaning if bazel coverage fails, the job doesn't stop, it keeps running. Without the "Set conclusion" step, the caller nightly_quality.yml has no way to know whether coverage actually passed or failed; it only sees the job as success because continue-on-error suppresses the failure. So In continue-on-error hides the failure from GitHub's job status, "Set conclusion" exists to un-hide it for the dashboard. For a nightly quality pipeline, partial data is actually useful, if coverage fails at night, you want something to look at the next morning rather than an empty artifact.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but then we have to also change this in the release workflow. Because there we expect to abort the release, if the coverage fails.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? its still the same behaviour:

  1. run-coverage has continue-on-error: true -> job never fails, so set-conclusion always runs
  2. set-conclusion sets conclusion=failure when coverage failed
  3. upload-coverage-report gate step checks needs.run-coverage-report.outputs.conclusion != 'success' -> exits 1 -> job fails
  4. delete-release-on-failure triggers on failure() -> draft deleted

Comment thread .github/workflows/clang_tidy.yml Outdated
Comment thread .github/workflows/nightly_quality.yml Outdated
Comment thread .github/workflows/nightly_quality.yml Outdated
@ahmed0mousa ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch 4 times, most recently from bff61c4 to bb988f3 Compare May 19, 2026 14:36
Comment thread quality/dashboard/dashboard.html.j2
Comment thread quality/static_analysis/codeql_lint.py Outdated
Comment thread .github/workflows/docs.yml Outdated
Comment thread .github/workflows/docs.yml Outdated
@ahmed0mousa ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch 4 times, most recently from c55932b to 154647b Compare May 21, 2026 13:26
LCOV_DAT="${1:?Usage: $0 <lcov.dat> [output-dir]}"
OUTPUT_DIR="${2:-cpp_coverage}"

# NOTE: "--ignore-errors category,inconsistent"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please share your thoughts here

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We keep it for now - but we should definitely try out the --local_test_jobs=1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left for now

#
# Arguments:
# zip-dir Directory containing the downloaded coverage zip (default: /tmp/coverage_zip)
# output-dir Directory to copy the HTML report into (default: ${GITHUB_WORKSPACE}/_quality/coverage)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maybe link it via the static part of the sphinx build? Meaning if we move it here:
docs/sphinx/_static/...
then sphinx should automatically be able to pick it up

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short answer: not with Bazel's hermetic build model.

_static is evaluated by Bazel at analysis time via glob(["_static/**/*"]) in the BUILD file. glob() runs when Bazel reads the BUILD file before any execution. Coverage HTML is generated at CI runtime, so it doesn't exist when Bazel analyses the graph. Bazel would simply not see it and not include it in the sandbox.

Even if you copied the coverage files into _static/ before calling bazel build, the sandbox only sees declared inputs. Files copied in after glob() evaluated are invisible to it.

Copy link
Copy Markdown
Contributor

@castler castler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very huge PR....I hope I figured out everyhting....

Comment thread .github/workflows/docs.yml Outdated
sudo apt-get install -y lcov

- name: Setup Bazel with shared caching
uses: bazel-contrib/setup-bazel@0.18.0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the same setup-bazel as in the other workflows (can be done in a follow up PR).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be done in a follow up PR

Comment thread .github/workflows/coverage_report.yml Outdated
Comment on lines +76 to +84
bash quality/scripts/generate_coverage_html.sh \
"$(bazel info output_path)/_coverage/_coverage_report.dat" \
cpp_coverage

- name: Create archive of test report
if: steps.run-coverage.outcome == 'success'
run: |
mkdir -p artifacts
find bazel-testlogs/score/ -name 'test.xml' -print0 | xargs -0 -I{} cp --parents {} artifacts/
cp -r cpp_coverage artifacts/
zip -r ${{ github.event.repository.name }}_coverage_report_${{ github.sha }}.zip artifacts/
shell: bash
bash quality/scripts/create_coverage_archive.sh \
"${{ github.event.repository.name }}_coverage_report_${{ github.sha }}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should merge these into one script, and just have an cli option to create the archive. This will reduce the steps here in the CI.

Besides that, we should execute this via bazel run.

(can be done in follow up PR).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be done in a follow up PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in #472

Comment on lines +98 to +106
- name: Set conclusion
id: set-conclusion
run: |
if [[ "${{ steps.run-coverage.outcome }}" == "success" ]]; then
echo "conclusion=success" >> $GITHUB_OUTPUT
else
echo "conclusion=failure" >> $GITHUB_OUTPUT
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but then we have to also change this in the release workflow. Because there we expect to abort the release, if the coverage fails.

Comment thread .github/workflows/coverage_report.yml Outdated
Comment on lines 51 to 54
@@ -49,4 +54,4 @@
sudo apt-get install -y lcov
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else
# Other triggers — restore from the latest successful nightly run so
# quality reports are preserved in every Pages deployment.
RUN_ID=$(gh api \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. this is good. I was concerned that we would lose the data in the next doc build otherwise.

Comment thread .github/workflows/deploy_docs.yml Outdated
Comment on lines +154 to +156
--shared-css docs/sphinx/_static/css/version_flyout.css \
--shared-js docs/sphinx/_static/js/version_flyout.js \
--root-index docs/sphinx/_gh_pages/index.html
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not inject this via bazel data dependencies and thus make this here way easier?

Comment thread docs/sphinx/utils/BUILD Outdated
name = "assemble_publish_tree",
srcs = ["assemble_publish_tree.py"],
main = "assemble_publish_tree.py",
visibility = ["//visibility:public"],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be publicly visible.

- Report
* - Coverage
- Line, function, and branch coverage from C++ unit tests (gcov/lcov)
- `Coverage report <quality/coverage/index.html>`_
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a broken link on local builds, is it possible that we make here some kind of "if-statements" that we show another text in local builds? Or is this a tradeoff that we need. The problem is that we upload the docs also for each release and this means that this will be there broken also.

Copy link
Copy Markdown
Contributor Author

@ahmed0mousa ahmed0mousa May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is genuinely a tradeoff. Three realistic options:

Option A : Accept it, strengthen the note (simplest)
Keep the relative link for the deployed site, update the note to explicitly mention it also doesn't work in versioned release docs - only on latest.

Option B : Absolute URL
Replace the relative link with https://eclipse-score.github.io/communication/latest/quality/coverage/index.html. Works from anywhere (local build, release archive, deployed site), but always points to latest regardless of which version of the docs you're reading, and hardcodes the repo URL.

Option C : .. only:: directive
Set a Sphinx tag (e.g. deployed) via conf.py reading an env var set in deploy_docs.yml. Show the relative link only when the tag is present; show a note otherwise. Cleanest, but requires wiring an env var through the build.

Given that quality reports are only ever published under latest/ (not under versioned tags), option B's "always points to latest" is actually correct behaviour. Option C adds build complexity for the same end result.

Which would you prefer? change could be done in a follow up

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find a simpler solution which is a binary flag with DOCS_VERSION + DOCS_BASE_URL so the template can handle all three cases (latest/, Versioned release, and local build) differently. applied in #472

Comment on lines -77 to -79
env:
DOCS_BASE_URL: "https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }}"
DOCS_VERSION: ${{ steps.version.outputs.version }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason why this is removed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- Restructure as workflow_call-only with artifact-name + conclusion outputs
- Extract genhtml invocation into quality/scripts/generate_coverage_html.sh
- Extract archive assembly into quality/scripts/create_coverage_archive.sh
- Extract artifact extraction into quality/scripts/extract_coverage_artifact.sh
- Fix cache-save: always save (event_name is workflow_call inside reusable workflows)
- generate_dashboard.py: reads LCOV .dat, renders HTML via Jinja2 template,
  tracks history in JSON, writes GitHub Actions step summary
- dashboard.html.j2: dark-themed coverage summary with per-file table,
  sparkline trend history, and progress bars
- BUILD: py_binary target with jinja2/markupsafe from score_communication_pip
- Run coverage via reusable coverage_report.yml on schedule (midnight UTC)
- deploy-quality-reports job: downloads coverage artifact, extracts HTML,
  runs bazel run //quality/dashboard:generate_dashboard to produce KPI page,
  uploads nightly-quality-reports artifact for deploy_docs.yml to consume
- bundle_quality_reports.sh: downloads artifact from nightly run (direct on
  workflow_run trigger, or latest successful run on other triggers)
- deploy_docs.yml: build Sphinx docs with Bazel, upload release tar.gz assets,
  restore old versions, assemble versioned publish/ tree, bundle nightly quality
  reports, deploy to GitHub Pages via actions/upload-pages-artifact +
  actions/deploy-pages; triggers on push/tag/PR/workflow_run (nightly)
- assemble_publish_tree.py: assembles publish/ directory — copies current build
  to publish/<version>/, promotes stable/, copies _shared/ CSS+JS assets,
  generates switcher.json, writes root index.html and .nojekyll
- BUILD: py_binary target for assemble_publish_tree (stdlib only, no deps)
- docs/sphinx/: add quality_reports.rst (coverage-only), wire into index.rst
  toctree, update conf.py theme options and remove unused GITHUB_PAGES_URL
- Replace apt-get lcov with @lcov_deb deb download; move generate_coverage_html.sh
  to quality/coverage/ and expose it as an sh_binary (quality/coverage/BUILD)
- Add rules_shell 0.6.1 and lcov_deb to MODULE.bazel as dev_dependencies
- Wire version_flyout CSS/JS as Bazel data deps in assemble_publish_tree instead
  of --shared-css/--shared-js CLI args; remove those args from deploy_docs.yml
- Add exports_files to docs/sphinx/BUILD for cross-package runfile visibility
- Guard upload-artifact and artifact-name output on coverage step success
- Add coverage-conclusion gate step to automated_release.yml upload job
- Fix cache-save to unconditional true in coverage_report.yml (workflow_call
  context never has event_name == 'schedule')
- Remove --history arg from generate_dashboard invocation in nightly_quality.yml
- Update generate_dashboard.py docstring to remove --history from usage example
- Switch quality/dashboard/BUILD deps to direct @score_communication_pip// labels
@ahmed0mousa ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch from 154647b to 0f1eb00 Compare May 22, 2026 10:52
@castler castler marked this pull request as ready for review May 22, 2026 11:45
@castler castler added this pull request to the merge queue May 22, 2026
Merged via the queue into eclipse-score:main with commit 571427d May 22, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants