Create a KPI dashboard for quality numbers#448
Conversation
a6a2efd to
7380bb8
Compare
| env: | ||
| ANDROID_HOME: "" | ||
| ANDROID_SDK_ROOT: "" | ||
| FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true |
There was a problem hiding this comment.
We should not have things different in the release workflow then in others. So either, we add this everywhere or nowhere.
Can you please also state in the commit message why this change is needed?
There was a problem hiding this comment.
Node 20 will be deprecated next month on GitHub Actions runners, I can add this to the commit message
https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
| os.system( | ||
| f"{code_ql_path} database analyze -j=0 {database_location} --format=sarifv2.1.0 --output={output_base}/codeql.sarif") | ||
| os.system( | ||
| f"{code_ql_path} database analyze -j=0 {database_location} --format=csv --output={output_base}/codeql.csv") |
There was a problem hiding this comment.
We should keep the CSV output for direct human readibility
| - name: Set conclusion | ||
| id: set-conclusion | ||
| run: | | ||
| if [[ "${{ steps.run-coverage.outcome }}" == "success" ]]; then | ||
| echo "conclusion=success" >> $GITHUB_OUTPUT | ||
| else | ||
| echo "conclusion=failure" >> $GITHUB_OUTPUT | ||
| fi | ||
|
|
There was a problem hiding this comment.
Why is this needed, can we try to remove this again please?
There was a problem hiding this comment.
Because the coverage step has continue-on-error: true meaning if bazel coverage fails, the job doesn't stop, it keeps running. Without the "Set conclusion" step, the caller nightly_quality.yml has no way to know whether coverage actually passed or failed; it only sees the job as success because continue-on-error suppresses the failure. So In continue-on-error hides the failure from GitHub's job status, "Set conclusion" exists to un-hide it for the dashboard. For a nightly quality pipeline, partial data is actually useful, if coverage fails at night, you want something to look at the next morning rather than an empty artifact.
There was a problem hiding this comment.
Agreed, but then we have to also change this in the release workflow. Because there we expect to abort the release, if the coverage fails.
There was a problem hiding this comment.
why? its still the same behaviour:
- run-coverage has continue-on-error: true -> job never fails, so set-conclusion always runs
- set-conclusion sets conclusion=failure when coverage failed
- upload-coverage-report gate step checks needs.run-coverage-report.outputs.conclusion != 'success' -> exits 1 -> job fails
- delete-release-on-failure triggers on failure() -> draft deleted
bff61c4 to
bb988f3
Compare
c55932b to
154647b
Compare
| LCOV_DAT="${1:?Usage: $0 <lcov.dat> [output-dir]}" | ||
| OUTPUT_DIR="${2:-cpp_coverage}" | ||
|
|
||
| # NOTE: "--ignore-errors category,inconsistent" |
There was a problem hiding this comment.
please share your thoughts here
There was a problem hiding this comment.
We keep it for now - but we should definitely try out the --local_test_jobs=1
| # | ||
| # Arguments: | ||
| # zip-dir Directory containing the downloaded coverage zip (default: /tmp/coverage_zip) | ||
| # output-dir Directory to copy the HTML report into (default: ${GITHUB_WORKSPACE}/_quality/coverage) |
There was a problem hiding this comment.
Could we maybe link it via the static part of the sphinx build? Meaning if we move it here:
docs/sphinx/_static/...
then sphinx should automatically be able to pick it up
There was a problem hiding this comment.
Short answer: not with Bazel's hermetic build model.
_static is evaluated by Bazel at analysis time via glob(["_static/**/*"]) in the BUILD file. glob() runs when Bazel reads the BUILD file before any execution. Coverage HTML is generated at CI runtime, so it doesn't exist when Bazel analyses the graph. Bazel would simply not see it and not include it in the sandbox.
Even if you copied the coverage files into _static/ before calling bazel build, the sandbox only sees declared inputs. Files copied in after glob() evaluated are invisible to it.
castler
left a comment
There was a problem hiding this comment.
A very huge PR....I hope I figured out everyhting....
| sudo apt-get install -y lcov | ||
|
|
||
| - name: Setup Bazel with shared caching | ||
| uses: bazel-contrib/setup-bazel@0.18.0 |
There was a problem hiding this comment.
We should use the same setup-bazel as in the other workflows (can be done in a follow up PR).
There was a problem hiding this comment.
to be done in a follow up PR
| bash quality/scripts/generate_coverage_html.sh \ | ||
| "$(bazel info output_path)/_coverage/_coverage_report.dat" \ | ||
| cpp_coverage | ||
|
|
||
| - name: Create archive of test report | ||
| if: steps.run-coverage.outcome == 'success' | ||
| run: | | ||
| mkdir -p artifacts | ||
| find bazel-testlogs/score/ -name 'test.xml' -print0 | xargs -0 -I{} cp --parents {} artifacts/ | ||
| cp -r cpp_coverage artifacts/ | ||
| zip -r ${{ github.event.repository.name }}_coverage_report_${{ github.sha }}.zip artifacts/ | ||
| shell: bash | ||
| bash quality/scripts/create_coverage_archive.sh \ | ||
| "${{ github.event.repository.name }}_coverage_report_${{ github.sha }}" |
There was a problem hiding this comment.
We should merge these into one script, and just have an cli option to create the archive. This will reduce the steps here in the CI.
Besides that, we should execute this via bazel run.
(can be done in follow up PR).
There was a problem hiding this comment.
to be done in a follow up PR
| - name: Set conclusion | ||
| id: set-conclusion | ||
| run: | | ||
| if [[ "${{ steps.run-coverage.outcome }}" == "success" ]]; then | ||
| echo "conclusion=success" >> $GITHUB_OUTPUT | ||
| else | ||
| echo "conclusion=failure" >> $GITHUB_OUTPUT | ||
| fi | ||
|
|
There was a problem hiding this comment.
Agreed, but then we have to also change this in the release workflow. Because there we expect to abort the release, if the coverage fails.
| @@ -49,4 +54,4 @@ | |||
| sudo apt-get install -y lcov | |||
There was a problem hiding this comment.
We should remove this (in a follow up PR). As this is only needed to have genhtml in the system. The right way is to follow this approach:
| else | ||
| # Other triggers — restore from the latest successful nightly run so | ||
| # quality reports are preserved in every Pages deployment. | ||
| RUN_ID=$(gh api \ |
There was a problem hiding this comment.
ok. this is good. I was concerned that we would lose the data in the next doc build otherwise.
| --shared-css docs/sphinx/_static/css/version_flyout.css \ | ||
| --shared-js docs/sphinx/_static/js/version_flyout.js \ | ||
| --root-index docs/sphinx/_gh_pages/index.html |
There was a problem hiding this comment.
Can we not inject this via bazel data dependencies and thus make this here way easier?
| name = "assemble_publish_tree", | ||
| srcs = ["assemble_publish_tree.py"], | ||
| main = "assemble_publish_tree.py", | ||
| visibility = ["//visibility:public"], |
There was a problem hiding this comment.
This should not be publicly visible.
| - Report | ||
| * - Coverage | ||
| - Line, function, and branch coverage from C++ unit tests (gcov/lcov) | ||
| - `Coverage report <quality/coverage/index.html>`_ |
There was a problem hiding this comment.
This will be a broken link on local builds, is it possible that we make here some kind of "if-statements" that we show another text in local builds? Or is this a tradeoff that we need. The problem is that we upload the docs also for each release and this means that this will be there broken also.
There was a problem hiding this comment.
This is genuinely a tradeoff. Three realistic options:
Option A : Accept it, strengthen the note (simplest)
Keep the relative link for the deployed site, update the note to explicitly mention it also doesn't work in versioned release docs - only on latest.
Option B : Absolute URL
Replace the relative link with https://eclipse-score.github.io/communication/latest/quality/coverage/index.html. Works from anywhere (local build, release archive, deployed site), but always points to latest regardless of which version of the docs you're reading, and hardcodes the repo URL.
Option C : .. only:: directive
Set a Sphinx tag (e.g. deployed) via conf.py reading an env var set in deploy_docs.yml. Show the relative link only when the tag is present; show a note otherwise. Cleanest, but requires wiring an env var through the build.
Given that quality reports are only ever published under latest/ (not under versioned tags), option B's "always points to latest" is actually correct behaviour. Option C adds build complexity for the same end result.
Which would you prefer? change could be done in a follow up
There was a problem hiding this comment.
I find a simpler solution which is a binary flag with DOCS_VERSION + DOCS_BASE_URL so the template can handle all three cases (latest/, Versioned release, and local build) differently. applied in #472
| env: | ||
| DOCS_BASE_URL: "https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }}" | ||
| DOCS_VERSION: ${{ steps.version.outputs.version }} |
There was a problem hiding this comment.
I see no reason why this is removed?
Node20 will be deprecated
- Restructure as workflow_call-only with artifact-name + conclusion outputs - Extract genhtml invocation into quality/scripts/generate_coverage_html.sh - Extract archive assembly into quality/scripts/create_coverage_archive.sh - Extract artifact extraction into quality/scripts/extract_coverage_artifact.sh - Fix cache-save: always save (event_name is workflow_call inside reusable workflows)
- generate_dashboard.py: reads LCOV .dat, renders HTML via Jinja2 template, tracks history in JSON, writes GitHub Actions step summary - dashboard.html.j2: dark-themed coverage summary with per-file table, sparkline trend history, and progress bars - BUILD: py_binary target with jinja2/markupsafe from score_communication_pip
- Run coverage via reusable coverage_report.yml on schedule (midnight UTC) - deploy-quality-reports job: downloads coverage artifact, extracts HTML, runs bazel run //quality/dashboard:generate_dashboard to produce KPI page, uploads nightly-quality-reports artifact for deploy_docs.yml to consume - bundle_quality_reports.sh: downloads artifact from nightly run (direct on workflow_run trigger, or latest successful run on other triggers)
- deploy_docs.yml: build Sphinx docs with Bazel, upload release tar.gz assets, restore old versions, assemble versioned publish/ tree, bundle nightly quality reports, deploy to GitHub Pages via actions/upload-pages-artifact + actions/deploy-pages; triggers on push/tag/PR/workflow_run (nightly) - assemble_publish_tree.py: assembles publish/ directory — copies current build to publish/<version>/, promotes stable/, copies _shared/ CSS+JS assets, generates switcher.json, writes root index.html and .nojekyll - BUILD: py_binary target for assemble_publish_tree (stdlib only, no deps) - docs/sphinx/: add quality_reports.rst (coverage-only), wire into index.rst toctree, update conf.py theme options and remove unused GITHUB_PAGES_URL
- Replace apt-get lcov with @lcov_deb deb download; move generate_coverage_html.sh to quality/coverage/ and expose it as an sh_binary (quality/coverage/BUILD) - Add rules_shell 0.6.1 and lcov_deb to MODULE.bazel as dev_dependencies - Wire version_flyout CSS/JS as Bazel data deps in assemble_publish_tree instead of --shared-css/--shared-js CLI args; remove those args from deploy_docs.yml - Add exports_files to docs/sphinx/BUILD for cross-package runfile visibility - Guard upload-artifact and artifact-name output on coverage step success - Add coverage-conclusion gate step to automated_release.yml upload job - Fix cache-save to unconditional true in coverage_report.yml (workflow_call context never has event_name == 'schedule') - Remove --history arg from generate_dashboard invocation in nightly_quality.yml - Update generate_dashboard.py docstring to remove --history from usage example - Switch quality/dashboard/BUILD deps to direct @score_communication_pip// labels
154647b to
0f1eb00
Compare
Add a nightly CI pipeline that runs three quality jobs in parallel (coverage, CodeQL, and clang-tidy) and publishes all results to GitHub Pages under
latest/quality/. A Jinja2-based dashboard aggregates the findings into a single page with KPI trend tracking across runs. The Sphinx documentation is extended with a dedicated quality reports page and a version switcher navbar, and on every push to main the docs automatically pull the latest nightly KPI numbers so they stay current without waiting for another nightly run.Issue: SWP-262453