Create a KPI dashboard for quality numbers by ahmed0mousa · Pull Request #448 · eclipse-score/communication

ahmed0mousa · 2026-05-18T14:18:09Z

Add a nightly CI pipeline that runs three quality jobs in parallel (coverage, CodeQL, and clang-tidy) and publishes all results to GitHub Pages under latest/quality/. A Jinja2-based dashboard aggregates the findings into a single page with KPI trend tracking across runs. The Sphinx documentation is extended with a dedicated quality reports page and a version switcher navbar, and on every push to main the docs automatically pull the latest nightly KPI numbers so they stay current without waiting for another nightly run.

Issue: SWP-262453

castler

Not yet fully through

castler · 2026-05-19T06:01:14Z

 env:
  ANDROID_HOME: ""
  ANDROID_SDK_ROOT: ""
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true


We should not have things different in the release workflow than in the others. So either we add this everywhere or nowhere.

Can you please also state in the commit message why this change is needed?

Node 20 will be deprecated on GitHub Actions runners next month. I can add this to the commit message:
https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

castler · 2026-05-19T06:03:23Z

-            os.system(
-                f"{code_ql_path} database analyze -j=0 {database_location} --format=sarifv2.1.0 --output={output_base}/codeql.sarif")
-            os.system(
-                f"{code_ql_path} database analyze -j=0 {database_location} --format=csv --output={output_base}/codeql.csv")


We should keep the CSV output for direct human readability.
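One way to keep both outputs is to drive the two `database analyze` calls from a single list of formats. The sketch below is illustrative only: `build_analyze_commands` and `run_analysis` are hypothetical helpers, not part of codeql_lint.py, and they reuse only the flags visible in the removed lines above.

```python
import subprocess
from pathlib import Path

# (format value, output file name) pairs taken from the removed lines.
OUTPUT_FORMATS = [("sarifv2.1.0", "codeql.sarif"), ("csv", "codeql.csv")]


def build_analyze_commands(code_ql_path, database_location, output_base):
    """Return one `codeql database analyze` argv list per output format."""
    return [
        [
            str(code_ql_path), "database", "analyze", "-j=0",
            str(database_location),
            f"--format={fmt}",
            f"--output={Path(output_base) / name}",
        ]
        for fmt, name in OUTPUT_FORMATS
    ]


def run_analysis(code_ql_path, database_location, output_base):
    """Run every command; check=True raises if CodeQL exits non-zero."""
    for argv in build_analyze_commands(code_ql_path, database_location, output_base):
        subprocess.run(argv, check=True)
```

Passing argument lists to `subprocess.run` instead of formatting a string for `os.system` avoids shell quoting problems and surfaces a failing CodeQL run as an exception.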

castler · 2026-05-19T06:08:52Z

+      - name: Set conclusion
+        id: set-conclusion
+        run: |
+          if [[ "${{ steps.run-coverage.outcome }}" == "success" ]]; then
+            echo "conclusion=success" >> $GITHUB_OUTPUT
+          else
+            echo "conclusion=failure" >> $GITHUB_OUTPUT
+          fi
+


Why is this needed, can we try to remove this again please?

Because the coverage step has continue-on-error: true, meaning that if bazel coverage fails, the job doesn't stop; it keeps running. Without the "Set conclusion" step, the caller nightly_quality.yml has no way to know whether coverage actually passed or failed; it only sees the job as success because continue-on-error suppresses the failure. In short, continue-on-error hides the failure from GitHub's job status, and "Set conclusion" exists to un-hide it for the dashboard. For a nightly quality pipeline, partial data is actually useful: if coverage fails at night, you want something to look at the next morning rather than an empty artifact.

Agreed, but then we have to also change this in the release workflow. Because there we expect to abort the release, if the coverage fails.

Why? It's still the same behaviour:

run-coverage has continue-on-error: true -> job never fails, so set-conclusion always runs

set-conclusion sets conclusion=failure when coverage failed

upload-coverage-report gate step checks needs.run-coverage-report.outputs.conclusion != 'success' -> exits 1 -> job fails

delete-release-on-failure triggers on failure() -> draft deleted
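The chain above can be modelled in a few lines of plain Python. This is only a sketch of the described control flow, not workflow code; the function names are made up for illustration.

```python
def set_conclusion(step_outcome: str) -> str:
    """Mirror the 'Set conclusion' step: anything but 'success' is a failure."""
    return "success" if step_outcome == "success" else "failure"


def release_gate_exit_code(conclusion: str) -> int:
    """Mirror the gate step: exit 1 unless the reported conclusion is 'success'."""
    return 0 if conclusion == "success" else 1


def draft_deleted(step_outcome: str) -> bool:
    """The draft release is deleted only when the gate makes the upload job fail."""
    return release_gate_exit_code(set_conclusion(step_outcome)) != 0


# The coverage job itself always finishes, but the release still aborts on failure.
for outcome in ("success", "failure"):
    print(outcome, "-> draft deleted:", draft_deleted(outcome))
```

The point of the model is that the failure signal moves from the job status to an explicit output, so the nightly caller can keep going while the release caller can still abort.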

ahmed0mousa · 2026-05-21T13:32:29Z

+LCOV_DAT="${1:?Usage: $0 <lcov.dat> [output-dir]}"
+OUTPUT_DIR="${2:-cpp_coverage}"
+
+# NOTE: "--ignore-errors category,inconsistent"


please share your thoughts here

We keep it for now - but we should definitely try out the --local_test_jobs=1

I left it for now.

hoe-jo · 2026-05-21T15:21:35Z

+#
+# Arguments:
+#   zip-dir     Directory containing the downloaded coverage zip (default: /tmp/coverage_zip)
+#   output-dir  Directory to copy the HTML report into (default: ${GITHUB_WORKSPACE}/_quality/coverage)


Could we maybe link it via the static part of the sphinx build? Meaning if we move it here:
docs/sphinx/_static/...
then sphinx should automatically be able to pick it up

Short answer: not with Bazel's hermetic build model.

_static is evaluated by Bazel at analysis time via glob(["_static/**/*"]) in the BUILD file. glob() runs when Bazel reads the BUILD file before any execution. Coverage HTML is generated at CI runtime, so it doesn't exist when Bazel analyses the graph. Bazel would simply not see it and not include it in the sandbox.

Even if you copied the coverage files into _static/ before calling bazel build, the sandbox only sees declared inputs. Files copied in after glob() evaluated are invisible to it.

castler

A very huge PR... I hope I figured out everything.

castler · 2026-05-21T15:32:32Z

          sudo apt-get install -y lcov

      - name: Setup Bazel with shared caching
        uses: bazel-contrib/setup-bazel@0.18.0


We should use the same setup-bazel as in the other workflows (can be done in a follow up PR).

to be done in a follow up PR

castler · 2026-05-21T15:34:35Z

+          bash quality/scripts/generate_coverage_html.sh \
+            "$(bazel info output_path)/_coverage/_coverage_report.dat" \
+            cpp_coverage

      - name: Create archive of test report
+        if: steps.run-coverage.outcome == 'success'
        run: |
-          mkdir -p artifacts
-          find bazel-testlogs/score/ -name 'test.xml' -print0 | xargs -0 -I{} cp --parents {} artifacts/
-          cp -r cpp_coverage artifacts/
-          zip -r ${{ github.event.repository.name }}_coverage_report_${{ github.sha }}.zip artifacts/
-        shell: bash
+          bash quality/scripts/create_coverage_archive.sh \
+            "${{ github.event.repository.name }}_coverage_report_${{ github.sha }}"


We should merge these into one script, and just have a CLI option to create the archive. This will reduce the number of steps here in the CI.

Besides that, we should execute this via bazel run.

(can be done in follow up PR).

to be done in a follow up PR

fixed in #472

castler · 2026-05-21T15:37:43Z

+      - name: Set conclusion
+        id: set-conclusion
+        run: |
+          if [[ "${{ steps.run-coverage.outcome }}" == "success" ]]; then
+            echo "conclusion=success" >> $GITHUB_OUTPUT
+          else
+            echo "conclusion=failure" >> $GITHUB_OUTPUT
+          fi
+


Agreed, but then we have to also change this in the release workflow. Because there we expect to abort the release, if the coverage fails.

castler · 2026-05-21T15:44:13Z

@@ -49,4 +54,4 @@
          sudo apt-get install -y lcov


We should remove this (in a follow-up PR), as it is only needed to have genhtml on the system. The right way is to follow this approach:

https://github.com/eclipse-score/tooling/pull/238/changes#diff-4f29656c7bf17acdd9401fd9a35a68d19a163b1b1e146b8251003264b21ae5c5R43

https://github.com/eclipse-score/tooling/pull/238/changes#diff-6136fc12446089c3db7360e923203dd114b6a1466252e71667c6791c20fe6bdcR221

castler · 2026-05-22T06:08:31Z

+else
+    # Other triggers — restore from the latest successful nightly run so
+    # quality reports are preserved in every Pages deployment.
+    RUN_ID=$(gh api \


OK, this is good. I was concerned that we would otherwise lose the data in the next doc build.

castler · 2026-05-22T06:11:15Z

+            --shared-css  docs/sphinx/_static/css/version_flyout.css \
+            --shared-js   docs/sphinx/_static/js/version_flyout.js \
+            --root-index  docs/sphinx/_gh_pages/index.html


Could we inject these via Bazel data dependencies instead? That would make this step much simpler.

castler · 2026-05-22T06:12:21Z

+    name = "assemble_publish_tree",
+    srcs = ["assemble_publish_tree.py"],
+    main = "assemble_publish_tree.py",
+    visibility = ["//visibility:public"],


This should not be publicly visible.

castler · 2026-05-22T06:14:34Z

+     - Report
+   * - Coverage
+     - Line, function, and branch coverage from C++ unit tests (gcov/lcov)
+     - `Coverage report <quality/coverage/index.html>`_


This will be a broken link on local builds. Is it possible to add some kind of "if-statement" here so that we show different text in local builds? Or is this a tradeoff that we need to accept? The problem is that we also upload the docs for each release, which means the link will be broken there as well.

This is genuinely a tradeoff. Three realistic options:

Option A : Accept it, strengthen the note (simplest)
Keep the relative link for the deployed site, update the note to explicitly mention it also doesn't work in versioned release docs - only on latest.

Option B : Absolute URL
Replace the relative link with https://eclipse-score.github.io/communication/latest/quality/coverage/index.html. Works from anywhere (local build, release archive, deployed site), but always points to latest regardless of which version of the docs you're reading, and hardcodes the repo URL.

Option C : .. only:: directive
Set a Sphinx tag (e.g. deployed) via conf.py reading an env var set in deploy_docs.yml. Show the relative link only when the tag is present; show a note otherwise. Cleanest, but requires wiring an env var through the build.

Given that quality reports are only ever published under latest/ (not under versioned tags), option B's "always points to latest" is actually correct behaviour. Option C adds build complexity for the same end result.

Which would you prefer? The change could be done in a follow-up.

I found a simpler solution: a binary flag combined with DOCS_VERSION + DOCS_BASE_URL, so that the template can handle all three cases (latest/, versioned release, and local build) differently. Applied in #472.
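For illustration, the three cases could be distinguished roughly as follows. This is a sketch under assumptions, not the change from #472: `coverage_link` is a hypothetical helper, and the mapping from the two environment variables to the three cases is guessed from the discussion above.

```python
import os


def coverage_link(env=None):
    """Return the coverage report target, or None when no link should be shown.

    Assumed cases (not confirmed by the PR):
      - DOCS_BASE_URL unset   -> local build, show a note instead of a link
      - DOCS_VERSION "latest" -> relative link inside the deployed latest/ tree
      - any other version     -> absolute link into latest/
    """
    env = os.environ if env is None else env
    base_url = env.get("DOCS_BASE_URL", "").rstrip("/")
    version = env.get("DOCS_VERSION", "")
    if not base_url:
        return None
    if version == "latest":
        return "quality/coverage/index.html"
    return f"{base_url}/latest/quality/coverage/index.html"
```

A helper like this could be called from conf.py and its result handed to the template, so that the template only has to decide between rendering a link and rendering a note.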

castler · 2026-05-22T06:16:03Z

-        env:
-          DOCS_BASE_URL: "https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }}"
-          DOCS_VERSION: ${{ steps.version.outputs.version }}


I see no reason why this was removed.

same #448 (comment)

Node20 will be deprecated

- Restructure as workflow_call-only with artifact-name + conclusion outputs
- Extract genhtml invocation into quality/scripts/generate_coverage_html.sh
- Extract archive assembly into quality/scripts/create_coverage_archive.sh
- Extract artifact extraction into quality/scripts/extract_coverage_artifact.sh
- Fix cache-save: always save (event_name is workflow_call inside reusable workflows)

- generate_dashboard.py: reads LCOV .dat, renders HTML via Jinja2 template, tracks history in JSON, writes GitHub Actions step summary
- dashboard.html.j2: dark-themed coverage summary with per-file table, sparkline trend history, and progress bars
- BUILD: py_binary target with jinja2/markupsafe from score_communication_pip
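As a rough idea of what reading an LCOV .dat file involves, here is a minimal parser for the line-coverage summary records of the LCOV tracefile format (`SF:`, `LF:`, `LH:`, `end_of_record`). It is a sketch, not the code in generate_dashboard.py; `FileCoverage`, `parse_lcov`, and `total_line_rate` are names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class FileCoverage:
    path: str
    lines_found: int = 0  # LF: number of instrumented lines
    lines_hit: int = 0    # LH: number of lines executed at least once

    @property
    def line_rate(self) -> float:
        """Line coverage in percent; 0.0 for files without instrumented lines."""
        return 100.0 * self.lines_hit / self.lines_found if self.lines_found else 0.0


def parse_lcov(text: str) -> list[FileCoverage]:
    """Collect one FileCoverage per SF: ... end_of_record section."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = FileCoverage(path=line[3:])
        elif current is None:
            continue  # e.g. TN: lines before the first SF:
        elif line.startswith("LF:"):
            current.lines_found = int(line[3:])
        elif line.startswith("LH:"):
            current.lines_hit = int(line[3:])
        elif line == "end_of_record":
            records.append(current)
            current = None
    return records


def total_line_rate(records: list[FileCoverage]) -> float:
    """Overall line coverage in percent across all files."""
    found = sum(r.lines_found for r in records)
    hit = sum(r.lines_hit for r in records)
    return 100.0 * hit / found if found else 0.0
```

Function and branch coverage use analogous summary records (`FNF:`/`FNH:` and `BRF:`/`BRH:`) and could be handled the same way.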

- Run coverage via reusable coverage_report.yml on schedule (midnight UTC)
- deploy-quality-reports job: downloads coverage artifact, extracts HTML, runs bazel run //quality/dashboard:generate_dashboard to produce KPI page, uploads nightly-quality-reports artifact for deploy_docs.yml to consume
- bundle_quality_reports.sh: downloads artifact from nightly run (direct on workflow_run trigger, or latest successful run on other triggers)
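The run selection described for bundle_quality_reports.sh can be pictured as follows. The script itself is a shell script using `gh api`; this Python model only illustrates the branching, and the function name, parameters, and run dictionary fields (`id`, `conclusion`, `created_at`) are assumptions made for the example.

```python
def select_run_id(event_name, triggering_run_id, recent_runs):
    """Pick the nightly run whose artifact should be bundled.

    event_name        -- name of the event that triggered the docs workflow
    triggering_run_id -- id of the nightly run on a workflow_run trigger, else None
    recent_runs       -- list of dicts with 'id', 'conclusion', 'created_at' (ISO 8601)
    """
    if event_name == "workflow_run" and triggering_run_id is not None:
        # Nightly run just finished: use its artifact directly.
        return triggering_run_id
    # Any other trigger: fall back to the newest successful nightly run.
    successful = [r for r in recent_runs if r["conclusion"] == "success"]
    if not successful:
        return None
    return max(successful, key=lambda r: r["created_at"])["id"]
```

Returning `None` when no successful run exists leaves it to the caller to decide whether to deploy without quality reports or to fail.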

- deploy_docs.yml: build Sphinx docs with Bazel, upload release tar.gz assets, restore old versions, assemble versioned publish/ tree, bundle nightly quality reports, deploy to GitHub Pages via actions/upload-pages-artifact + actions/deploy-pages; triggers on push/tag/PR/workflow_run (nightly)
- assemble_publish_tree.py: assembles publish/ directory: copies current build to publish/<version>/, promotes stable/, copies _shared/ CSS+JS assets, generates switcher.json, writes root index.html and .nojekyll
- BUILD: py_binary target for assemble_publish_tree (stdlib only, no deps)
- docs/sphinx/: add quality_reports.rst (coverage-only), wire into index.rst toctree, update conf.py theme options and remove unused GITHUB_PAGES_URL
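To make the switcher.json step more concrete, the sketch below generates a version list in the entry layout used by the pydata-sphinx-theme version switcher (`name`, `version`, `url`, optional `preferred`). Whether assemble_publish_tree.py uses exactly this layout is an assumption, and `build_switcher` and its parameters are hypothetical.

```python
import json


def build_switcher(base_url, versions, stable=None):
    """Return switcher.json content for latest/ plus the given release versions."""
    base_url = base_url.rstrip("/")
    entries = [{"name": "latest", "version": "latest", "url": f"{base_url}/latest/"}]
    for version in versions:
        entry = {"name": version, "version": version, "url": f"{base_url}/{version}/"}
        if version == stable:
            # Mark the release that stable/ was promoted from.
            entry["preferred"] = True
        entries.append(entry)
    return json.dumps(entries, indent=2)
```

Because it only needs `json` from the standard library, a function like this fits a py_binary target without external dependencies.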

- Replace apt-get lcov with @lcov_deb deb download; move generate_coverage_html.sh to quality/coverage/ and expose it as an sh_binary (quality/coverage/BUILD)
- Add rules_shell 0.6.1 and lcov_deb to MODULE.bazel as dev_dependencies
- Wire version_flyout CSS/JS as Bazel data deps in assemble_publish_tree instead of --shared-css/--shared-js CLI args; remove those args from deploy_docs.yml
- Add exports_files to docs/sphinx/BUILD for cross-package runfile visibility
- Guard upload-artifact and artifact-name output on coverage step success
- Add coverage-conclusion gate step to automated_release.yml upload job
- Fix cache-save to unconditional true in coverage_report.yml (workflow_call context never has event_name == 'schedule')
- Remove --history arg from generate_dashboard invocation in nightly_quality.yml
- Update generate_dashboard.py docstring to remove --history from usage example
- Switch quality/dashboard/BUILD deps to direct @score_communication_pip// labels

ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch from a6a2efd to 7380bb8 Compare May 18, 2026 14:20

castler reviewed May 19, 2026


ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch 4 times, most recently from bff61c4 to bb988f3 Compare May 19, 2026 14:36

hoe-jo reviewed May 19, 2026


Comment thread quality/dashboard/dashboard.html.j2

Comment thread quality/static_analysis/codeql_lint.py Outdated

Comment thread .github/workflows/docs.yml Outdated

Comment thread .github/workflows/docs.yml Outdated

ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch 4 times, most recently from c55932b to 154647b Compare May 21, 2026 13:26

ahmed0mousa commented May 21, 2026


hoe-jo reviewed May 21, 2026


castler reviewed May 22, 2026


ahmed0mousa added 6 commits May 22, 2026 11:14

ci: add FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 env to all workflows

3ef3038

Node20 will be deprecated

ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch from 154647b to 0f1eb00 Compare May 22, 2026 10:52

castler approved these changes May 22, 2026


castler marked this pull request as ready for review May 22, 2026 11:45

castler requested review from LittleHuba, bemerybmw, crimson11 and limdor as code owners May 22, 2026 11:45

castler added this pull request to the merge queue May 22, 2026

Merged via the queue into eclipse-score:main with commit 571427d May 22, 2026
10 checks passed
