Merge and improve NVIDIA measurement overhead reduction feature by boyuhang66 · Pull Request #12 · caps-tum/GPUscout

boyuhang66 · 2026-06-30T12:06:38Z

Summary

This PR finalizes the NVIDIA measurement overhead reduction feature and includes several improvements and fixes identified during testing.

Changes

- Cleaned up and simplified the implementation.
- Improved NCU kernel matching:
  - Use mangled names for automatically extracted kernels.
  - Use function names for user-specified kernels.
  - Removed the hardcoded launch skip (`-s 5`) so that short-lived kernels are profiled correctly.
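A minimal sketch of how the two selection modes could translate into `ncu` filter options. The helper name `build_ncu_kernel_args` is hypothetical; `--kernel-name`, `--kernel-name-base`, and `--launch-count` are standard Nsight Compute options. It omits the launch skip (`-s`) and keeps `--launch-count 1`, following commit 7bdb611, which restores that parameter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ncu kernel-filter options for one kernel,
# one argument per line, so callers can load them with mapfile.
build_ncu_kernel_args() {
    local mode="$1" kernel="$2"
    case "$mode" in
        auto_from_generated_sass)
            # Raw mangled symbol taken from SASS: match it exactly, which also
            # keeps overloaded kernels apart.
            printf '%s\n' --kernel-name-base mangled --kernel-name "$kernel"
            ;;
        user)
            # User-supplied plain function name: match on the function name only.
            printf '%s\n' --kernel-name-base function --kernel-name "$kernel"
            ;;
        *)
            echo "error: unknown kernels_selection_mode '$mode'" >&2
            return 1
            ;;
    esac
    # No launch skip (-s), so kernels that launch only once are still caught;
    # profile a single matching launch to bound metric collection overhead.
    printf '%s\n' --launch-count 1
}

# Example (not executed here):
#   mapfile -t ncu_args < <(build_ncu_kernel_args user vectorAdd)
#   ncu "${ncu_args[@]}" ./my_app
```

Emitting one argument per line keeps names with special characters intact when they are loaded into a bash array.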

- Replace the long case statement and hardcoded execution lists with a dynamic loop to cut down on code duplication.
- Group variables into arrays and load them using `declare -n` to make adding new analysis modules easier in the future.
- Update timing functions to generate text labels automatically, removing the need for manual console log statements in each branch.
- Add a binary availability check to catch missing or uncompiled tools early instead of letting the script fail silently.
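The loop-plus-nameref pattern described above can be sketched as follows. The `<analysis>_module` naming convention, the descriptor layout (binary path, label), and the functions `run_one_analysis`, `run_analyses`, and `report_time` are illustrative assumptions, not the actual GPUscout code. `local -n` is the function-scoped form of `declare -n` and requires bash 4.3 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical descriptor: each analysis module is an array named
# "<analysis>_module" holding (binary path, human-readable label).
register_spilling_module=("./bin/register_spilling" "Register spilling analysis")

# Timing output with a label taken from the descriptor, so no branch needs
# its own echo.
report_time() {
    local label="$1" start="$2" end="$3"
    echo "[timing] ${label}: $((end - start)) s"
}

run_one_analysis() {
    # Nameref: 'module' aliases the array whose name is built from $1.
    local -n module="${1}_module"
    local binary="${module[0]}" label="${module[1]:-$1}"
    # Catch missing or uncompiled tools up front instead of failing silently.
    if [[ -z "$binary" || ! -x "$binary" ]]; then
        echo "error: ${label}: binary '${binary}' is missing or not executable" >&2
        return 1
    fi
    local start=$SECONDS
    "$binary" || return 1
    report_time "$label" "$start" "$SECONDS"
}

# One loop replaces the per-analysis case branches; adding a module only
# requires defining another descriptor array.
run_analyses() {
    local analysis
    for analysis in "$@"; do
        run_one_analysis "$analysis" || return 1
    done
}
```

Resolving the descriptor inside a separate function gives each iteration a fresh nameref, which avoids re-pointing one nameref variable inside the loop.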

…modes and fix short kernel profiling

- Fix empty profile outputs (0 rows) caused by demangled function names not matching the kernel parameter signatures in Nsight Compute.
- For `kernels_selection_mode` = `auto_from_generated_sass`: Stop stripping mangled names to their base forms. Keep the raw mangled symbols from SASS and set `--kernel-name-base mangled` to ensure exact symbol matching. This also prevents overloaded functions from being lost during de-duplication.
- For `kernels_selection_mode` = `user`: Set `--kernel-name-base function` to allow matching on the plain function name.
- Remove `-s 5 --launch-count 1` from the `ncu` command to stop skipping early iterations, so that short-lived or test kernels are captured instead of producing empty rows.
- Comment out the unused `extract_kernel_base_name_from_symbol` and `build_auto_ncu_kernel_patterns` blocks.
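A small sketch of the de-duplication point: keeping raw mangled symbols means overloads survive `sort -u`, whereas stripping them to a base name would merge them. It assumes the SASS listing marks each kernel with a `Function : <symbol>` line, as `cuobjdump -sass` output does; `extract_mangled_kernels` is a hypothetical name, and the commented-out GPUscout helpers are not reproduced here.

```bash
#!/usr/bin/env bash
# Read a SASS listing on stdin and print each distinct kernel symbol once,
# verbatim. Assumes kernels appear as "Function : <mangled_symbol>" lines.
extract_mangled_kernels() {
    # No demangling or base-name stripping: _Z5scalePfi (float*, int) and
    # _Z5scalePdi (double*, int) both overload "scale" and must stay distinct.
    awk '$1 == "Function" && $2 == ":" { print $3 }' | LC_ALL=C sort -u
}

# Example (not executed here):
#   cuobjdump -sass ./my_app | extract_mangled_kernels
```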

…ngs]

- Implement a manual importer fallback to bypass internal `nsys-importer` crashes caused by strict Linux kernel security settings (`kernel.perf_event_paranoid = 4`).
- Add compatibility for legacy `nsys` toolchains (e.g., v2022.4.2) in the `nsys stats` stage: automatically fall back to the older `gpukernsum` report name if the newer `cuda_gpu_kern_sum` report is not actually generated.
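The report-name fallback can be sketched as a file-existence check. The `<prefix>_<report>.csv` naming and the function name `pick_kernel_summary_csv` are assumptions for illustration; the manual importer fallback is not shown.

```bash
#!/usr/bin/env bash
# Return the path of the kernel summary CSV written by `nsys stats`, preferring
# the current report name and falling back to the legacy one.
# Assumption: output files are named "<prefix>_<report>.csv".
pick_kernel_summary_csv() {
    local prefix="$1" report
    for report in cuda_gpu_kern_sum gpukernsum; do
        if [[ -f "${prefix}_${report}.csv" ]]; then
            printf '%s\n' "${prefix}_${report}.csv"
            return 0
        fi
    done
    echo "error: no kernel summary CSV found for prefix '${prefix}'" >&2
    return 1
}
```

Checking for the file on disk, rather than parsing the `nsys` version string, lets the same code work with toolchains whose report names are not known in advance.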

Qichen Liu and others added 18 commits June 12, 2026 02:26

Integrate commit "End to end filtering of kernels and analysis with 2 new args `--kernels` and `--analysis`"

fc4a438
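A minimal sketch of how the two new arguments might be consumed. It assumes each option takes one comma-separated value (for example `--kernels k1,k2 --analysis a1`); `parse_filter_args`, `KERNELS`, and `ANALYSES` are hypothetical names, and the real `GPUscout.sh.in` parser may accept other forms.

```bash
#!/usr/bin/env bash
# Hypothetical parser for the --kernels and --analysis options.
# Results are stored in the globals KERNELS and ANALYSES.
parse_filter_args() {
    KERNELS="" ANALYSES=""
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --kernels)
                [[ $# -ge 2 ]] || { echo "error: --kernels needs a value" >&2; return 1; }
                KERNELS="$2"; shift 2 ;;
            --analysis)
                [[ $# -ge 2 ]] || { echo "error: --analysis needs a value" >&2; return 1; }
                ANALYSES="$2"; shift 2 ;;
            *)
                shift ;;  # other options are handled elsewhere in this sketch
        esac
    done
}
```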

Enhance kernel name handling: remove trailing "(" before passing to "cu++filt"

342ea27
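A sketch of the clean-up step. The helper names are hypothetical, and the commit does not say why symbols arrive with a trailing `(`; the code only shows removing it before the symbol is handed to `cu++filt`.

```bash
#!/usr/bin/env bash
# Drop a single trailing "(" left on an extracted kernel symbol.
strip_trailing_paren() {
    local symbol="$1"
    printf '%s\n' "${symbol%\(}"
}

# Demangle a cleaned symbol with cu++filt when it is installed;
# otherwise return the cleaned symbol unchanged.
demangle_kernel() {
    local symbol
    symbol="$(strip_trailing_paren "$1")"
    if command -v cu++filt >/dev/null 2>&1; then
        cu++filt "$symbol"
    else
        printf '%s\n' "$symbol"
    fi
}
```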

Fix kernel keying bug with a more robust csv parser

c6b309a

Improve error message for cupti function failures

48bd3ca

Merge and integrate commit "Nsys profile and stats integration"

b752d12

Fix CMake executable target collision for save_to_json

8301327

Polish: Update comment

8ebba7b

Polish: Fix cupti error message bug

883518f

Update README

f4b6b16

Polish: Clean up analysis check and metric collection code

1441c83

- Extract `valid_analyses` array and `is_valid_analysis` function to the main GPUscout driver script to centralize parameter validation.
- Refactor the metric collection flow and remove the long, duplicated case blocks.
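A sketch of the centralized check. Only the names `valid_analyses` and `is_valid_analysis` come from the commit; the array contents below are placeholders (the real analysis names live in the GPUscout driver script), and the function body is an assumption.

```bash
#!/usr/bin/env bash
# Placeholder analysis names; substitute the driver script's real list.
valid_analyses=(register_spilling vectorized_loads datatype_conversion)

# Return 0 if $1 exactly matches one entry of valid_analyses, 1 otherwise.
is_valid_analysis() {
    local candidate="$1" name
    for name in "${valid_analyses[@]}"; do
        [[ "$name" == "$candidate" ]] && return 0
    done
    return 1
}
```

Keeping the list and the check in one place means downstream scripts can assume every analysis name they receive is already valid.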

Polish: Clean up parse_csv_list function

89bfe7f

- Use the `validate_csv_list_syntax` and `normalize_csv_list` helper functions.
- Remove the redundant empty-token check inside the loop, because pre-validation already handles it.
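One possible shape for the two helpers and for `parse_csv_list` built on top of them. The function bodies, and the choice to trim whitespace and drop duplicates during normalization, are assumptions. Because syntax validation rejects empty tokens up front, the normalization loop needs no empty-token check of its own.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is a non-empty comma-separated list
# with no empty or whitespace-only tokens.
validate_csv_list_syntax() {
    local list="$1" token
    local -a tokens
    [[ -n "$list" ]] || return 1
    # read -a silently drops a trailing empty field, so check both ends here.
    [[ "$list" != ,* && "$list" != *, ]] || return 1
    IFS=',' read -r -a tokens <<< "$list"
    for token in "${tokens[@]}"; do
        [[ -n "${token//[[:space:]]/}" ]] || return 1
    done
    return 0
}

# Hypothetical helper: trim whitespace around each token and drop duplicates,
# keeping first-seen order. Assumes the list already passed validation.
normalize_csv_list() {
    local token seen=","
    local -a tokens out=()
    IFS=',' read -r -a tokens <<< "$1"
    for token in "${tokens[@]}"; do
        token="${token#"${token%%[![:space:]]*}"}"   # strip leading whitespace
        token="${token%"${token##*[![:space:]]}"}"   # strip trailing whitespace
        [[ "$seen" == *",${token},"* ]] && continue   # already emitted
        seen+="${token},"
        out+=("$token")
    done
    local IFS=','
    printf '%s\n' "${out[*]}"
}

# Validation first, so the normalization loop needs no empty-token check.
parse_csv_list() {
    validate_csv_list_syntax "$1" || { echo "error: malformed list '$1'" >&2; return 1; }
    normalize_csv_list "$1"
}
```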

Polish: Delete the duplicate invalid analysis check from measurements.sh

3b37ca2

GPUscout.sh.in already handles this validation during early argument parsing, ensuring only valid names reach this stage.

Polish: Add comment for understanding

cb6f404

Polish: Comment fallback implementation since nsys profile works with CUDAToolkit version 12.4.0

5a80712

Polish: Restore 'ncu' parameter 'launch count' as 1 to reduce metrics collection overhead

7bdb611

boyuhang66 wants to merge 18 commits into caps-tum:develop from boyuhang66:feature-integrate-overhead-reduction
