Merge and improve NVIDIA measurement overhead reduction feature #12
Open
boyuhang66 wants to merge 18 commits into
…2 new args `--kernels` and `--analysis`"
- Extract the `valid_analyses` array and the `is_valid_analysis` function to the main GPUscout driver script to centralize parameter validation.
- Refactor the metric collection flow and remove the long, duplicated case blocks.
- Use the `validate_csv_list_syntax` and `normalize_csv_list` helper functions.
- Remove the redundant empty-token check inside the loop because pre-validation already handles it.
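The validation flow described in these commits could look roughly like the sketch below. Only the names `valid_analyses`, `is_valid_analysis`, `validate_csv_list_syntax`, and `normalize_csv_list` come from the commit messages. The function bodies and the analysis names in the array are assumptions for illustration, not the code in this PR.

```bash
#!/usr/bin/env bash
# Hypothetical analysis names; the real list lives in the GPUscout driver script.
valid_analyses=("register_spilling" "global_atomics" "datatype_conversions")

# Succeed only if the given name is in valid_analyses.
is_valid_analysis() {
    local candidate="$1" name
    for name in "${valid_analyses[@]}"; do
        [[ "$name" == "$candidate" ]] && return 0
    done
    return 1
}

# Reject empty lists and lists with empty tokens
# (leading, trailing, or doubled commas).
validate_csv_list_syntax() {
    local list="$1"
    [[ -n "$list" && "$list" != ,* && "$list" != *, && "$list" != *,,* ]]
}

# Trim whitespace around each token and drop duplicates, keeping first-seen order.
normalize_csv_list() {
    local -a tokens=()
    local -A seen=()
    local token out=""
    IFS=',' read -r -a tokens <<< "$1"
    for token in "${tokens[@]}"; do
        token="${token#"${token%%[![:space:]]*}"}"   # strip leading whitespace
        token="${token%"${token##*[![:space:]]}"}"   # strip trailing whitespace
        [[ -z "$token" ]] && continue
        [[ -n "${seen[$token]:-}" ]] && continue
        seen[$token]=1
        out+="${out:+,}$token"
    done
    printf '%s\n' "$out"
}
```

Because syntax validation runs before the per-token loop, the loop that calls `is_valid_analysis` does not need its own empty-token check.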
- Replace the long case statement and hardcoded execution lists with a dynamic loop to cut down on code duplication.
- Group variables into arrays and load them using `declare -n` to make adding new analysis modules easier in the future.
- Update timing functions to generate text labels automatically, removing the need for manual console print logs in each branch.
- Add a binary availability check to catch missing or uncompiled tools early instead of letting the script fail silently.
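A minimal sketch of the array-plus-nameref pattern described above, assuming each analysis has a config array named `<analysis>_cfg` whose first element is the tool binary. The array names, paths, and the `run_one_analysis` helper are hypothetical. `local -n` is the function-scope form of `declare -n` and requires Bash 4.3 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical per-analysis config arrays: binary first, then its arguments.
register_spilling_cfg=("./analysis/register_spilling" "--json")
global_atomics_cfg=("./analysis/global_atomics" "--json")

# Run one analysis by name, with an availability check and automatic timing label.
run_one_analysis() {
    local name="$1"
    local -n cfg="${name}_cfg"       # nameref to the array named <name>_cfg
    local label="${name//_/ }"       # label derived from the name, not hand-written
    local start end

    # Fail early if the array is missing or the tool is not built/installed.
    if (( ${#cfg[@]} == 0 )) || ! command -v "${cfg[0]}" >/dev/null 2>&1; then
        echo "error: tool for '$label' not found or not built: ${cfg[0]:-<undefined>}" >&2
        return 1
    fi

    start=$(date +%s)
    "${cfg[@]}"
    end=$(date +%s)
    echo "$label: $((end - start)) s"
}

# Loop over the selected analyses instead of one case branch per analysis.
run_selected_analyses() {
    local name
    for name in "$@"; do
        run_one_analysis "$name" || return 1
    done
}
```

With this layout, adding an analysis module means adding one `<name>_cfg` array. The loop and the timing output stay unchanged.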
….sh. GPUscout.sh.in already handles this validation during early argument parsing, ensuring only valid names reach this stage.
…modes and fix short kernel profiling
- Fix empty profile outputs (0 rows) caused by demangled function names mismatching with hardware parameter signatures in Nsight Compute.
- For `kernels_selection_mode` = `auto_from_generated_sass`: Stop stripping mangled names to their base forms. Keep raw mangled symbols from SASS and set `--kernel-name-base mangled` to ensure exact hardware matching. This also prevents overloaded functions from being lost during de-duplication.
- For `kernels_selection_mode` = `user`: Set `--kernel-name-base function` to allow matching based on the plain function name.
- Remove `-s 5 --launch-count 1` from the `ncu` command to stop skipping early iterations, ensuring short-lived or test kernels are captured properly instead of producing empty rows.
- Comment out the unused `extract_kernel_base_name_from_symbol` and `build_auto_ncu_kernel_patterns` blocks.
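The mode-dependent name matching can be sketched as below. `--kernel-name-base` and `--kernel-name` are existing `ncu` options, and the two mode names come from the commit message. The helper `build_ncu_kernel_args` and the example kernel names are hypothetical.

```bash
#!/usr/bin/env bash
# Print the ncu kernel-filter arguments, one per line, for a selection mode.
# Hypothetical helper; the PR's actual script may assemble these differently.
build_ncu_kernel_args() {
    local mode="$1" kernel="$2"
    case "$mode" in
        auto_from_generated_sass)
            # Kernel names taken from SASS stay mangled, so match on the mangled name.
            printf '%s\n' --kernel-name-base mangled --kernel-name "$kernel"
            ;;
        user)
            # User-supplied names are plain function names without a signature.
            printf '%s\n' --kernel-name-base function --kernel-name "$kernel"
            ;;
        *)
            echo "error: unknown kernels_selection_mode: $mode" >&2
            return 1
            ;;
    esac
}
```

A caller could collect the output with `mapfile -t kernel_args < <(build_ncu_kernel_args user saxpy)` and pass `"${kernel_args[@]}"` to `ncu`. In the automatic mode the kernel argument would be a mangled symbol such as `_Z5saxpyifPfS_`, which keeps overloads with different signatures distinct.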
…ngs]
- Implement a manual importer fallback to bypass internal `nsys-importer` crashes caused by strict Linux kernel security settings (`kernel.perf_event_paranoid = 4`).
- Add compatibility for legacy `nsys` toolchains (e.g., v2022.4.2) during the `nsys stats` stage. Automatically fall back to the older `gpukernsum` report name if the newer `cuda_gpu_kern_sum` report is not generated.
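The report-name fallback might be sketched as follows. The two report names come from the commit message. The helper `export_kernel_summary`, the `NSYS_BIN` override, and the assumption that `nsys stats --output <base>` writes `<base>_<report>.csv` are illustrative and should be checked against the installed `nsys` version.

```bash
#!/usr/bin/env bash
# Allow overriding the nsys executable (hypothetical knob, useful for testing).
NSYS_BIN="${NSYS_BIN:-nsys}"

# Try the newer report name first, then the legacy one.
# Prints the path of the CSV that was actually produced.
export_kernel_summary() {
    local report_file="$1" out_base="$2" report csv
    for report in cuda_gpu_kern_sum gpukernsum; do
        "$NSYS_BIN" stats --report "$report" --format csv \
            --output "$out_base" "$report_file" >/dev/null 2>&1 || true
        # Assumed output naming: <out_base>_<report>.csv
        csv="${out_base}_${report}.csv"
        if [ -s "$csv" ]; then
            printf '%s\n' "$csv"
            return 0
        fi
    done
    echo "error: no kernel summary report was generated" >&2
    return 1
}
```

Checking for the output file rather than the exit code matches the commit's wording: the fallback triggers when the newer report is not produced, whatever `nsys stats` returned.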
…CUDAToolkit version 12.4.0
… collection overhead
Summary
This MR finalizes the NVIDIA measurement overhead reduction feature and includes several improvements and fixes identified during testing.
Changes