K1ngst0m/rgp-cli

rgp-cli

An open-source, per-instruction SQTT instruction stitcher for Linux/RADV .rgp captures. It provides an RGP-equivalent "Instruction Timing" view without the closed Radeon GPU Profiler GUI, and focuses on graphics (PS/GS/VS) shaders on gfx11/RDNA3.

"RGP" / "Radeon GPU Profiler" are trademarks of AMD. This is an independent, open-source tool built on AMD's open rocprof-trace-decoder; it is not affiliated with or endorsed by AMD.

What this is (and is not)

The SQTT decode engine is AMD's open rocprof-trace-decoder. rgp-cli does not reimplement it. rgp-cli is:

  1. A patch (patches/graphics-stitch.patch) that makes that decoder stitch gfx11 graphics frames. The stock decoder is tuned for compute and derails on real graphics workloads.
  2. A validation harness (src/oracle_isa.c) that feeds byte-exact amdgpu-dis disassembly to the decoder and joins every traced instruction back to its real ISA line (reproducing RGP's instruction view). UNRES=1 classifies unresolved tokens; PERFDUMP=1 prints a frame-timing breakdown.
  3. A capture pipeline (tools/) that turns a .rgp (B00P/RADV, or AMD_RDF/Windows via the optional rdf_spike reader) into the raw SQTT streams + an absolute-address ISA map.
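To illustrate what "joining every traced instruction back to its real ISA line" means, here is a minimal Python sketch. It is not the harness in src/oracle_isa.c. The two-column `address<TAB>disassembly` layout assumed for isa_map.tsv, and the helper names `load_isa_map` and `join_trace`, are hypothetical and chosen only for this example.

```python
def load_isa_map(tsv_text):
    """Parse 'hex_address<TAB>disassembly' rows into {address: text}.

    ASSUMPTION: this column layout is illustrative; the real isa_map.tsv
    produced by tools/build_isa_map.py may differ.
    """
    isa = {}
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        addr, text = line.split("\t", 1)
        isa[int(addr, 16)] = text
    return isa


def join_trace(traced_pcs, isa):
    """Split traced instruction addresses into resolved and unresolved."""
    resolved, unresolved = [], []
    for pc in traced_pcs:
        if pc in isa:
            resolved.append((pc, isa[pc]))
        else:
            unresolved.append(pc)
    return resolved, unresolved


# Made-up sample data, for illustration only.
SAMPLE = "0x1000\ts_mov_b32 s0, 0\n0x1004\tv_add_f32 v0, v1, v2\n"
isa = load_isa_map(SAMPLE)
resolved, unresolved = join_trace([0x1000, 0x1004, 0x2000], isa)
print(len(resolved), "resolved,", len(unresolved), "unresolved")
```

An address that has no entry in the map ends up in the unresolved list, which is roughly the kind of leftover the UNRES=1 mode is described as classifying.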

The patch is intended to go upstream to ROCm so every consumer benefits.

Relation to other tools

taowen/rgp-analyzer-cli wraps the stock decoder for compute tuning and reports a stitch-confidence number; it has no exec-mask / graphics support. rgp-cli is complementary: it fixes the decoder for graphics.

Status

Verified on gfx11 (RX 7800 XT, RADV 26.1.1):

| Capture | Stock decoder | rgp-cli (patched) |
| --- | --- | --- |
| vkcube / vkgears (gfx11 demos) | 100% | 100% |
| gfx12 nBody (compute) | n/a | 99.9% |
| Real-game gfx11 frame | 40.6% | 99.95% (7,833,986 / 7,837,716 instructions) |
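The real-game percentage follows directly from the two instruction counts quoted above. A short Python check (the function name `stitch_coverage` is made up for this example):

```python
def stitch_coverage(stitched, total):
    """Return the percentage of traced instructions that were stitched."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * stitched / total


stitched, total = 7_833_986, 7_837_716  # counts from the table above
pct = stitch_coverage(stitched, total)
missed = total - stitched
print(f"{pct:.2f}% stitched, {missed} instructions not stitched")
# -> 99.95% stitched, 3730 instructions not stitched
```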

The headline fixes:

  • s_waitcnt_depctr → IMMED: gfx11 emits a timed token for this instruction, but the stock decoder, by analogy with gfx12, marked it SKIP and so orphaned the token.
  • GS/PS shader-base disambiguation: PS/GS/HS bases share one slot (last-write-wins), so GS waves inherited the PS entry and derailed; the stitcher now disambiguates per wave by token-category fit.
  • Matcher robustness for sparse graphics waves: don't derail on exec-mask control flow, skip tokens that carry no instruction, and recover instructions a loop re-executed via a backward scan.
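The idea of disambiguating a shader base per wave by token-category fit can be sketched as follows. This is a toy illustration, not the algorithm in patches/graphics-stitch.patch: the category names, the scoring rule, and the function names are all assumptions made for the example.

```python
def fit_score(tokens, instr_categories):
    """Count the tokens that can be matched, in order, against a shader's
    instruction categories. Instructions that emitted no token are skipped."""
    pos = 0
    matched = 0
    for tok in tokens:
        nxt = pos
        # Scan forward for the next instruction of the same category.
        while nxt < len(instr_categories) and instr_categories[nxt] != tok:
            nxt += 1
        if nxt < len(instr_categories):
            matched += 1
            pos = nxt + 1
        # If nothing matches, leave pos unchanged and try the next token.
    return matched


def pick_shader(tokens, candidates):
    """Pick the candidate shader whose instruction stream fits the wave best."""
    return max(candidates, key=lambda name: fit_score(tokens, candidates[name]))


# Hypothetical category sequences for two shaders sharing one base slot.
candidates = {
    "PS": ["SALU", "VMEM", "VALU", "VALU", "EXPORT"],
    "GS": ["SALU", "VALU", "LDS", "VALU", "MSG"],
}
wave_tokens = ["VALU", "LDS", "VALU"]  # sparse tokens observed for one wave
best = pick_shader(wave_tokens, candidates)
print(best)
```

With this made-up data the wave's tokens fit the GS stream completely and the PS stream only partly, so the wave is attributed to GS instead of inheriting whichever base was written last.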

The residual gap is category-matcher imprecision under sparse SQTT anchors; closing it fully would need an exact per-instruction sequencer rather than more heuristics.

Layout

src/oracle_isa.c             stitch validation harness (the "oracle")
tools/build_capture.py       .rgp -> se*_raw.bin + co_*.elf + isa_map.tsv (orchestrator)
tools/build_codeobjects.py   code-object extraction (B00P + AMD_RDF)
tools/build_isa_map.py       amdgpu-dis disassembly -> absolute-address ISA map
patches/graphics-stitch.patch  the decoder fixes (apply onto the pinned commit)
decoder/setup.sh             clone ROCm decoder @pinned commit, apply patch, build the .so

Prerequisites

  • A C toolchain, cmake, and ninja, plus an LLVM build with the AMDGPU backend (set LLVM_DIR; the default points at Gentoo's llvm-22).
  • amdgpu-dis (ships with the Radeon Developer Tool Suite / ROCm); set AMDGPU_DIS=/path/to/amdgpu-dis.
  • python3.

Quick start

make decoder                          # one-time: clone + patch + build the decoder .so
make oracle                           # build bin/oracle_isa
make run CAPTURE=path/to/frame.rgp    # build capture data (into /dev/shm/rgpcli) + stitch

# from the OUT dir, with ROCPROF_SO pointing at the patched .so:
UNRES=1    bin/oracle_isa se0_raw.bin   # classify unresolved tokens
PERFDUMP=1 bin/oracle_isa se0_raw.bin   # per-frame timing / stall breakdown
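For scripted runs, the two invocations above can be wrapped from Python. The sketch below only uses what the commands above show (the UNRES, PERFDUMP, and ROCPROF_SO environment variables and the bin/oracle_isa path); the helper names and the example .so path are hypothetical.

```python
import os
import subprocess


def oracle_command(stream, mode=None, rocprof_so=None, base_env=None):
    """Build (argv, env) for one oracle_isa run without executing it."""
    env = dict(os.environ if base_env is None else base_env)
    if mode is not None:
        if mode not in ("UNRES", "PERFDUMP"):
            raise ValueError(f"unknown mode: {mode}")
        env[mode] = "1"  # same effect as prefixing the command with MODE=1
    if rocprof_so is not None:
        env["ROCPROF_SO"] = rocprof_so  # path to the patched decoder .so
    return ["bin/oracle_isa", stream], env


def run_oracle(stream, **kwargs):
    """Run oracle_isa from the OUT dir and return its stdout as text."""
    argv, env = oracle_command(stream, **kwargs)
    result = subprocess.run(argv, env=env, check=True,
                            capture_output=True, text=True)
    return result.stdout


# Example (the .so path is a placeholder, not a real default):
# print(run_oracle("se0_raw.bin", mode="PERFDUMP",
#                  rocprof_so="/path/to/patched-decoder.so"))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact environment before running the harness.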

License

MIT (LICENSE). The decoder patch modifies MIT-licensed ROCm code and is intended for upstream contribution.
