rgp-cli

An open, per-instruction SQTT instruction stitcher for Linux/RADV .rgp captures. It provides an RGP-equivalent "Instruction Timing" view without the closed Radeon GPU Profiler GUI and focuses on graphics (PS/GS/VS) shaders on gfx11/RDNA3.

"RGP" / "Radeon GPU Profiler" are trademarks of AMD. This is an independent, open-source tool built on AMD's open rocprof-trace-decoder; it is not affiliated with or endorsed by AMD.

What this is (and is not)

The SQTT decode engine is AMD's open rocprof-trace-decoder. rgp-cli does not reimplement it. rgp-cli is:

A patch (patches/graphics-stitch.patch) that makes that decoder stitch gfx11 graphics frames. The stock decoder is tuned for compute and derails on real graphics workloads.
A validation harness (src/oracle_isa.c) that feeds byte-exact amdgpu-dis disassembly to the decoder and joins every traced instruction back to its real ISA line (reproducing RGP's instruction view). UNRES=1 classifies unresolved tokens; PERFDUMP=1 prints a frame-timing breakdown.
A capture pipeline (tools/) that turns a .rgp (B00P/RADV, or AMD_RDF/Windows via the optional rdf_spike reader) into the raw SQTT streams + an absolute-address ISA map.
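
The join the harness performs can be pictured as a lookup from absolute instruction address to disassembly line. The sketch below is only an illustration in Python: the two-column `address<TAB>text` layout, the function names, and the sample addresses are assumptions made for the example, not the actual isa_map.tsv format or the code in src/oracle_isa.c.

```python
import csv
import io

def load_isa_map(tsv_text):
    """Parse a hypothetical 'hex address<TAB>ISA text' map into a dict."""
    isa = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank lines, short rows, and comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        isa[int(row[0], 16)] = row[1]
    return isa

def join_trace(pcs, isa):
    """Pair each traced program counter with its ISA line, or None if unresolved."""
    return [(pc, isa.get(pc)) for pc in pcs]

# Made-up sample data; a real map would come from tools/build_isa_map.py.
SAMPLE = "0x1000\ts_mov_b32 s0, 0\n0x1004\tv_mov_b32 v0, s0\n0x1008\ts_endpgm\n"
isa_map = load_isa_map(SAMPLE)
joined = join_trace([0x1000, 0x1004, 0x2000], isa_map)
unresolved = [pc for pc, text in joined if text is None]
```

Here the address 0x2000 has no entry in the map, so it ends up in `unresolved`; that is the kind of miss the UNRES=1 mode is described as classifying.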

The patch is intended to go upstream to ROCm so every consumer benefits.

Relation to other tools

taowen/rgp-analyzer-cli wraps the stock decoder for compute tuning and reports a stitch-confidence number; it has no exec-mask / graphics support. rgp-cli is complementary: it fixes the decoder for graphics.

Status

Verified on gfx11 (RX 7800 XT, RADV 26.1.1):

Capture                          Stock decoder   rgp-cli (patched)
vkcube / vkgears (gfx11 demos)   100%            100%
gfx12 nBody (compute)            n/a             99.9%
Real-game gfx11 frame            40.6%           99.95% (7,833,986 / 7,837,716 instructions)
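
As a quick check of the real-game figure, and assuming the percentage is simply stitched instructions over total instructions, the arithmetic works out as follows (the function name is illustrative):

```python
def stitch_rate(stitched, total):
    """Return the stitched fraction as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * stitched / total

rate = stitch_rate(7_833_986, 7_837_716)
missing = 7_837_716 - 7_833_986
print(f"{rate:.2f}% stitched, {missing} instructions not stitched")
# prints "99.95% stitched, 3730 instructions not stitched"
```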

The headline fixes:

s_waitcnt_depctr → IMMED: gfx11 emits a timed token for this instruction, but the stock decoder, by analogy with gfx12, marked it SKIP and orphaned the token.
GS/PS shader-base disambiguation: PS/GS/HS bases share one slot (last-write-wins), so GS waves inherited the PS entry and derailed; the stitcher now disambiguates per wave by token-category fit.
Matcher robustness for sparse graphics waves: don't derail on exec-mask control flow, skip tokens that carry no instruction, and use a backward scan to recover instructions that a loop re-executed.
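
To make "disambiguates per wave by token-category fit" concrete, here is a minimal Python sketch of one way such a choice could be scored. It is not the patch's implementation: the category names, the greedy in-order scoring, and both function names are assumptions chosen for illustration.

```python
def fit_score(tokens, categories):
    """Count tokens that match, in order, against a candidate's instruction categories.

    Instructions between matches may be skipped, which models sparse waves
    where not every instruction produced a token.
    """
    pos = 0
    matched = 0
    for tok in tokens:
        nxt = pos
        # Advance through the candidate until this token's category is found.
        while nxt < len(categories) and categories[nxt] != tok:
            nxt += 1
        if nxt < len(categories):
            matched += 1
            pos = nxt + 1
        # Otherwise the token does not fit at or after pos; keep pos unchanged.
    return matched

def pick_base(tokens, candidates):
    """Choose the candidate shader base whose categories best fit the wave's tokens."""
    return max(candidates, key=lambda name: fit_score(tokens, candidates[name]))

# Hypothetical category sequences for two shaders that share one base slot.
candidates = {
    "PS": ["SALU", "VMEM", "VALU", "VALU", "EXPORT"],
    "GS": ["SALU", "VALU", "LDS", "VALU", "MSG"],
}
wave_tokens = ["VALU", "LDS", "MSG"]
best = pick_base(wave_tokens, candidates)  # "GS": all three tokens fit, versus one for "PS"
```

With last-write-wins, this wave would have been decoded against whichever base was written last; scoring both candidates per wave avoids that.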

The residual gap is category-matcher imprecision under sparse SQTT anchors; closing it fully would need an exact per-instruction sequencer rather than more heuristics.

Layout

src/oracle_isa.c             stitch validation harness (the "oracle")
tools/build_capture.py       .rgp -> se*_raw.bin + co_*.elf + isa_map.tsv (orchestrator)
tools/build_codeobjects.py   code-object extraction (B00P + AMD_RDF)
tools/build_isa_map.py       amdgpu-dis disassembly -> absolute-address ISA map
patches/graphics-stitch.patch  the decoder fixes (apply onto the pinned commit)
decoder/setup.sh             clone ROCm decoder @pinned commit, apply patch, build the .so

Prerequisites

A C toolchain, cmake, and ninja, plus an LLVM build with the AMDGPU backend (located via LLVM_DIR; the default points at Gentoo's llvm-22).
amdgpu-dis (ships with the Radeon Developer Tool Suite / ROCm); set AMDGPU_DIS=/path/to/amdgpu-dis.
python3.

Quick start

make decoder                          # one-time: clone + patch + build the decoder .so
make oracle                           # build bin/oracle_isa
make run CAPTURE=path/to/frame.rgp    # build capture data (into /dev/shm/rgpcli) + stitch

# from the OUT dir, with ROCPROF_SO pointing at the patched .so:
UNRES=1    bin/oracle_isa se0_raw.bin   # classify unresolved tokens
PERFDUMP=1 bin/oracle_isa se0_raw.bin   # per-frame timing / stall breakdown
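
If a capture produces several se*_raw.bin streams, the same invocation can be scripted. The helper below is a hedged Python sketch that only builds the environment and argument list for each stream; `oracle_runs` is a hypothetical name, and running the oracle once per stream is an assumption rather than documented behaviour.

```python
import glob
import os

def oracle_runs(out_dir, rocprof_so, mode, oracle="bin/oracle_isa"):
    """Build (env, argv) pairs for one oracle run per se*_raw.bin in out_dir."""
    if mode not in ("UNRES", "PERFDUMP"):
        raise ValueError("mode must be UNRES or PERFDUMP")
    # Copy the current environment, then add ROCPROF_SO and the mode flag.
    env = dict(os.environ, ROCPROF_SO=rocprof_so, **{mode: "1"})
    streams = sorted(glob.glob(os.path.join(out_dir, "se*_raw.bin")))
    # Use an absolute oracle path so the command still works with cwd=out_dir.
    return [(env, [os.path.abspath(oracle), os.path.basename(s)]) for s in streams]

# To execute each pair:
#   subprocess.run(argv, env=env, cwd=out_dir, check=True)
```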

License

MIT (LICENSE). The decoder patch modifies MIT-licensed ROCm code and is intended for upstream contribution.
