The Profiling AI Software Bootcamp covers the process and tools for profiling AI and machine learning applications so that they fully utilize high-performance systems. Attendees will learn to profile applications using NVIDIA Nsight™ Systems, a system-wide performance analysis tool; analyze the results to identify optimization opportunities; and improve application performance to scale efficiently across systems with any number of CPUs and GPUs. Additionally, this bootcamp walks through system topology, covering multi-GPU and multi-node connections and architecture, and introduces the dynamics of FP8 precision.
This bootcamp contains four labs:
- Lab 1: System Topology
- Lab 2: Distributed Training Strategy
- Lab 3: Performance Overview
- Lab 4: Transformer Engine
The duration of the tutorial is 4 hours 30 minutes.
The tools and frameworks used in this bootcamp are as follows
To deploy the labs, please refer to the deployment guide presented here.
This material originates from the OpenHackathons GitHub repository. Check out additional materials here.
Don't forget to check out additional Open Hackathons Resources and join our OpenACC and Hackathons Slack Channel to share your experience and get more help from the community.
Copyright © 2026 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials may include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.