Höchstleistungsrechenzentrum Stuttgart: GPU Programming using CUDA

Prerequisites and content levels

For part 1 (introductory)

Programming experience in any of C, C++, or Fortran. Exercises will use a Linux cluster, so you should have basic knowledge of working with a Linux shell and a text editor in a shell. Possible resources are, e.g., https://ubuntu.com/tutorials/command-line-for-beginners for the shell and https://opensource.com/article/19/3/getting-started-vim for an editor. Some knowledge of parallel programming is a plus.

For part 2 (advanced)

In addition to the prerequisites above, you should be familiar with the topics of part 1.

Content levels

Basic: 12 hours
Intermediate: 6 hours
Advanced: 3 hours

Learn more about course curricula and content levels.

Instructors

Tobias Haas (HLRS)

Learning outcomes

After this course, participants will:

be familiar with the CUDA programming model,
have basic knowledge of performance optimization and profiling of CUDA code,
be aware of available correctness checking tools,
have seen a first approach to multi-GPU programming,
have an overview of important CUDA libraries.

Agenda (preliminary)

All times are local times in the Central European Time zone (Berlin).

Drop in to the video conference (8:45 - 9:00)

The course will take place from 9:00 to 12:30 each day.

Cluster dry run on Mon Nov 10 at 2 pm

Part 1 (Tue - Thu Nov 11-13)

Day 1

Basics about CUDA
Kernel, kernel launch, host/device functions
Memory management: host and managed memory
Synchronization
Error handling

Day 2

Profiling and NVTX annotations
Memory management: pinned and device, unified memory
Overview of CUDA libraries

Day 3

GPU architecture
Performance optimization: memory access patterns and cache
Coalesced memory access
Modern C++ and GPUs

Part 2 (Mon - Thu Nov 17-20)

Day 4

Shared and constant memory
Bank conflicts
Atomic operations

Day 5

CUDA streams
Introduction to Multi-GPU

Day 6

Warp shuffle, CUB
cooperative_groups

Day 7

Kernel-level profiling and correctness checking
Other programming methods (using base language constructs, pragmas and libraries)
Interoperability with OpenMP

Handouts

Each participant will get access to all slides (PDF).

Exercises

Although this is an online course, the exercises will be highly interactive and will use breakout rooms. Participants will work on HLRS's systems.

Registration information

Register via the button at the top of this page.
We encourage you to register for the waiting list if the course is full. Places might become available.

Fees

Employees of the HLRS or the Jülich Supercomputing Centre (JSC) or the Leibniz Supercomputing Centre (LRZ): 0 Euro
Students without a master’s degree or equivalent: 32.50 Euro
PhD students or employees at a German university or public research institute: 67.50 Euro
PhD students or employees at a university or public research institute in an EU, EU-associated or PRACE country other than Germany: 135 Euro
PhD students or employees at a university or public research institute outside of EU, EU-associated or PRACE countries: 270 Euro
Other participants, e.g., from industry, other public service providers, or government: 690 Euro

Our course fee includes coffee breaks (in classroom courses only).

For lists of EU and EU-associated countries and of PRACE countries, see the Horizon Europe and PRACE websites.

Contact

Lucienne Dettki, phone 0711 685 63894, training(at)hlrs.de
Tobias Haas, phone 0711 685 87223, training(at)hlrs.de

HLRS training collaborations in HPC and AI

HLRS is part of the Gauss Centre for Supercomputing (GCS), together with JSC in Jülich and LRZ in Garching near Munich. EuroCC@GCS is the German National Competence Centre (NCC) for High-Performance Computing. HLRS is also a member of the Baden-Württemberg initiative bwHPC. Since 2025, HLRS has coordinated HammerHAI.

This course is provided within the framework of the bwHPC training program.

Further courses and training team

See the training overview and the Supercomputing Academy pages.
See also information about the HLRS training department and staff.

Venue