Höchstleistungsrechenzentrum Stuttgart

Introduction to OpenMP Offloading with AMD GPUs

Learn how to use the APUs and GPUs in HLRS's systems Hunter and Herder. OpenMP is our recommended portable way to do so.

OpenMP is one of the major options for using GPUs to accelerate or offload computations on today's heterogeneous computer systems. This course gives an introduction to the AMD Instinct™ GPU and Accelerated Processing Unit (APU) architectures to lay the foundations of how GPUs work and how they can be used for offloading in OpenMP. New features of recent OpenMP versions and GPUs, such as the unified memory programming model, will be introduced; these make writing HPC applications much easier across a wide range of GPU programming models. In addition, tools for performance analysis and optimization will be presented.

This course targets beginners in GPU programming who have basic knowledge of parallelization with OpenMP on CPUs. After this course you will have learned the basics to confidently start porting your application from a CPU-only system to systems with discrete GPU accelerators or APUs.

Venue

Online course
Organizer: HLRS, University of Stuttgart, Germany

Course start

October 27, 2026
09:00

Course end

October 28, 2026
13:00

Language

English

Entry level

Basic

Subject areas

Hardware accelerators

Parallel programming

Topics

Code optimization

GPU programming

OpenMP


Prerequisites and content levels

Prerequisites

Basic experience in OpenMP programming, e.g., from attending the Parallel Programming Workshop. Participants should have an application developer's general knowledge of computer hardware and operating systems and be familiar with C/C++ or Fortran.

See also the suggested prereading below (resources and public videos).

Content levels

Basic: 2 hours
Intermediate: 2.5 hours
Advanced: 1 hour

Learn more about course curricula and content levels

Instructors

Presenters: Michael Klemm, Luka Stanisic, Johanna Potyka
Additional HLRS and AMD staff to support exercises (tbd)

Learning outcomes

In this course, participants will

  • Gain foundational knowledge about GPUs and APUs and their roles in high-performance computing.
  • Learn how to utilize OpenMP offloading with unified shared memory to simplify data management and improve performance.
  • Explore techniques for explicit data management in OpenMP offloading, enabling more control over data movement and optimization for discrete GPUs.
  • Understand the principles and benefits of asynchronous offloading to enhance computational efficiency and overlap computation with data transfer.
  • Discover various tools and methodologies for analyzing and optimizing the performance of their applications.
  • Apply their knowledge in a practical session where they will port a small application, reinforcing the concepts learned throughout the course.

Agenda

Preliminary - All times are CET.

Day 1:

  • 08:45 - 09:00 Drop-in to Webex
  • 09:00 - 13:00 Introduction to OpenMP offload with and without unified shared memory (with exercises)
  • 14:00 - 17:00 Participants can continue working on the exercises or run their own experiments; limited support is available via chat.

Day 2: 

  • 08:45 - 09:00 Drop-in to Webex
  • 09:00 - 13:00 Real-world OpenMP porting: application porting examples and tools (with exercises)
  • 14:00 - 17:00 Participants can continue working on the exercises or run their own experiments; limited support is available via chat.

Lectures and exercises will cover the following topics:

  • Introduction to GPUs and APUs
  • OpenMP offload using unified shared memory
  • OpenMP offload with explicit data management
  • Asynchronous offloading
  • Real-world porting and optimization examples
  • Tools for performance analysis and optimization
  • Hands-on porting of a small application

Registration information

Apply for this course via the button at the top of this page (will be available soon).

Please be aware that the talks and Q&amp;A sessions will be recorded. By registering, you declare that you are aware of and consent to the recording.

Registration closes on October 13, 2026.

Fees

This course is free of charge.

Resources for additional reading


Contact

Tobias Haas, phone: 0711 685 87223, training(at)hlrs.de

HLRS Training Collaborations in HPC and AI

HLRS is part of the Gauss Centre for Supercomputing (GCS), together with JSC in Jülich and LRZ in Garching near Munich. SIDE is the German National Competence Centre (NCC) for High-Performance Computing. HLRS is also a member of the Baden-Württemberg initiative bwHPC.
Since 2025, HLRS has been coordinating one of the AI Factories of the EuroHPC JU: HammerHAI.

Further courses

See the training overview and the Supercomputing Academy pages.
