Introduction to OpenMP Offloading with AMD GPUs

All communication will be done through Zoom, Slack and email.

OpenMP is one of the major options for using GPUs to accelerate or offload computations on today's heterogeneous computer systems. This course gives an introduction to the AMD Instinct™ GPU and Accelerated Processing Unit (APU) architectures to lay the foundations of how GPUs work and how they can be used for offloading with OpenMP. New features of recent OpenMP versions and GPUs, such as the unified memory programming model, will be introduced; these make writing HPC applications much easier across a wide range of GPU programming models. In addition, tools for performance analysis and optimization will be presented.

This course targets beginners in GPU programming with basic knowledge of parallelization with OpenMP and/or MPI on CPUs. After this course you will have learned the basics to confidently start porting your application from a CPU-only system to systems with discrete GPU accelerators or APUs.

In this course, participants will

  • Gain foundational knowledge about GPUs and APUs, and their roles in high-performance computing.
  • Learn how to utilize OpenMP offloading with unified shared memory to simplify data management and improve performance.
  • Explore techniques for explicit data management in OpenMP offloading, enabling more control over data movement and optimization.
  • Understand the principles and benefits of asynchronous offloading to enhance computational efficiency and overlap computation with data transfer.
  • Discover various tools and methodologies for analyzing and optimizing the performance of your applications.
  • Apply your knowledge in a practical session where you’ll port a small application, reinforcing the concepts learned throughout the workshop.

Location

Online course
Organizer: HLRS, University of Stuttgart, Germany

Start date

Oct 22, 2024
09:00

End date

Oct 22, 2024
15:30

Language

English

Entry level

Basic

Course subject areas

  • Hardware Accelerators
  • Parallel Programming
  • Performance Optimization & Debugging

Topics

  • Code Optimization
  • GPU Programming
  • MPI+OpenMP
  • OpenMP

Prerequisites and content levels

Prerequisites

Basic experience in OpenMP programming, e.g. from attending the Parallel Programming Workshop. Participants should have an application developer's general knowledge of computer hardware and operating systems, and be familiar with C/C++ or Fortran.

See also the suggested prereading below (resources and public videos).

Content levels

Basic: 2 hours
Intermediate: 2.5 hours
Advanced: 1 hour

Learn more about course curricula and content levels

Instructors

Michael Klemm, Paul Bauer, Luka Stanisic, Johanna Potyka, Igor Pasichnyk, and Bob Robey (AMD).

Agenda (preliminary)

All times are CEST.

08:45 - 09:00 Drop in to Zoom

09:00 - 15:30 Lectures and exercises on the following topics

  • Introduction by HLRS and AMD
  • Introduction to GPUs and APUs
  • Introduction to OpenMP offload using unified shared memory
  • Introduction to OpenMP offload with explicit data management
  • Asynchronous offloading
  • Tools for performance analysis and optimizations
  • Hands-on session porting a small application

Registration information

Register via the button at the top of this page.
If the course is full, we encourage you to register for the waiting list; places might become available.

Please be aware that the talks and Q&A sessions will be recorded. By registering, you declare that you are aware of and consent to the recording.

Registration closes on October 7, 2024.

Fees

This course is free of charge.

Resources for additional reading

  • Book on OpenMP GPU programming
    • Programming Your GPU with OpenMP, Tom Deakin and Tim Mattson,
      ISBN-13: 978-0262547536
  • Book on parallel and high performance computing topics
    • Parallel and High Performance Computing, Robert Robey and Yuliana Zamora, Manning Publications
  • ENCCS resources
  • AMD Lab Notes series on GPUOpen.com

    • Finite difference method - Laplacian part 1
    • Finite difference method - Laplacian part 2
    • Finite difference method - Laplacian part 3
    • Finite difference method - Laplacian part 4
    • AMD matrix cores
    • Introduction to profiling tools for AMD hardware
    • AMD ROCm™ installation
    • AMD Instinct™ MI200 GPU memory space overview 
    • Register pressure in AMD CDNA2™ GPUs
    • GPU-Aware MPI with ROCm
    • Jacobi Solver with HIP and OpenMP offloading
    • Sparse matrix vector multiplication - part 1

Contact

Khatuna Kakhiani phone 0711 685 65796, training(at)hlrs.de
Tobias Haas phone 0711 685 87223, training(at)hlrs.de

HLRS Training Collaborations in HPC

HLRS is part of the Gauss Centre for Supercomputing (GCS), together with JSC in Jülich and LRZ in Garching near Munich. EuroCC@GCS is the German National Competence Centre (NCC) for High-Performance Computing. HLRS is also a member of the Baden-Württemberg initiative bwHPC.

Further courses

See the training overview and the Supercomputing Academy pages.
