Introduction to OpenMP Offloading with AMD GPUs

All communication will be done through Zoom, Slack and email.

OpenMP is one of the major options for using GPUs to accelerate or offload computations on today's heterogeneous computer systems. This course gives an introduction to the AMD Instinct™ GPU and Accelerated Processing Unit (APU) architectures to lay the foundations of how GPUs work and how they can be used for offloading with OpenMP. New features of recent OpenMP versions and GPUs, such as the unified memory programming model, will be introduced; these make writing HPC applications much easier across a wide range of GPU programming models. In addition, tools for performance analysis and optimization will be presented.

This course targets beginners in GPU programming with basic knowledge of parallelization with OpenMP and/or MPI on CPUs. After this course you will have learned the basics to confidently start porting your application from a CPU-only system to systems with discrete GPU accelerators or APUs.

In this course, participants will

  • Gain foundational knowledge about GPUs and APUs, and their roles in high-performance computing.
  • Learn how to utilize OpenMP offloading with unified shared memory to simplify data management and improve performance.
  • Explore techniques for explicit data management in OpenMP offloading, enabling more control over data movement and optimization.
  • Understand the principles and benefits of asynchronous offloading to enhance computational efficiency and overlap computation with data transfer.
  • Discover various tools and methodologies for analyzing and optimizing the performance of your applications.
  • Apply your knowledge in a practical session where you’ll port a small application, reinforcing the concepts learned throughout the workshop.

Location

Online course
Organizer: HLRS, University of Stuttgart, Germany

Start date

Oct 22, 2024
09:00

End date

Oct 22, 2024
15:30

Language

English

Entry level

Basic

Course subject areas

  • Hardware Accelerators
  • Parallel Programming
  • Performance Optimization & Debugging

Topics

  • Code Optimization
  • GPU Programming
  • MPI+OpenMP
  • OpenMP

Prerequisites and content levels

Prerequisites

Basic experience in OpenMP programming, e.g. from attending the Parallel Programming Workshop. Participants should have an application developer's general knowledge of computer hardware and operating systems, and be familiar with C/C++ or Fortran.

See also the suggested prereading below (resources and public videos).

Content levels

Basic: 2 hours
Intermediate: 2.5 hours
Advanced: 1 hour

Learn more about course curricula and content levels

Instructors

Michael Klemm, Paul Bauer, Luka Stanisic, Johanna Potyka, Igor Pasichnyk, and Bob Robey (AMD).

Agenda (preliminary)

All times are CEST.

08:45 - 09:00 Drop in to Zoom

09:00 - 15:30 Lectures and exercises on the following topics

  • Introduction by HLRS and AMD
  • Introduction to GPUs and APUs
  • Introduction to OpenMP offload using unified shared memory
  • Introduction to OpenMP offload with explicit data management
  • Asynchronous offloading
  • Tools for performance analysis and optimizations
  • Hands-on session porting a small application

Registration information

Register via the button at the top of this page.
If the course is full, we encourage you to register for the waiting list; places might become available.

Please be aware that the talks and Q&A sessions will be recorded. By registering, you declare that you are aware of and consent to the recording.

Registration closes on October 7, 2024.

Fees

This course is free of charge.

Resources for additional reading

  • Book on OpenMP GPU programming
    • Programming Your GPU with OpenMP, Tom Deakin and Tim Mattson,
      ISBN-13: 978-0262547536
  • Book on parallel and high performance computing topics
    • Parallel and High Performance Computing, Robert Robey and Yuliana Zamora, Manning Publications
  • ENCCS resources
  • AMD Lab Notes series on GPUOpen.com

    • Finite difference method - Laplacian part 1
    • Finite difference method - Laplacian part 2
    • Finite difference method - Laplacian part 3
    • Finite difference method - Laplacian part 4
    • AMD matrix cores
    • Introduction to profiling tools for AMD hardware
    • AMD ROCm™ installation
    • AMD Instinct™ MI200 GPU memory space overview 
    • Register pressure in AMD CDNA2™ GPUs
    • GPU-Aware MPI with ROCm
    • Jacobi Solver with HIP and OpenMP offloading
    • Sparse matrix vector multiplication - part 1

Contact

Khatuna Kakhiani phone 0711 685 65796, training(at)hlrs.de
Tobias Haas phone 0711 685 87223, training(at)hlrs.de

HLRS Training Collaborations in HPC

HLRS is part of the Gauss Centre for Supercomputing (GCS), together with JSC in Jülich and LRZ in Garching near Munich. EuroCC@GCS is the German National Competence Centre (NCC) for High-Performance Computing. HLRS is also a member of the Baden-Württemberg initiative bwHPC.

Further courses

See the training overview and the Supercomputing Academy pages.
