Introduction to Hybrid Programming in HPC and Porting and Optimization Workshop



Monday-Tuesday: Introduction to Hybrid Programming in HPC (open to everybody)

Most HPC systems are clusters of shared memory nodes. To use such systems efficiently, both memory consumption and communication time have to be optimized. Therefore, hybrid programming may combine distributed memory parallelization across the node interconnect (e.g., with MPI) with shared memory parallelization inside each node (e.g., with OpenMP or MPI-3.0 shared memory). This course analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes. Multi-socket multi-core systems in highly parallel environments are given special consideration. MPI-3.0 introduced a new shared memory programming interface, which can be combined with inter-node MPI communication. It can be used for direct neighbor accesses similar to OpenMP or for direct halo copies, and it enables new hybrid programming models. These models are compared with various hybrid MPI+OpenMP approaches and with pure MPI. Numerous case studies and micro-benchmarks demonstrate the performance-related aspects of hybrid programming.

Hands-on sessions are included on both days. Tools for hybrid programming, such as thread/process placement support and performance analysis, are presented in a "how-to" section. This course provides scientific training in Computational Science and, in addition, fosters scientific exchange among the participants. This course is organized by HLRS in cooperation with RRZE and VSC (Vienna Scientific Cluster).

Target audience: Scientists developing HPC applications with MPI.

Wednesday-Friday: Porting and Optimization Workshop (only for customers of HLRS)

In this workshop, users can port their applications to HLRS' new AMD-based HPE Apollo 9000 supercomputer "Hawk" (to be installed in early 2020), assisted by HLRS, HPE and AMD staff. With this assistance, it is also possible to improve the node-level performance as well as the scaling of the codes. In this way, users can increase the quantity and/or quality of their scientific findings while the costs (in terms of core hours) remain constant.

In order to achieve usable efficiency improvements, it is important to discuss the pros and cons of potential solutions. This, however, requires both application and machine expertise. Hence, this workshop brings together our users (with their application expertise) and support staff (with their machine expertise).

Target audience: Groups holding a compute time budget to be used on Hawk.

We have combined both parts of the course because the hybrid programming days may provide a good basis for the challenges of porting your application to Hawk with its 128 cores per node.

Agenda & Content (preliminary)

1st day (Hybrid programming, part 1)

08:30   Registration
09:00      Welcome
09:05      Motivation
09:15      Introduction
09:45      Programming Models
09:50       - MPI + OpenMP
10:30   Coffee Break
10:50       - continue: MPI + OpenMP
11:40         Practical (how to compile and start)
12:30         Practical (hybrid through OpenMP parallelization)
13:00   Lunch
14:00         Practical (continued)
15:00   Coffee Break
15:20       - Overlapping Communication and Computation
15:40         Practical (taskloops)
16:20       - MPI + OpenMP Conclusions
16:30       - MPI + Accelerators
16:45      Tools
17:00   End of first day

2nd day (Hybrid programming, part 2)

09:00      Programming Models (continued)
09:05       - MPI + MPI-3.0 Shared Memory
09:45         Practical (replicated data)
10:30   Coffee break
10:50         continue: Practical (replicated data)
11:50       - MPI Memory Models and Synchronization
12:30   Lunch
13:30       - Pure MPI
13:50       - Topology Optimization
14:30   Coffee Break
14:50         Practical (application-aware Cartesian topology)
15:45       - Topology Optimization (Wrap up)
16:00      Conclusions
16:15      Q & A
16:30   End of second day (course)

3rd and 4th days (Porting and Optimization Workshop)

9:00 - 17:30 Supported by HLRS, HPE and AMD specialists, you will port your application to the new supercomputer system “Hawk”. Furthermore, it will be possible to analyze the runtime behavior of your code, locate bottlenecks, and design, discuss and implement potential solutions, again assisted by the specialists mentioned above. All categories of bottlenecks (CPU, memory subsystem, communication and I/O) will be addressed, according to the respective requirements. Ideally, these steps will be repeated several times in order to address several bottlenecks. If requested by participants, lectures on various topics can also be given.

Last day

9:00 - 15:30 Same as on the 3rd and 4th days.


Prerequisites

Hybrid programming days:
Basic MPI and OpenMP knowledge as presented, e.g., in our Training Courses on MPI and OpenMP.
For the hands-on sessions, you should be familiar with Unix/Linux and with either C/C++ or Fortran.

Porting and optimization days:
Your group holds a compute time budget to be used on Hawk.


The course language is English.

Course material

Hybrid programming days: See (preliminary link, from previous course).


Lecturers

Hybrid programming days: Dr. habil. Georg Hager (RRZE/HPC, Uni. Erlangen), Dr. Rolf Rabenseifner (HLRS, Uni. Stuttgart), Dr. Claudia Blaas-Schenner and Dr. Irene Reichl (VSC Team, TU Wien)

Porting and optimization days: HLRS, HPE and AMD staff

Registration & Further Information

Registration is via the online registration form. If you want to register only for the Hybrid Programming days, please select only days 1+2 in the form.


The deadline for registration is Jan. 06, 2020.


Fees

Students without Diploma/Master: 25 EUR
Students with Diploma/Master (PhD students) at German universities: 45 EUR
Members of German universities and public research institutes: 45 EUR
Members of universities and public research institutes within EU or PRACE member countries: 90 EUR
Members of other universities and public research institutes: 180 EUR
Others: 420 EUR

(The fees include coffee breaks.)


Travel Information and Accommodation

See our "How to find us" page.


HLRS is part of the Gauss Centre for Supercomputing (GCS), which is one of the six PRACE Advanced Training Centres (PATCs) that started in Feb. 2012.
HLRS is also a member of the Baden-Württemberg initiative bwHPC-C5.
This course is provided within the framework of the bwHPC-C5 user support. This course is not part of the PATC curriculum and is not sponsored by the PATC program.

Local Organizer

Rolf Rabenseifner, phone 0711 685 65530
Lucienne Dettki, phone 0711 685 63894
Björn Dick, phone 0711 685 87189

Shortcut-URL & Course Number