Optimization of Scaling and Node-level Performance on Hazel Hen (Cray XC40)




To increase the efficiency of our users' applications on Hazel Hen, HLRS and Cray offer this workshop to improve the node-level performance as well as the scaling of participants' codes. This allows users to increase the volume and quality of their scientific findings while the costs (in terms of core hours) remain constant.

Target audience: Code owners of applications already running on Hazel Hen.


Supported by HLRS and Cray specialists, you will analyze the runtime behavior of your code, locate bottlenecks, design and discuss potential solutions, and implement them. All categories of bottlenecks (CPU, memory subsystem, communication and I/O) will be addressed, as required by the respective application. Ideally, these steps will be repeated several times in order to address several bottlenecks.

To achieve usable efficiency gains, it is important to discuss the pros and cons of potential solutions. This, however, requires application as well as hardware expertise. Hence, this workshop brings together application specialists (the users) and hardware specialists (the machine experts).

Since I/O is the bottleneck of many HPC applications these days, the first day of this workshop is dedicated to I/O only. Lectures will be given on scalable I/O strategies and mechanisms (MPI-IO, NetCDF, HDF5). In hands-on sessions, participants can gain initial experience with them.

Daily schedule
(individual days can be booked separately)

First day
9:00 -  9:30     Local registration
9:30 - 17:30     I/O-Course

2nd and 3rd day
9:00 - 17:30     Course

Last day
9:00 - 16:30     Course




In order to attend the workshop, you should already have an account on Hazel Hen and your application should already be running on the system. Furthermore, we require that you bring your own code including a dataset. Before the workshop starts, you will be required to profile your application once using CrayPAT-lite by means of the following steps:

  1. Set up a test case. While doing so, please adhere to the following requirements:
    • number of cores:

      • In order to be representative, the test case should be comparable in size to your current and possibly future production runs.
      • In order to save valuable resources, it should, however, be as small as possible.
      • So consider reducing the size of your test case as long as the profile obtained in step 4 stays almost constant (cf. the pat_report manuals on how to view profiles).

    • wall time:

      • In order to allow for a productive and effective workflow, the wall time should be only a few minutes.
      • At the same time, it should cover all important parts of the code, including I/O.
      • So consider reducing the number of simulated time steps.
  2. Load the perftools modules:
    • $> module load perftools-base
    • $> module load perftools-lite
  3. Rebuild your application, e.g.:
    • $> make clean; make
  4. Run the job with the newly built executable from within a Workspace (cf. https://wickie.hlrs.de/platforms/index.php/Workspace_mechanism).
  5. As a result of step 4, a profiling directory <executable_name>+<some-id>s is created in the working directory.
  6. Please send us this subdirectory by copying it to
    (please also change the permissions via chmod -R 777 /lustre/cray/ws8/ws/hpcbjdic-XC_WS_preparation/<your_application_your_name>).
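
Steps 5 and 6 above can be sketched as a small shell helper. This is only an illustrative sketch, not an official HLRS or Cray tool: the function name collect_profile and its two arguments are hypothetical, and it assumes that the profiling directory sits in the current working directory and follows the <executable_name>+<some-id>s naming pattern from step 5. It applies the permissions from step 6 (chmod -R 777) to the destination.

```shell
#!/bin/bash
# Hypothetical helper (not part of perftools): copy the newest CrayPAT-lite
# output directory of an executable to a destination directory.
# Usage: collect_profile <executable_name> <destination_dir>
collect_profile() {
    local exe="$1" dest="$2" dir newest=""

    # Step 5 names the directory <executable_name>+<some-id>s.
    # If several runs left several directories, keep the most recent one.
    for dir in "${exe}"+*s; do
        [ -d "$dir" ] || continue
        if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
            newest="$dir"
        fi
    done

    if [ -z "$newest" ]; then
        echo "no profiling directory found for ${exe}" >&2
        return 1
    fi

    # Step 6: copy the directory and open up the permissions.
    mkdir -p "$dest" || return 1
    cp -r "$newest" "$dest"/ || return 1
    chmod -R 777 "$dest"

    # Report where the copy ended up.
    echo "$dest/$newest"
}
```

From the working directory of the job, it could be called as: collect_profile <executable_name> /lustre/cray/ws8/ws/hpcbjdic-XC_WS_preparation/<your_application_your_name>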


Please note that attending the introductory course alone (e.g. in Sep. 2017) does not provide you with all of these prerequisites!




German (in English, if required)


Stefan Andersson and TBA (Cray)


Handouts will be available to participants in printed as well as digital form.


Registration is via the online registration form.


The deadline for registration is Mar. 25, 2018.


Members of German universities and public research institutes: none
Members of universities and public research institutes within EU or PRACE-member countries: none
Members of other universities and public research institutes: 120 EUR
Others: 400 EUR
(includes coffee breaks)


Travel Information and Accommodation

See our "How to find us" page.


HLRS is part of the Gauss Centre for Supercomputing (GCS), which is one of the six PRACE Advanced Training Centres (PATCs) that started in Feb. 2012.
HLRS is also a member of the Baden-Württemberg initiative bwHPC-C5.
This course is provided within the framework of the bwHPC-C5 user support. It is not part of the PATC curriculum and is not sponsored by the PATC program.


Rolf Rabenseifner phone 0711 - 685 65530, rabenseifner@hlrs.de
Björn Dick phone 0711 - 685 87189, dick@hlrs.de
Lucienne Dettki phone 0711 - 685 63894, dettki@hlrs.de
