
Node-level Performance Engineering

Large seminar room, HLRS (Höchstleistungsrechenzentrum Stuttgart), Universität Stuttgart, Allmandring 30, D-70569 Stuttgart


Monday, July 14, 2014, 9:00 - Tuesday, July 15, 2014, 17:00


This course teaches performance engineering approaches on the compute node level. "Performance engineering" as we define it is more than employing tools to identify hotspots and bottlenecks. It is about developing a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code gets executed that does the actual computational work. Once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of optimizations can often be predicted. We introduce a "holistic" node-level performance engineering strategy, apply it to different algorithms from computational science, and also show how an awareness of the performance features of an application may lead to notable reductions in power consumption.
This course provides scientific training in Computational Science and, in addition, fosters scientific exchange among the participants.


Attendees are warmly invited to also join the course "User Guided Optimization in High-Level Languages", held on July 16, which covers related topics.


First day:

09:00 - 09:30 local registration
09:30 - 13:00 lectures (with breaks: 10:30-10:45 & 11:45-12:00)
13:00 - 14:00 lunch break
14:00 - 17:00 lectures (with break: 15:10-15:25)

Second day:

09:00 - 13:00 lectures (with breaks: 10:15-10:30 & 11:45-12:00)
13:00 - 14:00 lunch break
14:00 - 17:00 lectures (with break: 15:10-15:25)

Detailed Program


  • Intel and AMD x86 architectures
  • ccNUMA
  • Performance modeling & engineering approaches
  • Our Approach


Practical performance analysis

  • The LIKWID tools
  • Typical performance patterns


Microbenchmarks and the memory hierarchy                

  • Understanding the memory hierarchy
    • Data transfer between memory levels
    • Write allocate vs. NT stores
    • Modeling of cache hierarchies
    • Contention
  • NUMA effects - anisotropy and asymmetry

Typical node-level software overheads

  • Cost of synchronization
  • Work Distribution

Example Problem: The 3D Jacobi solver

  • Core-level optimizations
    • Blocking
    • Non-temporal stores
    • SIMD vectorization (SSE, AVX)
  • Multithreading - contention at different memory hierarchies
  • Temporal Blocking

Example Problem: The Lattice-Boltzmann Method (LBM)

  • Introduction
  • Roofline Model
  • Data layout
  • Non-temporal stores
  • Model for in-cache data & multicore scaling
  • Sparse representation and options for Propagation

Example Problem: Sparse Matrix-Vector Multiplication

  • Data layouts
  • Performance model - CPU vs. GPU
  • Bandwidth reduction

Example Problem: A backprojection algorithm for CT reconstruction

  • The algorithm
  • Naïve analysis
  • Detailed analysis and performance model  
  • Optimizations

Energy & Parallel Scalability

  • Energy consumption of modern processors
  • The energy-to-solution metric
  • Performance engineering == power engineering
  • Case studies


Between modules, there is time for questions and answers.




Dr. Georg Hager (RRZE) and Dr.-Ing. Jan Treibig (RRZE) (HPC, Uni. Erlangen)


Registration is via the online registration form.

Please book Course 2014-NLP, "ALL DAYS" if you want to book all parts.


Academic participants (i.e., members of universities or public research institutions) from Europe or PRACE countries: Please apply through the PATC web page. After you register, you will receive an automated "congratulation" email confirming your successful registration. This email means that you have a guaranteed seat in the course and should organize your travel.
All other participants (not from academia, or from outside Europe), please apply through this online registration form.
Course number is 2014-NLP.


The deadline for registration is June 15, 2014.


Members of German universities and public research institutes: none.
Members of universities and public research institutes within Europe or PRACE: none.
Members of other universities and public research institutes: 120 EUR.
Others: 400 EUR.
(The fee includes food and drink at coffee breaks. It will be collected on the first day of the course, cash only.)


Participants must have basic knowledge of programming in Fortran or C.


HLRS is part of the Gauss Centre for Supercomputing (GCS), which is one of the six PRACE Advanced Training Centres (PATCs) that started in Feb. 2012. The mandate for the PATCs is as follows: "The PRACE Advanced Training Centres will serve as European hubs of advanced, world-class training for researchers working in the computational sciences." (see D3.2.3)
This course is a PATC course, see also the PRACE Training Portal and Events. For participants from public research institutions in PRACE countries, the course fee is sponsored through the PRACE PATC program.

HLRS is also a member of the Baden-Württemberg initiative bwHPC-C5.
This course is also provided within the framework of the bwHPC-C5 user support.

Travel-Info Stuttgart

See HLRS address and travel-info. The nearest public transport stations are: "Universität, Stuttgart" (S-Bahn station, 15 min on foot) and "Lauchhau, Stuttgart" (bus station, 4 min on foot to HLRS; bus lines 84, 92, 746, 747, 748, but not 82, from S-Bahn station "Universität, Stuttgart", and bus line 81 from S-Bahn station "Stuttgart-Vaihingen").
Accommodation: see HLRS accommodation-info. Private bed and breakfast is also available (and might be cheaper than the hotels). A DJH youth hostel is also available.
Further links: Online-Stadtplan des Stadtmessungsamtes Stuttgart (online city map of the Stuttgart surveying office)

Local Organizer

Rolf Rabenseifner, phone 0711 685 65530, rabenseifner[at]
Joerg Hertzer, phone 0711 685 65932, hertzer[at]

Cancelation Policy

If you cannot come to the course, please send an email to the organizer as soon as possible. This allows us to accept additional participants from the waiting list. There is no cancelation fee.
NO-SHOW: Registered persons who neither cancel nor show up, without giving a reason, are blocked from all of our workshops for the following year (because it is too expensive to produce unused copies of the slides for them).
