Node-level Performance Engineering
Big seminar room, HLRS (Höchstleistungsrechenzentrum Stuttgart), Universität Stuttgart, Allmandring 30, D-70569 Stuttgart
Monday, July 14, 2014, 9:00 - Tuesday, July 15, 2014, 17:00
This course teaches performance engineering approaches on the compute node level. "Performance engineering" as we define it is more than employing tools to identify hotspots and bottlenecks. It is about developing a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work is executed. Once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of optimizations can often be predicted. We introduce a "holistic" node-level performance engineering strategy, apply it to different algorithms from computational science, and also show how an awareness of the performance features of an application may lead to notable reductions in power consumption.
This course provides scientific training in Computational Science and, in addition, fosters scientific exchange among the participants.
- Intel and AMD x86 architectures
- Performance modeling & engineering approaches
- Our Approach
Practical performance analysis
- The LIKWID tools
- Typical performance patterns
Microbenchmarks and the memory hierarchy
- Understanding the memory hierarchy
- Data transfer between memory levels
- Write allocate vs. NT stores
- Modeling of cache hierarchies
- NUMA effects - anisotropy and asymmetry
Typical node-level software overheads
- Cost of synchronization
- Work Distribution
Example Problem: The 3D Jacobi solver
- Core-level optimizations
- Non-temporal stores
- SIMD vectorization (SSE, AVX)
- Multithreading - contention at different memory hierarchies
- Temporal Blocking
Example Problem: The Lattice-Boltzmann Method (LBM)
- Roofline Model
- Data layout
- Non-temporal stores
- Model for in-cache data & multicore scaling
- Sparse representation and options for Propagation
Example Problem: Sparse Matrix-Vector Multiplication
- Data layouts
- Performance model - CPU vs. GPU
- Bandwidth reduction
Example Problem: A backprojection algorithm for CT reconstruction
- The algorithm
- Naïve analysis
- Detailed analysis and performance model
Energy & Parallel Scalability
- Energy consumption of modern processors
- The energy-to-solution metric
- Performance engineering == power engineering
- Case studies
Between modules, there is time for questions and answers.
Academic participants (i.e., members of universities or public research institutions) from Europe or PRACE countries: Please apply through the PATC web page. After registering, you will receive an automated "congratulation" email confirming your successful registration. This email means that you have a guaranteed seat in the course and should organize your travel.
All other participants (not from academia, or from outside Europe), please apply through this online registration form.
Course number is 2014-NLP.
Deadline for registration is June 15, 2014.
Members of German universities and public research institutes: none.
Members of universities and public research institutes within Europe or PRACE: none.
Members of other universities and public research institutes: 120 EUR.
Others: 400 EUR.
(includes food and drink at coffee breaks, will be collected on the first day of the course, cash only)
Participants must have basic knowledge of programming in Fortran or C.
PRACE PATC and bwHPC-C5
HLRS is part of the Gauss Centre for Supercomputing (GCS), which is one of the six PRACE Advanced Training Centres (PATCs) that started in Feb. 2012. The mandate for the PATCs is as follows: "The PRACE Advanced Training Centres will serve as European hubs of advanced, world-class training for researchers working in the computational sciences." (see D3.2.3)
This course is a PATC course, see also the PRACE Training Portal and Events. For participants from public research institutions in PRACE countries, the course fee is sponsored through the PRACE PATC program.
See HLRS address and travel-info. The nearest public transport stops are "Universität, Stuttgart" (S-Bahn station, 15 min on foot) and "Lauchhau, Stuttgart" (bus stop, 4 min on foot to HLRS; bus lines 84, 92, 746, 747, 748, but not 82, from S-Bahn station "Universität, Stuttgart", and bus line 81 from S-Bahn station "Stuttgart-Vaihingen").
Accommodation: see HLRS accommodation-info and additional hotel list. Private bed and breakfast is also available (and may be cheaper than the hotels), e.g., www.nd-bed-breakfast.de. A DJH youth hostel is also available.
Further links: online city map of the Stadtmessungsamt Stuttgart (Stuttgart surveying office) or www.city-map.de.
If you cannot come to the course, please send an email to the organizer as soon as possible. This allows us to accept additional participants from the waiting list. There is no cancellation fee.
NO-SHOW: Registered participants who neither cancel nor show up, without giving a reason, are blocked from all of our workshops for the following year (because it is too expensive to produce unused copies of the slides for them).