**Institute for Parallel and Distributed Systems, University of Stuttgart (Germany)**

Principal Investigator: Dirk Pflüger

The generation of clean, sustainable energy from plasma fusion reactors is currently limited by microinstabilities that arise during the fusion process, despite international efforts such as the ITER experiment, currently under construction in southern France. Large-scale numerical simulations are crucial to understand, predict, and control the resulting plasma turbulence. Due to the high dimensionality of the underlying equations, a fully resolved numerical simulation of ITER is out of reach with classical discretization schemes, even on the next generation of exascale computers. With five research groups from mathematics, physics, and computer science, the SPPEXA project EXAHD has proposed a hierarchical discretization scheme, so-called Sparse Grids, to overcome the current computational limits (the number of discretization points per dimension and the memory requirements). This approach is intended to enable high-resolution simulations, to ensure that the simulations scale to future exascale computers and beyond, and to cope with the faults and failures that will become more frequent on the next generation of supercomputers.

Fusion energy has the potential to become an environmentally friendly and safe alternative source of energy for generations to come, and international efforts such as the ITER experiment, currently under construction in southern France, seek to confirm this. In magnetic fusion devices, deuterium and tritium are heated to a temperature of approximately 100,000,000 K and confined by a toroidal magnetic field so that the nuclei can fuse, releasing significant amounts of energy in the process. Unfortunately, the unavoidable steep temperature and density gradients drive small-scale plasma turbulence, which in turn leads to large heat and particle losses. In principle, this problem can be overcome by building bigger devices, but this would lead to unacceptable cost increases. The goal is therefore to understand, predict, and control plasma turbulence with the help of large-scale computations, using state-of-the-art codes like GENE.

However, the underlying equations are high-dimensional, as they depend not only on space and time but also on local velocities. This leads to the so-called “curse of dimensionality”, the exponential growth of computational effort with the number of dimensions: if each of the problem's five dimensions (not counting time) required only 1,000 grid points for its discretization, the five dimensions together would result in 1,000,000,000,000,000 (10^15) discretization points. Therefore, a fully resolved numerical simulation of ITER is out of reach even for the next generation of exascale computers.
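The arithmetic behind this estimate can be checked in a few lines. The following is only an illustrative sketch: the function name `full_grid_points` is hypothetical, and the storage figure assumes one 8-byte double-precision value per grid point, which the text does not state.

```python
def full_grid_points(points_per_dim: int, dims: int) -> int:
    """Number of points of a full (tensor-product) grid."""
    return points_per_dim ** dims

# Numbers from the text: 1,000 points in each of 5 dimensions (time excluded).
total = full_grid_points(1000, 5)

# Assumption for illustration: one 8-byte double per grid point.
bytes_per_value = 8
petabytes = total * bytes_per_value / 1e15

print(f"grid points: {total:,}")
print(f"storage for a single scalar field: about {petabytes:.0f} PB")
```

Adding a sixth dimension at the same resolution would multiply both figures by another factor of 1,000, which is the exponential growth the text refers to.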

In the project EXAHD within Germany's priority program Software for Exascale Computing (SPPEXA), five groups from the University of Stuttgart, the University of Bonn, the Technical University of Munich, the Max-Planck Institute for Plasma Physics, and the Max Planck Computing and Data Facility have joined forces to tackle this challenge. They have proposed a hierarchical discretization scheme, so-called Sparse Grids, to overcome the current computational limits (the number of discretization points per dimension and the memory requirements). Rather than simulating a full high-resolution discretization of a fusion device, an approximation is computed from a suitably weighted combination of many simulations with lower resolutions. This decomposes the problem described above into roughly 1,800 problems, each with at most hundreds of thousands of grid points. This makes it possible to run the high-resolution simulations that are required to understand the processes that currently hinder the creation of future fusion devices.
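How such a combination of coarse simulations is assembled can be sketched with the classical sparse grid combination technique. The snippet below is a minimal illustration, not the EXAHD implementation: the function names, the convention of `2**l + 1` points per dimension, and the small dimension and level are assumptions chosen for readability, and the sketch does not attempt to reproduce the figure of roughly 1,800 problems quoted above.

```python
from itertools import product
from math import comb, prod

def combination_scheme(dim: int, level: int) -> dict:
    """Classical combination technique: map level vector -> coefficient.

    Level vectors l have entries l_i >= 1. Grids with
    |l|_1 = level + dim - 1 - q receive the coefficient (-1)^q * C(dim-1, q)
    for q = 0, ..., dim - 1.
    """
    scheme = {}
    for q in range(dim):
        target = level + dim - 1 - q
        coeff = (-1) ** q * comb(dim - 1, q)
        for l in product(range(1, level + 1), repeat=dim):
            if sum(l) == target:
                scheme[l] = coeff
    return scheme

def grid_points(l) -> int:
    """Points of one anisotropic grid (assumed: 2**l_i + 1 per dimension)."""
    return prod(2 ** li + 1 for li in l)

dim, level = 2, 3  # small illustrative values
scheme = combination_scheme(dim, level)
for l, c in sorted(scheme.items()):
    print(f"grid {l}: coefficient {c:+d}, {grid_points(l)} points")

largest = max(grid_points(l) for l in scheme)
full = grid_points((level,) * dim)
print(f"{len(scheme)} component grids, largest has {largest} points "
      f"versus {full} points on the full grid")
```

The coefficients always sum to one, so a constant solution is reproduced exactly; the gap between the largest component grid and the full grid widens rapidly as the dimension and level increase.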

The researchers have shown that their approach can solve several of the “exascale challenges”, computational challenges that every parallel code has to face. First, the underlying hierarchical scheme introduces a second, additional level of parallelism that removes the need for the global communication that classical discretization schemes require, as the partial problems can be computed independently and in parallel (Fig. 1, left). Only a reduced global gather-scatter step is required every few time steps. Sophisticated communication schemes have been developed to optimize the remaining communication, and parallel experiments confirm good scalability of the algorithms up to the full Hazel Hen system (Fig. 1, right).
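The structure of this second level of parallelism can be illustrated with a toy problem. In the sketch below, each "partial problem" is merely a trapezoidal-rule integration of exp(x+y) on an anisotropic 2D grid, standing in for one low-resolution simulation; the function names and the use of a thread pool are assumptions for illustration only (a production code would distribute the partial problems over groups of MPI processes, and Python threads show the structure rather than any real speed-up).

```python
from concurrent.futures import ThreadPoolExecutor
from math import exp, e

def trapezoid_exp(n: int) -> float:
    """Trapezoidal rule for exp(x) on [0, 1] with n cells."""
    h = 1.0 / n
    inner = sum(exp(i * h) for i in range(1, n))
    return h * (0.5 * exp(0.0) + inner + 0.5 * exp(1.0))

def solve_component(levels) -> float:
    """Toy partial problem: integrate exp(x + y) on a 2**l1 x 2**l2 grid.

    The integrand is separable, so the 2D rule is a product of 1D rules.
    """
    l1, l2 = levels
    return trapezoid_exp(2 ** l1) * trapezoid_exp(2 ** l2)

def combine_in_parallel(level: int, workers: int = 4) -> float:
    # 2D combination scheme: +1 on |l|_1 = level + 1, -1 on |l|_1 = level.
    scheme = {(l1, level + 1 - l1): 1 for l1 in range(1, level + 1)}
    scheme.update({(l1, level - l1): -1 for l1 in range(1, level)})
    grids = list(scheme)
    # The partial problems share no data and run without communication.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(solve_component, grids))
    # Single reduction at the end, playing the role of the gather step.
    return sum(scheme[g] * u for g, u in zip(grids, partial))

approx = combine_in_parallel(level=6)
exact = (e - 1.0) ** 2
print(f"combined: {approx:.6f}, exact: {exact:.6f}")
```

The only point at which the partial results meet is the final weighted sum, which is why the communication cost of the scheme is confined to the gather-scatter step.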

A second major breakthrough has been achieved with respect to the exascale challenge of fault tolerance. Extrapolating hardware failure rates of current supercomputers to the exascale age, the mean time between failures will be in the range of hours. Large-scale simulations will therefore have to cope with frequent faults and failures. The researchers have demonstrated that the hierarchical approach of Sparse Grids can be used to ensure algorithm-based fault tolerance (ABFT; Fig. 2). Conventional approaches regularly write checkpoints to disk or memory, which can be prohibitively expensive, and restart from the last checkpoint after a fault. ABFT schemes avoid this: they compensate for faults and continue to simulate. The researchers have shown that even a high percentage of lost partial solutions can be compensated for with only a slight loss in accuracy (Fig. 3).
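One way such a compensation can work is to recompute the combination coefficients over the partial solutions that survived. The sketch below uses the general coefficient formula for downward-closed index sets known from the fault-tolerant combination technique; it is a simplified illustration under that assumption, the function names are hypothetical, and the text does not specify whether EXAHD proceeds in exactly this way.

```python
from itertools import product

def coefficients(index_set) -> dict:
    """Combination coefficients for a downward-closed set of level vectors.

    c_l = sum over z in {0,1}^d of (-1)^|z| * [l + z is in the set];
    grids whose coefficient is zero are omitted from the result.
    """
    index_set = set(index_set)
    dim = len(next(iter(index_set)))
    coeffs = {}
    for l in index_set:
        c = 0
        for z in product((0, 1), repeat=dim):
            shifted = tuple(a + b for a, b in zip(l, z))
            if shifted in index_set:
                c += (-1) ** sum(z)
        if c != 0:
            coeffs[l] = c
    return coeffs

def drop_failed(index_set, failed) -> set:
    """Remove failed grids and every grid at or above them in all dimensions,
    so that the remaining set stays downward closed."""
    return {l for l in index_set
            if not any(all(a >= b for a, b in zip(l, f)) for f in failed)}

# 2D example: all level vectors with l_i >= 1 and l_1 + l_2 <= 4.
full = {(a, b) for a in range(1, 4) for b in range(1, 4) if a + b <= 4}
print("no fault:    ", sorted(coefficients(full).items()))

# Suppose the partial solution on grid (2, 2) is lost.
survivors = drop_failed(full, failed=[(2, 2)])
print("(2, 2) lost: ", sorted(coefficients(survivors).items()))
```

In this example the coarse grid (1, 1), which had coefficient zero before the fault, is needed afterwards. A scheme of this kind therefore has to compute a few additional coarse (and cheap) partial solutions in advance, and it trades a small loss of accuracy for the ability to continue without a restart.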

The results of this project will help to enable and prepare higher-dimensional simulation problems for the exascale age. While the main demonstrator application is the simulation of hot fusion plasmas, the research results can reach far beyond plasma physics.

- Prof. Dr. Hans-Joachim Bungartz, Michael Obersteiner: Scientific Computing, Department of Informatics, TU München
- Dr. Tilman Dannert, Rafael Lago: Max Planck Computing and Data Facility
- Prof. Dr. Michael Griebel, Johannes Rentrop: Institute for Numerical Simulation, Universität Bonn
- Prof. Dr. Frank Jenko: Max-Planck Institute for Plasma Physics
- Prof. Dr. Dirk Pflüger, Theresa Pollinger: Institute for Parallel and Distributed Systems, Universität Stuttgart
- External partner: Prof. Dr. Markus Hegland, Centre for Mathematics and its Applications, Australian National University Canberra

Web page of the EXAHD project: https://ipvs.informatik.uni-stuttgart.de/SGS/EXAHD/

- M. Griebel and P. Oswald. Stable splittings of Hilbert spaces of functions of infinitely many variables. In: Journal of Complexity, 41:126–151, 2017.
- M. Griebel and P. Oswald. Stochastic subspace correction in Hilbert space. Submitted to Constructive Approximation. 2017.
- M. Heene, A. P. Hinojosa, M. Obersteiner, H.-J. Bungartz, and D. Pflüger. EXAHD - An Exa-Scalable Two-Level Sparse Grid Approach for Higher-Dimensional Problems in Plasma Physics and Beyond. High Performance Computing in Science and Engineering'17, 2017.
- M. Obersteiner, A. P. Hinojosa, M. Heene, H.-J. Bungartz, and D. Pflüger. A Highly Scalable, Algorithm-based Fault-tolerant Solver for Gyrokinetic Plasma Simulations. In Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA ’17, pages 2:1–2:8, New York, NY, USA, 2017. ACM.
- M. Heene, A. Parra Hinojosa, D. Pflüger, and H.-J. Bungartz. A Massively-Parallel, Fault-Tolerant Solver for High-Dimensional PDEs. In: Desprez F. et al. (eds) Euro-Par 2016: Parallel Processing Workshops. Euro-Par 2016. Lecture Notes in Computer Science, vol 10104, Springer 2017.
- M. Heene and D. Pflüger. Scalable algorithms for the solution of higher-dimensional PDEs. In: Bungartz HJ., Neumann P., Nagel W. (eds) Software for Exascale Computing - SPPEXA 2013-2015. Lecture Notes in Computational Science and Engineering, vol 113. Springer, 2016.
- A. Parra Hinojosa, B. Harding, M. Hegland, and H.-J. Bungartz. Handling Silent Data Corruption with the Sparse Grid Combination Technique. In: Bungartz HJ., Neumann P., Nagel W. (eds) Software for Exascale Computing - SPPEXA 2013-2015. Lecture Notes in Computational Science and Engineering, vol 113. Springer, 2016.
- M. Heene and D. Pflüger. Efficient and scalable distributed-memory hierarchization algorithms for the sparse grid combination technique. In: Advances in Parallel Computing, vol. 27, Parallel Computing: On the Road to Exascale, IOS Press, 2016.
- P. Hupp, M. Heene, R. Jacob, and D. Pflüger. Global communication schemes for the numerical solution of high-dimensional PDEs. In Parallel Computing, 52, Elsevier, 2016.
- B. Peherstorfer, C. Kowitz, D. Pflüger, and H.-J. Bungartz. Selected recent applications of sparse grids. Numerical Mathematics: Theory, Methods and Applications, 8(01), pp.47-77, 2015.
- M. Griebel, A. Hullmann, and P. Oswald. Optimal scaling parameters for sparse grid discretizations. Numerical Linear Algebra with Applications, 22(1):76 - 100, 2015.
- D. Pflüger, H.-J. Bungartz, M. Griebel, F. Jenko, T. Dannert, M. Heene, A. Parra Hinojosa, C. Kowitz, and P. Zaspel. EXAHD: An exa-scalable two-level sparse grid approach for higher-dimensional problems in plasma physics and beyond. EuroPar 2014, 2014.
- M. Griebel and P. Oswald. Schwarz Iterative Methods: Infinite Space Splittings. Constructive Approximation, pp.1-19, 2014.
- H. Doerk and F. Jenko. Towards optimal explicit time-stepping schemes for the gyrokinetic equations. Computer Physics Communications, 185(7):1938 - 1946, 2014.
- M. Heene, C. Kowitz, and D. Pflüger. Load balancing for massively parallel computations with the sparse grid combination technique. In Parallel Computing: Accelerating Computational Science and Engineering (CSE), Volume 25 of Advances in Parallel Computing, p. 574 - 583. IOS Press, Amsterdam, March 2014.
- P. Hupp, R. Jacob, M. Heene, D. Pflüger, and M. Hegland. Global communication schemes for the sparse grid combination technique. In Parallel Computing: Accelerating Computational Science and Engineering (CSE), Volume 25 of Advances in Parallel Computing, p. 564 - 573. IOS Press, Amsterdam, March 2014.
- T. Dannert, A. Marek, and M. Rampp. Porting large HPC applications to GPU clusters: The codes gene and vertex. CoRR, abs/1310.1485, 2013.
- C. Kowitz and M. Hegland. The sparse grid combination technique for computing eigenvalues in linear gyrokinetics. Procedia Computer Science, 18(0):449 - 458, 2013.
- C. Kowitz, D. Pflüger, F. Jenko, and M. Hegland. The combination technique for the initial value problem in linear gyrokinetics. In Sparse Grids and Applications, volume 88 of Lecture Notes in Computational Science and Engineering, pages 205 - 222, Heidelberg, October 2012. Springer.

Prof. Dr. Dirk Pflüger

Institute for Parallel and Distributed Systems

University of Stuttgart (Germany)

Universitätsstraße 38, D-70569 Stuttgart (Germany)

e-mail: Dirk.Pflueger@ipvs.uni-stuttgart.de

http://www.ipvs.uni-stuttgart.de/abteilungen/sse

June 2018