Institute for Parallel and Distributed Systems, University of Stuttgart (Germany)
Principal Investigator: Dirk Pflüger
The generation of clean, sustainable energy from plasma fusion reactors is currently limited by the presence of microinstabilities that arise during the fusion process, despite international efforts such as the ITER experiment, currently under construction in southern France. Large-scale numerical simulations are crucial to understand, predict, and control plasma turbulence. Due to the high dimensionality of the underlying equations, the fully resolved simulation of the numerical ITER is out of reach with classical discretization schemes, even for the next generation of exascale computers. With five research groups from mathematics, physics, and computer science, the SPPEXA project EXAHD has proposed to use a hierarchical discretization scheme, so-called Sparse Grids, to overcome the current computational limits (number of discretization points per dimension and memory requirements). This way, it will be possible to enable high-resolution simulations, to ensure scalability of the simulations to future exascale computers and beyond, and even to cope with faults and failures, which are expected to become more frequent on the next generation of supercomputers.
Fusion energy has the potential to become an environmentally friendly and safe alternative source of energy for generations to come, and international efforts such as the ITER experiment, currently under construction in southern France, seek to confirm this. In magnetic fusion devices, deuterium and tritium are heated to a temperature of approximately 100,000,000 K and are confined by a toroidal magnetic field so that the nuclei can fuse, releasing significant amounts of energy in the process. Unfortunately, the unavoidable steep temperature and density gradients drive small-scale plasma turbulence, which in turn leads to large heat and particle losses. In principle, this problem can be overcome by building bigger devices, but this would lead to unacceptable cost increases. Thus, the goal is to understand, predict, and control plasma turbulence with the help of large-scale computations, using state-of-the-art codes like GENE.
However, the underlying equations are high-dimensional, as they depend not only on space and time but also on local velocities. This leads to the so-called "curse of dimensionality", the exponential growth of the computational effort with the number of dimensions: if each of the problem's five dimensions (not counting time) required only 1000 grid points for its discretization, the five dimensions together would result in 1,000,000,000,000,000 (10^15) discretization points. Therefore, the fully resolved simulation of the numerical ITER is out of reach even for the next generation of exascale computers.
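To make the arithmetic concrete, the following Python sketch reproduces the point count above and adds a rough memory estimate. The function name and the figure of 16 bytes per grid point (one double-precision complex value) are illustrative assumptions, not numbers taken from the project.

```python
def full_grid_points(points_per_dim: int, dims: int) -> int:
    """Number of points in a full tensor-product grid."""
    return points_per_dim ** dims


# Assumption for illustration only: one double-precision complex value per point.
BYTES_PER_POINT = 16

if __name__ == "__main__":
    for dims in range(1, 6):
        n = full_grid_points(1000, dims)
        petabytes = n * BYTES_PER_POINT / 1e15
        print(f"{dims} dimension(s): {n:.0e} points, about {petabytes:.3g} PB")
```

Each additional dimension multiplies the number of points by another factor of 1000, which is the exponential growth the text refers to.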
In the project EXAHD within Germany's priority program Software for Exascale Computing (SPPEXA), five groups from the University of Stuttgart, the University of Bonn, the Technical University of Munich, the Max-Planck Institute for Plasma Physics, and the Max-Planck Computing Center have joined forces to tackle this challenge. They have proposed to use a hierarchical discretization scheme, so-called Sparse Grids, to overcome the current computational limits (number of discretization points per dimension and memory requirements). Rather than simulating a full high-resolution discretization of a fusion device, an approximation can be computed based on a clever combination of many simulations with lower resolutions. This decomposes the problem described above into roughly 1,800 smaller problems, each with at most hundreds of thousands of grid points. This way, it will be possible to enable the high-resolution simulations that are required to understand the driving processes that hinder the creation of future fusion devices.
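The idea of combining many coarse simulations can be sketched with the classical sparse grid combination technique, in which component grids with level vectors l (all l_i >= 1) and |l|_1 = n + (d - 1) - q, for q = 0, ..., d - 1, are weighted with the coefficients (-1)^q * C(d - 1, q). The Python sketch below only enumerates the component grids and their coefficients. The function names, the choice of d = 5 and n = 10, and the convention of 2^l + 1 points per dimension are assumptions for illustration; the project's actual setup, including its figure of roughly 1,800 problems, is not reproduced here.

```python
from itertools import product
from math import comb, prod


def combination_scheme(dim: int, level: int) -> dict[tuple[int, ...], int]:
    """Classical combination technique: map level vector -> coefficient."""
    scheme = {}
    for q in range(dim):
        target = level + (dim - 1) - q          # required |l|_1 for this diagonal
        coeff = (-1) ** q * comb(dim - 1, q)    # same coefficient on the whole diagonal
        for lv in product(range(1, level + 1), repeat=dim):
            if sum(lv) == target:
                scheme[lv] = coeff
    return scheme


def grid_points(lv: tuple[int, ...]) -> int:
    """Points of one anisotropic grid, assuming 2^l + 1 points per dimension."""
    return prod(2 ** l + 1 for l in lv)


if __name__ == "__main__":
    dim, level = 5, 10  # hypothetical example values
    scheme = combination_scheme(dim, level)
    print("component grids:", len(scheme))
    print("largest component grid:", max(grid_points(lv) for lv in scheme), "points")
    print("full grid:", grid_points((level,) * dim), "points")
    print("sum of coefficients:", sum(scheme.values()))
```

Each component grid is a small, independent simulation. The coefficients always sum to one, so a constant function is reproduced exactly by the combination.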
The researchers have shown that their approach can solve several of the “exascale challenges”, that is, computational challenges that every parallel code has to face. First, the underlying hierarchical scheme introduces a second, additional level of parallelism that removes the need for the global communication of classical discretization schemes, as the partial problems can be computed independently and in parallel (Fig. 1, left). Only a reduced global gather-scatter step is required every few time steps. Sophisticated communication schemes have been developed that optimize the remaining communication, and parallel experiments confirm good scalability of the algorithms up to the full Hazel Hen supercomputer (Fig. 1, right).
A second major breakthrough has been achieved with respect to the exascale challenge of fault tolerance. Extrapolating the hardware failure rates of current supercomputers to the exascale age, the mean time between failures will be in the range of hours. Thus, large-scale simulations will have to be able to cope with frequent faults and failures. The researchers have been able to demonstrate that the hierarchical approach of Sparse Grids can be used to ensure algorithm-based fault tolerance (ABFT; Fig. 2). Without the need to regularly write checkpoints to disk or memory (which can be prohibitively expensive) and to restart from the last checkpoint in case of a fault, ABFT schemes can compensate for faults and continue to simulate. The researchers have shown that even a high percentage of lost partial solutions can be compensated for with only a slight loss in accuracy (Fig. 3).
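One way such a compensation can work is to recompute the combination coefficients on a reduced set of partial solutions. The Python sketch below uses the general coefficient formula c_l = sum over z in {0,1}^d of (-1)^|z| * [l + z in I] for a downward-closed index set I, and simply drops each failed grid together with every grid that refines it. The function names and this conservative dropping rule are assumptions for illustration; the scheme used in the project may select the replacement combination differently.

```python
from itertools import product


def index_set(dim: int, level: int) -> set[tuple[int, ...]]:
    """Level vectors with l_i >= 1 and |l|_1 <= level + dim - 1 (downward closed)."""
    return {
        lv
        for lv in product(range(1, level + 1), repeat=dim)
        if sum(lv) <= level + dim - 1
    }


def coefficients(indices: set[tuple[int, ...]]) -> dict[tuple[int, ...], int]:
    """Nonzero combination coefficients for a downward-closed index set."""
    if not indices:
        return {}
    dim = len(next(iter(indices)))
    coeffs = {}
    for lv in indices:
        c = sum(
            (-1) ** sum(z)
            for z in product((0, 1), repeat=dim)
            if tuple(a + b for a, b in zip(lv, z)) in indices
        )
        if c != 0:
            coeffs[lv] = c
    return coeffs


def drop_failed(indices, failed):
    """Remove failed grids and all grids refining them; the set stays downward closed."""
    return {
        lv
        for lv in indices
        if not any(all(a >= b for a, b in zip(lv, f)) for f in failed)
    }


if __name__ == "__main__":
    grids = index_set(dim=2, level=3)
    print("no faults:  ", coefficients(grids))
    survivors = drop_failed(grids, failed=[(2, 2)])
    print("(2, 2) lost:", coefficients(survivors))
```

In this two-dimensional example, losing the partial solution on grid (2, 2) is compensated by a combination that uses the coarser grid (1, 1) instead. No checkpoint is read and nothing is recomputed at the lost resolution, at the price of a somewhat coarser approximation.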
The results of this project will help to enable and prepare higher-dimensional simulation problems for the exascale age. While the main demonstrator application is the simulation of hot fusion plasmas, the research results can reach far beyond plasma physics.
Web page of the EXAHD project: https://ipvs.informatik.uni-stuttgart.de/SGS/EXAHD/