Designing for Energy Efficiency in High-Performance Computing Centers

06 December 2017

A two-day workshop at the High-Performance Computing Center Stuttgart (HLRS) brought together infrastructure experts from German supercomputing centers to discuss strategies for building more sustainable systems.

High-performance computing (HPC) has become an essential tool for investigating many kinds of problems in research and technology development. But the opportunities it offers come at a cost. Operating a supercomputer can consume as much energy as a small city, requires large cooling systems to keep electronic equipment from overheating, and relies on literally tons of electronic hardware whose manufacture and disposal have sizable environmental impacts.

All of these facts also make HPC systems expensive to run, meaning their operators have a vested interest in designing them to be as efficient as possible. Currently, however, each computing center must find its own way to make its supercomputing more sustainable, not just in terms of economic costs, but also with respect to environmental and social implications.

In an effort to promote discussion connecting these issues within this broader context of sustainability, HLRS organized and hosted its first Energy Efficiency Workshop for Sustainable High-Performance Computing on October 25–26, 2017. The event brought together representatives from supercomputing facilities in the Gauss Allianz, the Gauss Centre for Supercomputing (GCS)—HLRS, the Jülich Supercomputing Centre (JSC), and the Leibniz Supercomputing Centre (LRZ), Germany's three largest HPC centers—and other academic institutions in Baden-Württemberg. In this way, the workshop facilitated dialogue among a broad cross-section of the German supercomputing community.

Measuring sustainability in supercomputing is difficult

One challenge in designing a sustainable supercomputer is that until recently there has been no fully satisfying definition of energy efficiency in HPC systems. One commonly used metric, power usage effectiveness (PUE), is the ratio of a computing center's total power consumption to the power consumed by its information technology equipment. Under ideal circumstances, all power consumed would be dedicated to computation itself and other energy demands would be minimal.
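As a rough illustration of the metric, PUE can be computed as a simple ratio. The figures below are hypothetical, not taken from any center discussed here:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power.

    A value of 1.0 would mean every watt drawn by the facility
    goes to computation; real centers exceed this because of
    cooling, lighting, and power-conversion overhead.
    """
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw

# Hypothetical example: a center drawing 1300 kW in total,
# 1000 kW of which feeds the IT equipment itself.
print(round(pue(1300, 1000), 2))  # 1.3
```

As the speakers below argue, however, even a PUE near 1.0 says nothing about how efficiently the IT power itself is used.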

Peter Radgen of the University of Stuttgart's Institut für Energiewirtschaft und Rationelle Energieanwendung (Institute for Energy Economics and Rational Energy Use, IER) and leader of the project Nachhaltige Rechenzentren Baden-Württemberg (Sustainable Computing Centers in Baden-Württemberg) pointed out, however, that even a perfect PUE score does not indicate energy efficiency. Because 70% of the power a supercomputing center consumes is converted into waste heat, capturing that heat and reusing it for other purposes—such as heating nearby buildings—must be an important part of the equation. He also presented a range of other ideas for improving sustainability in supercomputing, including making better use of renewable energy sources, choosing more energy-efficient IT equipment, and correctly scaling the size and configuration of supercomputers to the computational load they must actually process.

Marina Köhn

Marina Köhn explained that new criteria under Der Blaue Engel für
Rechenzentren will help guide sustainable decision-making in HPC centers.
(Photo: Christopher Williams, HLRS)

Marina Köhn, a Green IT expert at the Umweltbundesamt (the German Federal Environment Agency), also pointed out that PUE is inadequate for characterizing an HPC center's environmental footprint, as it does not account for factors such as how efficiently a computer is utilized, differences in greenhouse gas emissions depending on the energy source, or whether the cooling system is efficiently designed and operated. Her office has started a research project to develop metrics that account for such aspects of HPC operation. In the future, these metrics will be incorporated into certification under Der Blaue Engel für Rechenzentren (Blue Angel for Data Centers). "Der Blaue Engel has the advantage that we finally have something around which we can orient ourselves," she explained. Codifying requirements, she argued, should give planners concrete goals for designing HPC centers and guide their sustainable construction and operation in the future.

Following an introduction to HLRS's computing facility by Deputy Director for Infrastructure Norbert Conrad, energy specialist and sustainability team member Ursula Paul described HLRS's sustainability strategy. This includes plans for improving energy efficiency as well as other areas of sustainability, including ecological, economic, and social aspects. She presented some details about metrics that HLRS has been tracking and discussed plans to gain certification under the Eco-Management and Audit Scheme (EMAS) and ISO 50001, two demanding standards for environmental sustainability and energy management, respectively. HLRS is also currently internally discussing issues related to energy efficiency and waste heat reuse in preparation for expansions in the coming years.

Cooling is a key to improving energy efficiency

Because HPC is a niche industry served by only a handful of computer manufacturers, supercomputer operators have a limited range of processors to choose from. For this reason, building more efficient infrastructure, especially cooling systems, offers a much more immediate opportunity to improve an HPC center's energy efficiency.

One approach used at the Steinbuch Centre for Computing (SCC) at the Karlsruhe Institute of Technology (KIT) involves warm-water cooling, which regulates temperature for the majority of its HPC system. As Rudolf Lohner explained, circulating water enters the server room at a temperature of 42 °C and has been heated to 47 °C by the time it leaves. Although using warm water for cooling might at first seem counterintuitive, the relatively high water temperature means that even in warmer seasons, air from outside the building can be used to cool the water after it passes through the computer room; that is, no additional energy is needed to refrigerate it. In the winter, the warm water circulates through pipes embedded in the core of the computing center building, radiating warmth through parts of the facility that need heating, including office space. The warm-water circuit is also connected to a cold-water system, which can be used when outdoor temperatures rise in the summertime.
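The heat such a loop carries away follows directly from the temperature rise. A minimal sketch, assuming a hypothetical flow rate (the 5 K rise matches the 42 °C/47 °C figures reported for SCC, but the flow rate below is invented for illustration):

```python
def heat_absorbed_kw(flow_kg_per_s: float, delta_t_k: float,
                     c_p_kj_per_kg_k: float = 4.18) -> float:
    """Heat carried away by cooling water: Q = m_dot * c_p * delta_T.

    c_p is the specific heat capacity of water (~4.18 kJ/(kg*K)).
    Because 1 kJ/s = 1 kW, the result is in kilowatts.
    """
    return flow_kg_per_s * c_p_kj_per_kg_k * delta_t_k

# With the 5 K rise (42 degC in, 47 degC out), a hypothetical flow of
# 10 kg/s would carry away roughly 209 kW of heat.
print(round(heat_absorbed_kw(10, 5)))  # 209
```

The same relation explains why a higher return temperature is valuable: the hotter the outgoing water, the more easily its heat can be rejected to outside air or reused for building heating.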

Willi Homberg of the Jülich Supercomputing Centre (JSC) discussed how the cooling concept for its supercomputer has evolved over time, from freon-based cooling in the 1980s to more recent hybrid cooling technologies. Such improvements not only have environmental benefits but have also been important for increasing economic sustainability. Recently, Homberg introduced a white paper for the Partnership for Advanced Computing in Europe (PRACE) that detailed ways to decrease total cost of ownership (TCO) for supercomputing systems. He also discussed his perspective on the cooling concepts that are most efficient now, such as direct water cooling, as well as future technologies. One approach currently under investigation at JSC is immersion cooling, in which computer processors are completely submerged in a high-tech fluid that absorbs their heat directly, minimizing the amount of additional cooling that would be needed. The Jülich Research Center is also currently engaged in long-term planning for its campus development, and envisions capturing heat from its HPC center to heat surrounding buildings.

Daniel Hackenberg

Daniel Hackenberg and colleagues at ZIH have been planning for scalability.
(Photo: Christopher Williams, HLRS)

Daniel Hackenberg of the Center for Information Services and High-Performance Computing (ZIH) at TU Dresden presented a flexible architecture for HPC center design that he called the plenum concept. Here, IT components are completely separated from other infrastructure on two different building levels. In a special hot-aisle containment system, heat from the compute clusters above is drawn through air-cooling units in the floor directly below them before circulating back up into the computer room. Hackenberg indicated that because the approach minimizes the volume of warm air that must be transported away from the processors, it has proven to be very efficient. Although such a concept could not be retrofitted into an existing HPC facility, he anticipates that the architecture should scale to larger machines as the TU Dresden facility grows. Hackenberg also spoke about ways in which his team has used sensors to optimize the cooling system.

For several years, the Leibniz Supercomputing Centre (LRZ) has operated an adsorption cooling system, the only one of its kind among Top500 supercomputers. In his talk, Torsten Wilde discussed tests his team has undertaken to optimize energy efficiency in this system. Like Hackenberg, he also emphasized the opportunities that modern sensors offer for tracking system operation. In a current project, LRZ is cooperating with academic and industrial partners to develop software that uses machine learning to analyze data collected from sensors in ways that would make it possible to predict and manage energy consumption in real time.
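The kind of sensor-driven prediction described here can be illustrated with a minimal sketch. This is not LRZ's actual software or data; it simply fits a least-squares line to hypothetical utilization and power readings to extrapolate power draw:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit for y = a*x + b (pure-Python sketch)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sensor readings: system utilization (%) vs. power draw (kW)
util = [20, 40, 60, 80]
power = [400, 500, 600, 700]

a, b = fit_line(util, power)
print(round(a * 100 + b))  # extrapolated draw at 100% utilization: 800 kW
```

Production systems of the kind the article describes would use far richer sensor inputs and more capable models, but the principle is the same: learn the relationship between operating conditions and energy consumption, then use it to predict and manage load in real time.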

The future of the Energy Efficiency Workshop

Following the talks, the participants agreed that the workshop was a valuable forum for exchanging ideas within the German HPC community. HLRS and its partners will be discussing ways to build on the success of the first workshop in the coming years.

Christopher Williams