On April 22, nearly 100 users of the High-Performance Computing Center Stuttgart's (HLRS's) computing systems participated in an online meeting organized to enhance communication with HLRS staff. Presentations addressed complications resulting from the ongoing COVID-19 pandemic, provided information about recent and upcoming technical developments, and gave users a chance to ask questions regarding HLRS's systems.
Following welcoming remarks by HLRS Director Michael Resch, system operations leader Thomas Beisel reported on the center's transition from its Hazel Hen supercomputer to Hawk. He explained the impact of pandemic-related difficulties during Hawk's installation, addressed technical questions that users have raised about the new system, and described upcoming changes that should improve the system's operations and the center's ability to serve its users.
Among the issues Beisel focused on were changes in the systems' technical architecture, the upcoming implementation of "fair share" scheduling procedures, new data storage policies, measures implemented to enhance security, and efforts to optimize power consumption. He also offered a preview of upcoming system maintenance.
Prof. Resch reported that, because the COVID-19 pandemic delayed the operation of the complete Hawk system, grants of computing time made to users by the Gauss Centre for Supercomputing will be extended by six months. (PRACE users must inquire regarding policies related to their grants.) He emphasized that annual reports should nevertheless be submitted by the normal deadline, and advised users to report any effects of the delay in Hawk's operation on their research.
Dr. Thomas Bönisch followed by encouraging users to pay closer attention to the optimization of their codes for HLRS's high-performance computing systems, emphasizing the many benefits for both the center's operations and its system users' research. When users optimize their codes, he explained, HLRS's available computing time can support more scientific discovery for all, while individual users can get their results more quickly and even gain competitive advantages.
Because of the energy requirements of supercomputing, he added, performance optimization is also important for improving Hawk's environmental sustainability. "If you support the Fridays for Future movement to address climate change," Bönisch said, "optimizing software so that it runs as efficiently as possible is something concrete that scientists can do to help reduce our carbon footprint."
As Bönisch pointed out, getting code to perform better and scale efficiently to larger systems is particularly important because traditional routes to faster hardware, such as increasing clock frequency or adding more transistors to computer chips, are reaching their limits. The integration of accelerators or other technologies such as general-purpose graphics processing units (GPGPUs), data flow systems, many-core systems, or vector processors could eventually improve computing speed somewhat, although Bönisch stressed that substantial efficiency gains will also require adapting software to such new processor architectures.
HLRS system users are not alone, however, in their efforts to improve their scientific codes' performance. As Bönisch explained, HLRS now has a large, dedicated team of user support specialists with expertise in specific scientific domains and computer science who work closely with users in optimizing parallel performance, I/O performance, node-level performance, and other key factors. This support takes place in the context of HLRS's semiannual optimization workshops and on an ongoing basis through interactions between HLRS users and user support staff. Describing several specific code optimization projects that HLRS staff supported, Bönisch remarked that HLRS has helped to optimize approximately 40 user codes in recent years, enabling performance increases of up to 1000%.
In the final section of the meeting, Thomas Beisel and Dennis Hoppe offered a preview of an upcoming expansion of Hawk through the addition of graphics processing units, which will support research involving artificial intelligence and deep learning. Beisel provided a technical overview of the new HPE Apollo 6500 system, explaining that it will be completely integrated with the Hawk system network, including its data storage and data management systems.
Hoppe anticipates that the seamless integration of GPUs into Hawk will offer users opportunities to develop hybrid workflows more easily, combining traditional high-performance computing with machine learning, deep learning, and data analytics. The new system will support commonly used open-source frameworks such as Apache Spark, TensorFlow, and PyTorch.
Concluding the meeting, Resch reported on next-generation developments in HLRS's supercomputing infrastructure that are currently being planned. He also indicated that a follow-up to this user information session will take place during the next Results & Review Workshop, scheduled for October 2021.