A paper presented this week at the Cray User Group Technical Meeting in Jersey City, NJ, USA, offers the first rigorous benchmark study of the AMD Instinct MI300A accelerated processing unit (APU) for artificial intelligence (AI) applications. For high-performance computing (HPC) centers trying to decide whether to invest in a hybrid, APU-based system or in a separate, more specialized system optimized for machine learning and generative AI, the paper demonstrates that the MI300A can deliver high performance when training large language models (LLMs). The publication also offers practical insights that system architects, AI practitioners, and HPC centers can use to leverage the full capabilities of the APU for deep learning and AI applications.
May 09, 2025
Past benchmark studies have provided performance data for AMD’s MI250X and MI300X processors, but this is the first to report MI300A performance in an LLM use case.
The study resulted from a collaboration involving representatives of Seedbox.AI Lab, Hewlett Packard Enterprise (HPE), AMD, and the High-Performance Computing Center Stuttgart (HLRS). Utilizing a model compression approach being developed by Seedbox.AI, the team completed its experiments on HLRS’s new Hunter supercomputer. The collaboration between HLRS and Seedbox.AI is taking place within the HammerHAI project, a EuroHPC Joint Undertaking (EuroHPC JU) “AI Factory” coordinated by HLRS.
Manufactured by HPE, Hunter is based on the AMD MI300A APU, which combines CPUs, GPUs, and high-bandwidth memory in a single chip. When used for traditional HPC applications, the GPUs function as accelerators, enabling faster simulations using less energy. Because GPUs are also the go-to processors for deep learning and AI applications, however, the researchers set out to test how well the hybrid architecture could handle a data-intensive AI workload.
After compiling a 20-billion-token multilingual dataset enriched with synthetic data, the team used Hunter to train a large language model for 24 European languages. Although the team initially encountered several challenges in setting up and executing the pipeline on the new hardware, the paper explains the strategies they identified for overcoming these limitations. One challenge in particular involved constraints resulting from the MI300A’s memory architecture. Unlike AI-dedicated, GPU-only systems, the APU uses a single memory pool shared by its CPU and GPU components. LLM training initially strained this architecture, as multiple elements in the training pipeline had to compete for the available RAM. By implementing steps to optimize memory usage, however, the team was able to scale performance almost linearly on up to 64 nodes (256 APUs).
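The paper's specific optimizations are not reproduced here, but the following is a minimal PyTorch sketch of one widely used memory-saving technique, activation checkpointing, which recomputes intermediate activations during the backward pass instead of storing them. That trade of compute for memory is especially relevant when CPU and GPU compete for one physical memory pool. The model and its dimensions (TinyLM, Block) are illustrative assumptions, not the team's actual training code.

```python
# Minimal sketch: activation checkpointing to reduce memory pressure during
# LLM training. Illustrative only; not the code used in the paper.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """A transformer-style block standing in for a real LLM layer."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        x = x + self.attn(x, x, x, need_weights=False)[0]
        return x + self.ff(x)

class TinyLM(nn.Module):
    def __init__(self, vocab: int = 32000, dim: int = 512, layers: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(Block(dim) for _ in range(layers))
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        for block in self.blocks:
            # Recompute this block's activations in the backward pass rather
            # than keeping them in RAM, which the CPU and GPU share on an APU.
            x = checkpoint(block, x, use_reentrant=False)
        return self.head(x)

device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm GPUs also appear as "cuda"
model = TinyLM().to(device)
tokens = torch.randint(0, 32000, (4, 256), device=device)
model(tokens).sum().backward()
```

In a real multi-node run, a technique like this would typically be combined with sharded data parallelism (for example, PyTorch FSDP), so that parameters and optimizer state are also distributed across the APUs rather than replicated on each one.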
The research also benefited from a model compression approach being developed at Seedbox.AI called SimplePrune, which reduces the complexity of a large language model by intelligently eliminating redundant neural pathways during training. Such an approach has the benefit of reducing the energy and time necessary for LLM training. Applying a machine learning technique called knowledge distillation (KD) to the pruned models, the researchers found that they both achieved a parameter reduction of 80% and delivered results virtually identical to those of their unpruned counterparts.
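SimplePrune itself is not detailed in this article, so the sketch below only illustrates the general pattern it belongs to: structured magnitude pruning of linear layers combined with a standard KD loss that trains the pruned student to match a teacher's softened output distribution. The pruning criterion, the 80% amount, and the temperature are illustrative choices, not details from the paper.

```python
# Generic sketch of structured magnitude pruning plus knowledge distillation
# (KD). Not SimplePrune's actual algorithm; for illustration only.
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def prune_linear_layers(model: torch.nn.Module, amount: float = 0.8) -> None:
    """Zero out whole output rows of each Linear layer by L2-norm magnitude."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.ln_structured(module, name="weight", amount=amount, n=2, dim=0)
            prune.remove(module, "weight")  # bake the zeroed rows into the weights

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student output distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```

The KD step is what lets a heavily pruned model recover quality: instead of learning only from hard labels, the pruned student is trained against the full teacher's probability distribution over the vocabulary. In practice, the zeroed structures can then be physically removed to shrink the stored model.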
Together, the successful training of an LLM on Hunter and the viability of the SimplePrune methodology reveal potential strategies for making large language models more accessible to AI users. The results suggest that, rather than simply investing in oversized, AI-optimized computing systems to run ever larger models, a stronger focus on model optimization in future research could lead to more efficient, flexible, and lower-cost HPC architectures that are also suitable for many typical AI applications.
Dennis Dickmann, Chief Technology Officer at Seedbox.AI, led the experiment and sees additional advantages for the European AI community. “Using Hunter at HLRS, we showed that it's possible to train high-quality, optimized LLMs with structured sparsity and advanced KD, without relying on non-European cloud infrastructure. This paper is not just about weights and biases. It's a demonstration that the infrastructure to build advanced AI systems exists in Europe — and it works.”
The paper presented to the Cray User Group is an early result of HammerHAI, which officially launched on April 1, 2025. While planning is underway for the installation of a new large-scale, AI-optimized supercomputer at HLRS in 2026, HammerHAI has already begun supporting startups like Seedbox.AI in developing, testing, and implementing new applications of artificial intelligence.
— Christopher Williams