A paper presented this week at the Cray User Group Technical Meeting in Jersey City, NJ, USA, offers the first rigorous benchmark study of the AMD Instinct MI300A accelerated processing unit (APU) for artificial intelligence (AI) applications. For high-performance computing (HPC) centers trying to decide whether to invest in a hybrid, APU-based system or in a separate, more specialized system optimized for machine learning and generative AI, the paper demonstrates that the MI300A can deliver high performance when training large language models (LLMs). The publication also offers practical insights that system architects, AI practitioners, and HPC centers can use to leverage the full capabilities of the APU for deep learning and AI applications.
May 09, 2025
Past benchmark studies have provided performance data for AMD’s MI250X and MI300X processors, but this is the first to report MI300A performance in an LLM use case.
The study resulted from a collaboration involving representatives of Seedbox.AI Lab, Hewlett Packard Enterprise (HPE), AMD, and the High-Performance Computing Center Stuttgart (HLRS). Utilizing a model compression approach being developed by Seedbox.AI, the team completed its experiments on HLRS’s new Hunter supercomputer. The collaboration between HLRS and Seedbox.AI is taking place within the HammerHAI project, a EuroHPC Joint Undertaking (EuroHPC JU) “AI Factory” coordinated by HLRS.
Manufactured by HPE, Hunter is based on the AMD MI300A APU, which combines CPUs, GPUs, and high-bandwidth memory in a single chip. When used for traditional HPC applications, the GPUs function as accelerators, enabling faster simulations using less energy. Because GPUs are also the go-to processors for deep learning and AI applications, however, the researchers set out to test how well the hybrid architecture could handle a data-intensive AI workload.
After compiling a 20-billion-token multilingual dataset enriched with synthetic data, the team used Hunter to train a large language model for 24 European languages. Although the team initially encountered several challenges in setting up and executing the pipeline on the new hardware, the paper explains the strategies they identified for overcoming these limitations. One challenge in particular involved constraints resulting from the MI300A’s memory architecture. Unlike AI-dedicated, GPU-only systems, the APU uses a single memory pool shared by its CPU and GPU components. LLM training initially strained this architecture, as multiple elements in the training pipeline had to compete for the available RAM. By implementing steps to optimize memory usage, however, the team was able to scale performance almost linearly on up to 64 nodes (256 APUs).
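The paper's specific optimizations are not reproduced here, but the following is a minimal PyTorch sketch of one widely used memory-saving technique, activation checkpointing, which recomputes intermediate activations during the backward pass instead of storing them. That trade of compute for memory is especially relevant when CPU and GPU compete for one physical memory pool. The model and its dimensions (TinyLM, Block) are illustrative assumptions, not the team's actual training code.

```python
# Minimal sketch: activation checkpointing to reduce memory pressure during
# LLM training. Illustrative only; not the code used in the paper.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """A transformer-style block standing in for a real LLM layer."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        x = x + self.attn(x, x, x, need_weights=False)[0]
        return x + self.ff(x)

class TinyLM(nn.Module):
    def __init__(self, vocab: int = 32000, dim: int = 512, layers: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(Block(dim) for _ in range(layers))
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        for block in self.blocks:
            # Recompute this block's activations in the backward pass rather
            # than keeping them in RAM, which the CPU and GPU share on an APU.
            x = checkpoint(block, x, use_reentrant=False)
        return self.head(x)

device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm GPUs also appear as "cuda"
model = TinyLM().to(device)
tokens = torch.randint(0, 32000, (4, 256), device=device)
model(tokens).sum().backward()
```

In a real multi-node run, a technique like this would typically be combined with sharded data parallelism (for example, PyTorch FSDP), so that parameters and optimizer state are also distributed across the APUs rather than replicated on each one.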
The research also benefited from a model compression approach being developed at Seedbox.AI called SimplePrune, which reduces the complexity of a large language model by intelligently eliminating redundant neural pathways during training. Such an approach has the benefit of reducing the energy and time necessary for LLM training. Applying a machine learning technique called knowledge distillation (KD) to the pruned models, the researchers found that they both achieved a parameter reduction of 80% and delivered results virtually identical to those of their unpruned counterparts.
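SimplePrune itself is not detailed in this article, so the sketch below only illustrates the general pattern it belongs to: structured magnitude pruning of linear layers combined with a standard KD loss that trains the pruned student to match a teacher's softened output distribution. The pruning criterion, the 80% amount, and the temperature are illustrative choices, not details from the paper.

```python
# Generic sketch of structured magnitude pruning plus knowledge distillation
# (KD). Not SimplePrune's actual algorithm; for illustration only.
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def prune_linear_layers(model: torch.nn.Module, amount: float = 0.8) -> None:
    """Zero out whole output rows of each Linear layer by L2-norm magnitude."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.ln_structured(module, name="weight", amount=amount, n=2, dim=0)
            prune.remove(module, "weight")  # bake the zeroed rows into the weights

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student output distributions."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```

The KD step is what lets a heavily pruned model recover quality: instead of learning only from hard labels, the pruned student is trained against the full teacher's probability distribution over the vocabulary. In practice, the zeroed structures can then be physically removed to shrink the stored model.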
Together, the successful training of an LLM on Hunter and the viability of the SimplePrune methodology reveal potential strategies for making large language models more accessible to AI users. The results suggest that, rather than simply investing in oversized, AI-optimized computing systems to run ever larger models, a stronger focus on model optimization in future research could lead to more efficient, flexible, and lower-cost HPC architectures that are also suitable for many typical AI applications.
Dennis Dickmann, Chief Technology Officer at Seedbox.AI, led the experiment and sees additional advantages for the European AI community. “Using Hunter at HLRS, we showed that it's possible to train high-quality, optimized LLMs with structured sparsity and advanced KD, without relying on non-European cloud infrastructure. This paper is not just about weights and biases. It's a demonstration that the infrastructure to build advanced AI systems exists in Europe — and it works.”
The paper presented to the Cray User Group is an early result of HammerHAI, which officially launched on April 1, 2025. While planning is underway for the installation of a new large-scale, AI-optimized supercomputer at HLRS in 2026, HammerHAI has already begun supporting startups like Seedbox.AI in developing, testing, and implementing new applications of artificial intelligence.
— Christopher Williams