Multi-GPU Deep Learning


NVIDIA Deep Learning Institute (DLI) offers hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning.

Learn how to train and deploy a neural network to solve real-world problems, and how to effectively parallelize the training of deep neural networks on multiple GPUs.

The workshop combines the DLI course Fundamentals of Deep Learning with two courses from the Deep Learning for Multi-GPUs series: Data Parallelism: How to Train Deep Learning Models on Multiple GPUs, and Model Parallelism: Building and Deploying Large Neural Networks. The lectures are interleaved with many hands-on sessions using Jupyter Notebooks.

This course is organized in cooperation with LRZ (Germany). The instructor is an NVIDIA certified University Ambassador.


HLRS, University of Stuttgart
Nobelstraße 19
70569 Stuttgart, Germany
Room 0.439 / Rühle Saal
Location and nearby accommodations


02. Jul 2024 - 05. Jul 2024






Data in HPC / Deep Learning / Machine Learning


Artificial Intelligence

Big Data

Deep Learning


Machine Learning

Scientific Machine Learning


Prerequisites and content levels


For day one, you need basic experience with C/C++ or Fortran. Suggested resources to satisfy prerequisites: the interactive tutorial. Familiarity with MPI is a plus.

On day two, you need an understanding of fundamental programming concepts in Python 3, such as functions, loops, dictionaries, and arrays; familiarity with Pandas data structures; and an understanding of how to compute a regression line.
Suggested resources to satisfy prerequisites: Python Beginner’s Guide. Familiarity with TensorFlow and Keras will be a plus, as they will be used in the hands-on sessions. For those who have not used them before, you can find tutorials here:

Experience with deep learning using Python 3 and, in particular, gradient descent model training will be needed on days three and four. Experience with PyTorch will also be helpful, see for instance.

Please be aware that while the second day offers an introduction to or recap of deep learning, most of the topics in this course are rather advanced. If you are completely unfamiliar with deep learning, the learning curve might be steep on days three and four.

Content levels
  • Basic: 4 hours
  • Intermediate: 4.5 hours
  • Advanced: 15 hours

Learn more about course curricula and content levels.


Lecture and assistant trainers: PD Dr. Juan Durillo Barrionuevo (LRZ and NVIDIA University Ambassador), Tobias Haas, Khatuna Kakhiani, and Lorenzo Zanon (HLRS).

Learning outcomes

1st day: Introduction to multi-GPU programming

  • You will have gained basic knowledge about how multi-GPU programming works.

2nd day: Introduction to Deep Learning

  • Implement common deep learning workflows, such as image classification and object detection.
  • Experiment with data, training parameters, network structure, and other strategies to increase performance and capability.
  • Deploy your neural networks to start solving real-world problems.

3rd day: Data Parallelism: How to Train Deep Learning Models on Multiple GPUs

  • Understand how data parallel deep learning training is performed using multiple GPUs.
  • Achieve maximum throughput when training, for the best use of multiple GPUs.
  • Distribute training to multiple GPUs using PyTorch Distributed Data Parallel.
  • Understand and utilize algorithmic considerations specific to multi-GPU training performance and accuracy.

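The data-parallel idea behind these outcomes can be illustrated without GPUs. The following is a minimal pure-Python sketch, not PyTorch code: it assumes a toy linear model y_hat = w*x + b with a mean-squared-error loss, and the names `local_gradient`, `all_reduce_mean`, `shard_data`, and `train_data_parallel` are hypothetical. Each simulated worker holds an identical copy of the parameters and computes a gradient on its own shard of the data; an all-reduce then averages the gradients so that every replica applies the same update. PyTorch's DistributedDataParallel follows the same pattern, averaging gradients across processes with an all-reduce during the backward pass.

```python
"""Pure-Python simulation of synchronous data-parallel SGD.

Assumption: this only mimics the idea behind PyTorch DDP (replicated model,
sharded data, averaged gradients); it does not use or reproduce DDP's API.
"""
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_gradient(w: float, b: float, shard: List[Sample]) -> Tuple[float, float]:
    """Gradient of the mean squared error of y_hat = w*x + b over one shard."""
    n = len(shard)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def all_reduce_mean(grads: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Stand-in for an all-reduce: every worker ends up with the average gradient."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def shard_data(data: List[Sample], world_size: int) -> List[List[Sample]]:
    """Give each simulated worker (rank) every world_size-th sample."""
    return [data[rank::world_size] for rank in range(world_size)]


def train_data_parallel(data: List[Sample], world_size: int,
                        lr: float = 0.02, steps: int = 2000) -> Tuple[float, float]:
    w, b = 0.0, 0.0  # identical initial replica on every worker
    shards = shard_data(data, world_size)
    for _ in range(steps):
        # Each worker computes a gradient on its own shard
        # (on real hardware, these run concurrently on separate GPUs).
        grads = [local_gradient(w, b, shard) for shard in shards]
        gw, gb = all_reduce_mean(grads)
        # All replicas apply the same averaged update, so they stay in sync.
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    data = [(float(x), 3.0 * x + 1.0) for x in range(8)]  # points on y = 3x + 1
    print(train_data_parallel(data, world_size=4))
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so data-parallel training computes the same update as single-device training on the combined batch; the speedup comes from processing the shards concurrently.
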
4th day: Model Parallelism: Building and Deploying Large Neural Networks

  • Train neural networks across multiple servers.
  • Use techniques such as activation checkpointing, gradient accumulation, and various forms of model parallelism to overcome the challenges associated with large-model memory footprint.
  • Capture and understand training performance characteristics to optimize model architecture.
  • Deploy very large multi-GPU models to production using NVIDIA Triton™ Inference Server.

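Gradient accumulation, one of the memory-saving techniques listed for the fourth day, can be sketched in a few lines. This is a minimal pure-Python illustration, not framework code: it assumes a toy linear model y_hat = w*x + b with a mean-squared-error loss, and `grad_mse`, `accumulated_gradient`, and `sgd_step` are hypothetical names. A batch that would not fit in memory at once is processed in smaller micro-batches, their scaled gradients are summed, and a single optimizer step follows. In PyTorch, this pattern relies on gradients accumulating in each parameter's `.grad` across `backward()` calls until they are zeroed.

```python
"""Pure-Python sketch of gradient accumulation.

Assumption: a toy linear model y_hat = w*x + b with a mean-squared-error
loss stands in for a large network; no deep learning framework is used.
"""
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def grad_mse(w: float, b: float, batch: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient with respect to (w, b) over one batch."""
    n = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def accumulated_gradient(w: float, b: float, batch: List[Sample],
                         micro_batch_size: int) -> Tuple[float, float]:
    """Process the batch in micro-batches and sum their scaled gradients."""
    micro_batches = [batch[i:i + micro_batch_size]
                     for i in range(0, len(batch), micro_batch_size)]
    acc_w, acc_b = 0.0, 0.0
    for mb in micro_batches:
        gw, gb = grad_mse(w, b, mb)
        # Scale each contribution so the sum equals the full-batch mean
        # gradient (exact when all micro-batches have the same size).
        acc_w += gw / len(micro_batches)
        acc_b += gb / len(micro_batches)
    return acc_w, acc_b


def sgd_step(w: float, b: float, batch: List[Sample], lr: float,
             micro_batch_size: int) -> Tuple[float, float]:
    """One optimizer step using the accumulated gradient."""
    gw, gb = accumulated_gradient(w, b, batch, micro_batch_size)
    return w - lr * gw, b - lr * gb


if __name__ == "__main__":
    batch = [(float(x), 2.0 * x - 1.0) for x in range(8)]
    print(grad_mse(0.3, 0.1, batch))
    print(accumulated_gradient(0.3, 0.1, batch, micro_batch_size=2))
```

Only one micro-batch needs to be held at a time, so peak memory shrinks while the resulting update matches that of the larger batch; the cost is that the micro-batches are processed one after another.
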

- preliminary -

 1st day (Tue): Introduction to multi-GPU programming (13:00 - 17:00)

On the first day you will learn the basics of multi-GPU programming. This will give you a rough idea of how deep learning can be implemented on multiple GPUs.

2nd day (Wed): Introduction to Deep Learning  (9:00 - 17:00)

Explore the fundamentals of deep learning by training neural networks and using results to improve performance and capabilities.

During this day, you’ll learn the basics of deep learning by training and deploying neural networks.

3rd day (Thu): Data Parallelism: How to Train Deep Learning Models on Multiple GPUs (9:00 - 17:00)

The computational requirements of deep neural networks used to enable AI applications like self-driving cars are enormous. A single training cycle can take weeks on a single GPU, or even years for larger datasets like those used in self-driving car research. Using multiple GPUs for deep learning can significantly shorten the time required to train on large amounts of data, making it feasible to solve complex problems with deep learning.

On the third day we will teach you how to use multiple GPUs to train neural networks.

4th day (Fri): Model Parallelism: Building and Deploying Large Neural Networks (9:00 - 17:00)

  • Introduction to Training of Large Models,
  • Model Parallelism: Advanced Topics,
  • Inference of Large Models.


The exercises will be carried out on cloud instances and, on the first day, on one of HLRS's clusters.

HLRS concept for on-site courses

Besides the content of the training itself, an important aspect of this event is the scientific exchange among the participants. We try to facilitate such communication by

  • a social event on the evening of the first course day,
  • offering common coffee and lunch breaks and
  • working together in groups of two during the exercises.


Register via the button at the top of this page.
If the course is full, we encourage you to register for the waiting list, as places might become available.

If you are not interested in all days, please select only those days in which you are interested while registering.

Registration closes on June 16, 2024.

Late registrations after that date are still possible according to the course capacity, possibly with reduced quality of service.

Important Information: After you are accepted, please create an account under

NVIDIA Deep Learning Institute:

The NVIDIA Deep Learning Institute delivers hands-on training for developers, data scientists, and engineers. The program is designed to help you get started with training, optimizing, and deploying neural networks to solve real-world problems across diverse industries such as self-driving cars, healthcare, online services, and robotics.


This course is open to academic participants only.

  • Students without a master’s degree or equivalent: 0 EUR
  • PhD students or employees at a German university or public research institute: 0 EUR
  • PhD students or employees at a university or public research institute in an EU, EU-associated or PRACE country other than Germany: 0 EUR.
  • PhD students or employees at a university or public research institute outside of EU, EU-associated or PRACE countries: 0 EUR

Our course fees include coffee breaks (in classroom courses only).

For lists of EU, EU-associated, and PRACE countries, see the Horizon Europe and PRACE websites.


Tobias Haas phone 0711 685 87223, tobias.haas(at)

HLRS Training Collaborations in HPC

HLRS is part of the Gauss Centre for Supercomputing (GCS), together with JSC in Jülich and LRZ in Garching near Munich. EuroCC@GCS is the German National Competence Centre (NCC) for High-Performance Computing. HLRS is also a member of the Baden-Württemberg initiative bwHPC.

This course is provided within the framework of the bwHPC training program.

Further courses

See the training overview and the Supercomputing Academy pages.
