ONLINE COURSE: Deep Learning and Acceleration with OpenACC on Nvidia GPUs



This course will be provided as an online course (using Newrow).

NVIDIA Deep Learning Institute (DLI) offers hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning.

Learn how to train and deploy a neural network to solve real-world problems, how to generate effective descriptions of content within images and how to accelerate your applications with OpenACC.

The workshop combines lectures on Fundamentals of Deep Learning and Fundamentals of Deep Learning for Multi-GPUs with a lecture on Accelerated Computing with OpenACC.

The lectures are interleaved with many hands-on sessions using Jupyter Notebooks. The exercises will be done on a fully configured GPU-accelerated workstation in the cloud.

The workshop is organized in cooperation with Nvidia.

On the last day, you will learn more about details of Nvidia's GPU architecture and how to use containers for DL on the systems at HLRS.


Day 1 - Fundamentals of Deep Learning (9-17:30, Gunter Roth)

In this workshop, you’ll learn how deep learning works through hands-on exercises in computer vision and natural language processing. You’ll train deep learning models from scratch, learning tools and tricks to achieve highly accurate results. You’ll also learn to leverage freely available, state-of-the-art pre-trained models to save time and get your deep learning application up and running quickly.

Gained Skills

  • Learn the fundamental techniques and tools required to train a deep learning model
  • Gain experience with common deep learning data types and model architectures
  • Enhance datasets through data augmentation to improve model accuracy
  • Leverage transfer learning between models to achieve efficient results with less data and computation
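As a rough illustration of the data augmentation skill listed above, the following minimal sketch mirrors each image left to right and adds the mirrored copy to the dataset with the same label. It is not taken from the course material: the helper names `flip_horizontal` and `augment` are hypothetical, images are represented as plain nested lists, and the course exercises would normally rely on a deep learning framework instead.

```python
# Illustrative sketch only: hypothetical helpers, plain Python lists as images.

def flip_horizontal(image):
    """Return a copy of a 2D image (a list of rows) mirrored left to right."""
    return [list(reversed(row)) for row in image]


def augment(dataset):
    """Return the original (image, label) pairs plus a mirrored copy of each."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        # A mirrored image keeps its label, so the dataset doubles in size.
        augmented.append((flip_horizontal(image), label))
    return augmented


tiny_image = [[0, 1, 2],
              [3, 4, 5]]
dataset = [(tiny_image, "cat")]
augmented = augment(dataset)

print(len(augmented))    # 2
print(augmented[1][0])   # [[2, 1, 0], [5, 4, 3]]
```

Mirroring is only appropriate when the label does not depend on orientation; for data such as handwritten digits or text, a flip could change the meaning of the sample.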

Required Prerequisites

An understanding of fundamental programming concepts in Python 3, such as functions, loops, dictionaries, and arrays; familiarity with Pandas data structures; and an understanding of how to compute a regression line.
Suggested resources to satisfy prerequisites: Python Beginner’s Guide.

Day 2 - Deep Learning for Multi-GPUs (9-17:30, Gunter Roth)

This workshop teaches you techniques for training deep neural networks on multiple GPUs to shorten the training time required for data-intensive applications. Working with deep learning tools, frameworks, and workflows to perform neural network training, you’ll learn how to implement multi-GPU training with Horovod, which reduces the complexity of writing efficient distributed software, and how to maintain accuracy when training a model across many GPUs.

Gained Skills

  • Stochastic gradient descent (SGD), a crucial tool in parallelized training
  • Batch size and its effect on training time and accuracy
  • Transforming a single-GPU implementation to a Horovod multi-GPU implementation
  • Techniques for maintaining high accuracy when training across multiple GPUs
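The idea behind data-parallel SGD can be sketched without any GPU or Horovod installation. In the minimal example below, each simulated worker computes the gradient on its own shard of the data, and the gradients are then averaged before the weight update. Frameworks such as Horovod perform this averaging with an allreduce across GPUs; here it is imitated by an ordinary function. All names (`gradient`, `allreduce_average`, `data_parallel_step`) and the toy model y = w * x are assumptions made for illustration and do not use the Horovod API.

```python
import math

# Illustrative sketch only: simulates data-parallel SGD in a single process.


def gradient(w, samples):
    """Mean gradient of the squared error (w*x - y)**2 with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)


def allreduce_average(values):
    """Stand-in for an allreduce: every worker would receive this mean."""
    return sum(values) / len(values)


def data_parallel_step(w, shards, lr):
    """One SGD step in which each shard plays the role of one worker (GPU)."""
    local_grads = [gradient(w, shard) for shard in shards]
    return w - lr * allreduce_average(local_grads)


# Toy data generated from y = 3 * x, split into two equally sized shards.
samples = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [samples[0:4], samples[4:8]]

# With equal shard sizes, the averaged gradient matches the full-batch gradient.
full_grad = gradient(0.0, samples)
averaged_grad = allreduce_average([gradient(0.0, s) for s in shards])
print(math.isclose(full_grad, averaged_grad))

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 3))
```

Because every worker processes its own shard, the effective batch size grows with the number of workers, which is why batch size and its effect on accuracy appear alongside SGD in the list above.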

Required Prerequisites

Experience with Deep Learning using Python 3 and, in particular, gradient descent model training.

Day 3 - Fundamentals of Accelerated Computing with OpenACC (9-17:30, Jonny Hancox)

Learn the basics of OpenACC, a high-level, directive-based programming model for GPUs. This course is for anyone with some C/C++ experience who is interested in accelerating the performance of their applications beyond the limits of CPU-only programming.

Gained Skills

  • How to profile and optimize your CPU-only applications to identify hot spots for acceleration.
  • How to use OpenACC directives to GPU accelerate your codebase.
  • How to optimize data movement between the CPU and GPU accelerator.
  • Upon completion, you'll be ready to use OpenACC to GPU accelerate CPU-only applications.

Required Prerequisites

Basic experience with C/C++.
Suggested resources to satisfy prerequisites: the interactive tutorial,

Day 4 - Fundamentals of GPU Computing and Transfer to HLRS HPC system (9-13:00, Dai Yang)

This workshop introduces details of Nvidia's GPU architecture and the fundamentals of GPU programming. The second part introduces unprivileged container solutions, e.g., Singularity, in HPC environments and shows how to use existing Docker containers from Nvidia's GPU Cloud in such environments.

Gained Skills

  • Fundamentals of Nvidia's GPU architecture.
  • Fundamentals of GPU programming.
  • How to use unprivileged container solutions in HPC environments.
  • How to use existing Docker containers from Nvidia's GPU Cloud in HLRS's HPC system.

Required Prerequisites

If you want to test containers on the HLRS HPC systems, you need an account.


The course language is English.


Jonny Hancox, Gunter Roth and Dai Yang from Nvidia.


Please register only for the days that you will attend. For example, if you want to participate only in the course "Fundamentals of Deep Learning", please register only for Day 1, and so on.

If the course is full, please register for the waiting list so that we can inform you about available places as soon as possible.


This course is over. Therefore, registration is closed.
To get early information about future courses, you may also subscribe to our e-mail list via the online registration form.


This course is free of charge.



HLRS is part of the Gauss Centre for Supercomputing (GCS), which is one of the six PRACE Advanced Training Centres (PATCs) that started in Feb. 2012.
HLRS is also a member of the Baden-Württemberg initiative bwHPC-C5.
This course is provided within the framework of the bwHPC-C5 user support.
This course is not part of the PATC curriculum and is not sponsored by the PATC program.


Tobias Haas phone 0711 685 87223,
Khatuna Kakhiani phone 0711 685 65796,
