AI software
On this page you will find a collection of information about software for data analytics, and especially machine learning, supported at the MPCDF.
Introduction
The current best practice for using AI and data analytics software on our systems is to utilize containers. Containers provide a consistent and reliable way to manage complex dependencies and ensure reproducibility across different environments. We recommend using Apptainer (formerly Singularity) for this purpose, as it is well-suited for our HPC infrastructure.
The software provided through environment modules is considered deprecated and should be used with caution. For more details on the recent changes in our Python infrastructure, please refer to the Bits & Bytes article.
Please note that users are expected to install any required software themselves. The following sections provide guidance and resources to help you get started with containers, installing Python packages locally, and best practices for setting up your environment.
Important
Always verify the performance of your software setup, as it can vary depending on the installation method. You can monitor performance using our monitoring system.
Containers
AI frameworks often come with complex dependencies that can vary across systems. To manage this effectively on MPCDF systems, containers are the recommended solution. Containers allow you to encapsulate your software environment into a single, portable image. This greatly simplifies reproducibility and collaboration.
Using containers on MPCDF Systems
On MPCDF systems, the recommended way to work with containers is through Apptainer (formerly Singularity). It integrates well with batch systems, supports GPU usage, and does not require root access, making it ideal for HPC environments.
Apptainer is available on our HPC systems via the module system. To see the available versions, use the find-module command. For example:
$ find-module apptainer
apptainer/1.3.2
apptainer/1.3.6
apptainer/1.4.1
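Once a suitable version has been identified, the module can be loaded and used to pull and run container images. A minimal sketch (the module version and image URL are illustrative examples, not recommendations):

```shell
# Load a specific Apptainer version from the module system
module load apptainer/1.4.1

# Pull an example image from a registry and store it as a local .sif file
# (the registry URL is illustrative; substitute the image you need)
apptainer pull ubuntu.sif docker://ubuntu:22.04

# Run a command inside the container
apptainer exec ubuntu.sif cat /etc/os-release
```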
For more details on the general usage of Apptainer, please refer to our dedicated Apptainer documentation.
Dedicated Apptainer examples for AI frameworks
To help you get started with containers, we provide a curated AI Containers Repository on GitLab, featuring examples tailored for common AI frameworks such as PyTorch and TensorFlow.
The repository includes:
Python scripts for typical AI workflows (e.g., training)
Apptainer definition files
Slurm job submission scripts for running containers on HPC systems
Best practices and tips for using containers
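As an illustration of how these pieces fit together, a minimal Slurm job script might look as follows. All names and resource requests are placeholders; please see the repository for complete, tested examples:

```shell
#!/bin/bash -l
# Minimal sketch: run a training script inside an Apptainer container
# on a GPU node. Job name, resources, image, and script are placeholders.
#SBATCH --job-name=ai-train
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

module purge
module load apptainer/1.4.1

# --nv makes the host NVIDIA driver and GPUs visible inside the container
srun apptainer exec --nv my_container.sif python train.py
```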
Use an Apptainer image as a Jupyter kernel in RVS
To add a Jupyter kernel in RVS that runs inside an Apptainer image, follow the instructions outlined in the AI Containers Repository.
Hardware compatibility
To ensure optimal performance, it’s crucial to match your containers with the appropriate hardware, especially when using GPUs.
NVIDIA GPUs (e.g., on the Raven system):
Use containers built with CUDA and install AI frameworks compiled with CUDA support.
Browse available images here: NVIDIA NGC Catalog
AMD GPUs (e.g., on the Viper system):
Use containers built with ROCm, and ensure your AI frameworks are installed with ROCm support.
Browse available images here: AMD ROCm Docker Hub
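As a sketch, pulling and running hardware-matched images could look like this (the image tags are examples only; check the catalogs for current versions):

```shell
# NVIDIA GPUs (e.g. Raven): pull a CUDA-based image from the NGC catalog
apptainer pull pytorch-cuda.sif docker://nvcr.io/nvidia/pytorch:24.08-py3
# --nv passes the host NVIDIA driver stack into the container
apptainer exec --nv pytorch-cuda.sif python -c "import torch; print(torch.cuda.is_available())"

# AMD GPUs (e.g. Viper): pull a ROCm-based image from Docker Hub
apptainer pull pytorch-rocm.sif docker://rocm/pytorch:latest
# --rocm passes the host ROCm stack into the container
apptainer exec --rocm pytorch-rocm.sif python -c "import torch; print(torch.cuda.is_available())"
```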
How to install Python packages locally
For rapid experimentation, or if you want to leverage software already available on the HPC systems via environment modules, we recommend setting up a virtual environment to install any additional packages you may need.
Setting up a venv
First, load the Python interpreter via the Water Boa Python module:
module load python-waterboa/2024.06
Then load any required packages that are available on our module system. See our dedicated section for more information about how the module system works.
Now create your virtual environment via:
python -m venv --system-site-packages <path/to/my_venv>
This command will create a directory at the given path where your software will be installed. The --system-site-packages flag gives the virtual environment access to the packages loaded in the previous steps.
Activate your venv
To activate your newly created virtual environment, execute:
source <path/to/my_venv>/bin/activate
Install packages
Then you can simply install your required packages via pip. For example, to install PyTorch:
pip install torch
Important
Take care of GPU support!
If you require a particular build for CUDA or ROCm, consult the documentation of the software you want to install. For example, to install PyTorch:
With NVIDIA GPU support, install a wheel built with CUDA 12.6:
pip install torch --index-url https://download.pytorch.org/whl/cu126
With AMD GPU support, install a wheel built with ROCm 6.3:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3
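After installation, it is worth verifying that the wheel you installed actually sees the GPU. A small sanity check, to be run inside the activated virtual environment, ideally on a GPU node:

```shell
# Print the PyTorch version and whether a CUDA/ROCm device is visible;
# on a GPU node this should report "True" if the build matches the hardware
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```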
Important
Conda environments have very limited support (more details in our documentation and this Bits and Bytes article).
Use a virtual environment as a Jupyter kernel in RVS
You can add a Jupyter kernel in RVS that runs inside a virtual environment.
First, install the ipykernel package inside the virtual environment:
pip install ipykernel
Then install the kernel locally:
python -m ipykernel install --user --name=my-env-name --display-name "Python (my-env-name)"
It will then automatically appear in the kernel list of a JupyterLab session in RVS.
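To double-check that the kernel was registered, you can list the kernel specifications visible to your user (assuming the jupyter command is available in your environment):

```shell
# List installed Jupyter kernel specifications;
# the newly registered "my-env-name" kernel should appear here
jupyter kernelspec list
```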