Dais User Guide


Note

The filesystem details, including the quota values, are not yet settled and are thus subject to change.

Name of the cluster:

  • DAIS

Institution:

  • Selected MPG Departments

How to get Access permissions

Access can only be granted to members of the institutes that procured the system.

If you do not already have an account at MPCDF, fill out the registration form. If you already have an account at MPCDF but cannot access DAIS, please request access via our ticket system.

Note that access to DAIS can only be granted after successfully passing the export control check, which is carried out by the export control officer of your institute. The export control officer is notified automatically as soon as you request an account for DAIS.

Access

Login

For security reasons, direct login to the HPC system DAIS is allowed only from within some MPG networks. Users from other locations have to log in to one of our gateway systems first.

Login nodes

  • dais11.mpcdf.mpg.de, dais12.mpcdf.mpg.de

The SSH host key fingerprints (SHA256) of DAIS are:

ijGSRMd1K3bq14gUaKnI0rODsx5hgCVvtAzQoHC/sy0 (RSA)
Ke44kG2tm/IRqYFg9iUGapSFCLQKIiSUERez5eSsT9Y (ED25519)

Hardware Configuration

2 login nodes dais[11-12]:

  • 2 x Intel Xeon Platinum 8568Y+ 48-core processors @ 2.3 GHz

  • 96 cores per node

  • hyper-threading enabled - 2 threads per core

  • 500 GB RAM

17 execution nodes daisg[101-117]:

  • 2 x Intel Xeon Platinum 8568Y+ 48-core processors @ 2.3 GHz

  • 96 cores per node

  • hyper-threading enabled - 2 threads per core

  • 2.0 TB RAM

  • 8 x NVIDIA H200 GPUs (with 141GB HBM each) per node

10 execution nodes daisg[201-210]:

  • 2 x Intel Xeon Platinum 8568Y+ 48-core processors @ 2.3 GHz

  • 96 cores per node

  • hyper-threading enabled - 2 threads per core

  • 2.0 TB RAM

  • 8 x NVIDIA B200 GPUs (with 180GB HBM each) per node

2 execution nodes daisg[301-302]:

  • 2 x AMD EPYC 9555 64-core processors @ 3.2 GHz

  • 128 cores per node

  • hyper-threading enabled - 2 threads per core

  • 1.5 TB RAM

  • 4 x NVIDIA RTX PRO 6000 GPUs (with 96 GB of memory each) per node

Node interconnect:

  • based on a Mellanox (NVIDIA) InfiniBand fabric (speed: 8 x 200 Gb/s per GPU node)

Filesystems

Filesystem /u

  • shared home filesystem

  • quota: 500k files and 1 TB of data

Filesystem /dais/fs/scratch

  • shared scratch filesystem, 200 TB

  • quota: 8M files and 8 TB of data

  • NO BACKUPS

Filesystem /viper/ptmp2

The file system /viper/ptmp2 is designed for batch job I/O (12 PB, no system backups). Files in /viper/ptmp2 that have not been accessed for more than 12 weeks are removed automatically. This period may be reduced if necessary (with prior notice). As a matter of current policy, no quotas are applied on /viper/ptmp2. This gives users the freedom to manage their data according to their actual needs without administrative overhead. This liberal policy presumes fair usage of the common file space, so please do regular housekeeping of your data and archive or remove files that are no longer in use.
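Candidates for this automatic cleanup can be listed ahead of time with find. The sketch below assumes it is run inside your own directory under /viper/ptmp2:

```shell
# List files in the current directory tree whose last access time is
# more than 84 days (12 weeks) in the past; these are candidates for
# automatic removal under the ptmp2 policy.
find . -type f -atime +84
```

Files on the resulting list can then be archived or deleted (for example with GNU find's -delete option, after careful inspection of the list).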

Filesystem /nexus/posix0

Additional storage space can be rented.

Compilers and Libraries

Hierarchical environment modules are used at MPCDF to provide software packages and to enable switching between different software versions. Users have to specify the needed modules with explicit versions at login and during the startup of a batch job. Not all software modules are displayed immediately by the module avail command; for some, the user first needs to load a compiler and/or MPI module. You can search the full hierarchy of the installed software modules with the find-module command.
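A typical sequence on a login node might look as follows (the module names and versions are illustrative assumptions, not a list of what is installed on DAIS):

```shell
module purge            # start from a clean environment
module load gcc/14      # load a compiler first ...
module load openmpi/5   # ... then an MPI library built with it
module avail            # now also lists packages built for this compiler/MPI stack
find-module fftw        # search the full module hierarchy for a package
```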

Batch system based on Slurm

The batch system on DAIS is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Raven home page. For more detailed information, see the Slurm handbook. For example batch scripts, see below.

Current Slurm configuration on DAIS

  • default turnaround time: 2 hours

  • current max. turnaround time (wallclock): 24 hours

  • gpu partition: exclusive usage of compute nodes (with 8 GPUs each); default

  • gpu1 partition: shared usage of compute nodes; for jobs with at most 4 GPUs

Useful tips

The default run time limit for jobs that do not specify a value is 2 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, up to a maximum of 24 hours.

The default memory per node in the shared partition is 250000 MB; the maximum per allocated node per job is 2000000 MB. To grant the job access to all of the memory on each node, use the --mem=0 option of sbatch/srun. On nodes with RTX PRO 6000 GPUs, the default memory per job is 375000 MB, and these nodes cannot be used exclusively.
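In a job script header, these settings look as follows (a fragment with illustrative values, not a complete script):

```shell
#SBATCH --time=0-08:00:00  # wall-clock limit D-HH:MM:SS (here: 8 hours, max. 24)
#SBATCH --mem=0            # grant the job all of the memory on each node
```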

OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (an example is on the help information page).
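In a batch script this typically amounts to a single line; the sketch below falls back to one thread when run outside a Slurm allocation:

```shell
# Use the per-task CPU count granted by Slurm as the OpenMP thread count;
# default to 1 if SLURM_CPUS_PER_TASK is not set.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "Using $OMP_NUM_THREADS OpenMP threads"
```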

To use GPUs, add the --gres option to your Slurm scripts and choose how many GPUs and which model to use: #SBATCH --gres=gpu:GPUTYPE:X, where X is the number of GPUs (1 up to 8) and GPUTYPE is one of h200, b200, rtx_pro_6000. If no GPU type is specified, the job will be scheduled on arbitrary nodes.

Only one rtx_pro_6000 GPU is allowed per job.

GPU cards are in default compute mode.

Slurm example batch scripts

The following sections present some general examples of submission scripts for the DAIS system. For more examples with specific frameworks and containerized setups, or for comparisons with other HPC systems, refer to the ai_containers repository.

Single-GPU job on a shared node

The following example launches a Python program on one GPU (NVIDIA H200, B200, or RTX PRO 6000) on a shared node.

#!/bin/bash -l
#
# Initial working directory:
#SBATCH -D ./
#
# Standard output and error:
#SBATCH -o ./job.out.%j
#SBATCH -e ./job.err.%j
#
# Job name
#SBATCH -J test_1gpu
#
# Time limit
#SBATCH --time=0-00:10:00 # wall-clock D-HH:MM:SS (here: 10 minutes)
#
#SBATCH --nodes=1  # request 1 node.
#SBATCH --partition="gpu1" # request a shared node.
#SBATCH --ntasks-per-node=1 # request 1 task on that node.
#
# --- default case: use a single H200 on a shared node ---
#SBATCH --gres=gpu:h200:1 # use 1 H200.
#SBATCH --cpus-per-task=12 # request 1/8 of available CPUs on a H200 node.
#SBATCH --mem=250000 # grant the job access to 1/8 of the memory on a H200 node.
#
# --- uncomment to use a single B200 on a shared node ---
# #SBATCH --gres=gpu:b200:1 # use 1 B200
# #SBATCH --cpus-per-task=12 # request 1/8 of available CPUs on a B200 node.
# #SBATCH --mem=250000 # grant the job access to 1/8 of the memory on a B200 node
#
# --- uncomment to use a single RTX PRO 6000 on a shared node ---
# #SBATCH --gres=gpu:rtx_pro_6000:1 # use 1 RTX PRO 6000
# #SBATCH --cpus-per-task=32 # request 1/4 of available CPUs on a RTX node.
# #SBATCH --mem=375000 # grant the job access to 1/4 of the memory on a RTX node.
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de

###### Environment ######
module purge
module load apptainer/1.4.3
CONTAINER="YOUR_CONTAINER"

###### Run the program:
srun apptainer exec --nv $CONTAINER python3 ./your_python_executable

Multi-GPU job on a shared node

The script below launches a distributed Python program in a single-node, multi-GPU setting.

The example illustrates a distributed PyTorch workflow. Note that different frameworks might require different Slurm settings (e.g., 1 task per node instead of 1 task per GPU). For additional examples, refer to the ai_containers repository.

#!/bin/bash -l
#
# Initial working directory:
#SBATCH -D ./
#
# Standard output and error:
#SBATCH -o ./job.out.%j
#SBATCH -e ./job.err.%j
#
# Job name
#SBATCH -J test_gpu
#
# Time limit
#SBATCH --time=0-00:10:00 # wall-clock D-HH:MM:SS (here: 10 minutes)
#
# Number of nodes, GPUs and MPI tasks per node:
#SBATCH --nodes=1  # request 1 node.
#SBATCH --partition="gpu1" # request a shared node.
#
# --- use 2 GPUs on a shared node ---
#SBATCH --gres=gpu:h200:2 # use 2 GPUs on a shared node.
#SBATCH --ntasks-per-node=2 # request 2 tasks on that node (1 per gpu).
#SBATCH --cpus-per-task=12 # request 1/8 of available CPUs on the node *per task*.
#SBATCH --mem=500000 # grant the job access to 2/8 of the memory on the node.
#
# --- uncomment to use 3 GPUs on a shared node ---
# #SBATCH --gres=gpu:h200:3 # use 3 GPUs on a shared node.
# #SBATCH --ntasks-per-node=3 # request 3 tasks on that node (1 per gpu).
# #SBATCH --cpus-per-task=12 # request 1/8 of available CPUs on the node *per task*.
# #SBATCH --mem=750000 # grant the job access to 3/8 of the memory on the node.
#
# --- uncomment to use 4 GPUs on a shared node ---
# #SBATCH --gres=gpu:h200:4 # use 4 GPUs on a shared node.
# #SBATCH --ntasks-per-node=4 # request 4 tasks on that node (1 per gpu).
# #SBATCH --cpus-per-task=12 # request 1/8 of available CPUs on the node *per task*.
# #SBATCH --mem=1000000 # grant the job access to 4/8 of the memory on the node.
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de

###### Environment ######
module purge
module load apptainer/1.4.3
CONTAINER="YOUR_PYTORCH_CONTAINER"

###### PyTorch distributed variables ######
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export APPTAINERENV_MASTER_PORT=$(expr 10000 + $(echo -n $SLURM_JOBID | tail -c 4))
export WORLD_SIZE=$(($SLURM_NNODES * $SLURM_NTASKS_PER_NODE))

###### Run the program:
srun apptainer exec --nv $CONTAINER \
  bash -c "RANK=\${SLURM_PROCID} python3 ./your_python_executable"
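The APPTAINERENV_MASTER_PORT line in the script above derives a job-specific port (10000 plus the last four digits of the Slurm job ID) so that concurrent jobs on a shared node do not pick the same rendezvous port; Apptainer exposes the variable inside the container as MASTER_PORT. The arithmetic can be checked in isolation:

```shell
# Recompute the rendezvous port as in the script above,
# using an illustrative job ID.
SLURM_JOBID=1234567
MASTER_PORT=$(expr 10000 + $(echo -n $SLURM_JOBID | tail -c 4))
echo $MASTER_PORT   # prints 14567
```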

B200: B200 and H200 nodes share the same CPU layout, so B200 GPUs can be requested simply by replacing h200 with b200 in the --gres line of the above script:

# --- B200 version ---
#SBATCH --gres=gpu:b200:x  # x is the number of GPUs you need

Note

  • Requesting more than half of a node’s resources (for example, > 4 GPUs) triggers a full‑node allocation, meaning the node is reserved exclusively for the job.

  • The RTX PRO 6000 nodes do not allow multi-GPU jobs.

Multi-node job

The script below runs a Python program on 16 GPUs spread over two nodes, illustrating a distributed PyTorch workflow (additional examples are available in the ai_containers repository).

#!/bin/bash -l
#
# Initial working directory:
#SBATCH -D ./
#
# Standard output and error:
#SBATCH -o ./job.out.%j
#SBATCH -e ./job.err.%j
#
# Job name
#SBATCH -J test_gpu
#
# Time limit
#SBATCH --time=0-00:10:00 # wall-clock D-HH:MM:SS (here: 10 minutes)
#
# Number of nodes, GPUs and MPI tasks per node:
#SBATCH --nodes=2  # request 2 or more full nodes
#SBATCH --partition="gpu" # request an exclusive node
#
#SBATCH --gres=gpu:h200:8 # use 8 H200 on each node.
# #SBATCH --gres=gpu:b200:8 # use 8 B200 on each node.
#SBATCH --ntasks-per-node=8 # request 8 tasks on each node (1 per gpu).
#SBATCH --cpus-per-task=12 # request 1/8 of available CPUs per task 
#SBATCH --mem=0 # grant the job access to all of the memory on each node 
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de

###### Environment ######
module purge
module load apptainer/1.4.3
CONTAINER="YOUR_PYTORCH_CONTAINER"

###### PyTorch distributed variables ######
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export APPTAINERENV_MASTER_PORT=$(expr 10000 + $(echo -n $SLURM_JOBID | tail -c 4))
export WORLD_SIZE=$(($SLURM_NNODES * $SLURM_NTASKS_PER_NODE))

###### Run the program:
srun apptainer exec --nv $CONTAINER \
  bash -c "RANK=\${SLURM_PROCID} python3 ./your_python_executable"

Support

For support, please create a trouble ticket at the MPCDF helpdesk.