Biological Cybernetics


Name of the cluster:

ERIS

Institution:

Max Planck Institute for Biological Cybernetics

Login nodes:

  • eris01.bc.rzg.mpg.de

  • eris02.bc.rzg.mpg.de

Hardware Configuration:

Login nodes: 2 nodes, eris[01-02]

  • CPU: AMD EPYC 7452 (2 sockets, 32 cores per socket, 2 threads per core, 128 logical CPUs)

  • RAM: 250 GB

Compute nodes (1792 CPU cores): 28 nodes, eris[001-028]

  • CPU: AMD EPYC 7452 (2 sockets, 32 cores per socket, 2 threads per core, 128 logical CPUs)

  • RAM: 500 GB

Compute nodes (256 CPU cores): 4 nodes, eris[101-104]

  • CPU: AMD EPYC 7452 (2 sockets, 32 cores per socket, 2 threads per core, 128 logical CPUs)

  • RAM: 1 TB

GPU compute nodes (128 CPU cores): 2 nodes, erisg[001-002]

  • CPU: AMD EPYC 7452 (1 socket, 32 cores per socket, 2 threads per core, 64 logical CPUs)

  • RAM: 250 GB

  • GPU: 4 x Quadro RTX 5000

  • The interconnect is based on 50 Gb Ethernet.

Filesystems:

GPFS-based, with a total size of 425 TB and an independent inode space for each of the following filesets:

/u

shared home filesystem; GPFS-based; user quotas (current defaults: 500 GB, 512K files) are enforced; quota usage can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’ (see the example after this list).

/scratch

shared scratch filesystem; GPFS-based; no quotas enforced; NO BACKUPS!
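
For example, to check your current usage against the enforced quotas, run the quota command from a login node (a minimal sketch; the filesystems and columns reported depend on the GPFS configuration):

    # show the user quotas enforced on the GPFS filesets
    /usr/lpp/mmfs/bin/mmlsquota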

Compilers and Libraries:

The “module” subsystem is implemented on ERIS. Please use ‘module avail’ to list all available modules.

  • Intel compilers (-> ‘module load intel’): icc, icpc, ifort

  • GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran

  • Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64

  • Intel MPI (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, … This module becomes visible and loadable only after a compiler module (Intel or GCC) has been loaded (see the build example after this list)

  • Python (-> ‘module load anaconda’): python
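
A minimal sketch of loading the modules and building an MPI code with the Intel toolchain; the source file mpi_hello.c and the optimization flag are hypothetical placeholders:

    # load the Intel compiler first, then Intel MPI and MKL
    module load intel
    module load impi
    module load mkl

    # compile an MPI program with the Intel MPI compiler wrapper
    mpiicc -O2 -o mpi_hello mpi_hello.c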

Batch system based on Slurm:

The batch system on ERIS is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Cobra home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be adapted for the ERIS cluster.

Current Slurm configuration on ERIS:

  • default turnaround time: 7 days

  • two partitions: p.eris and s.eris (default)

  • p.eris partition: for parallel MPI or hybrid MPI/OpenMP jobs. Resources are allocated exclusively on whole nodes. The maximum number of nodes per job is 28

  • s.eris partition: for serial or OpenMP jobs. Nodes are shared, and jobs are limited to CPUs on a single node. The default RAM per job is 128 GB (example batch scripts are sketched after this list)
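
A minimal sketch of a parallel batch script for the p.eris partition; the job name, executable (./my_mpi_prog) and resource values are hypothetical placeholders and must be adapted to your application:

    #!/bin/bash -l
    # MPI job in the exclusively allocated p.eris partition (placeholder values)
    #SBATCH --job-name=mpi_test
    #SBATCH --partition=p.eris
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=64
    #SBATCH --time=08:00:00

    module purge
    module load intel impi

    srun ./my_mpi_prog

A corresponding sketch for a serial job in the shared s.eris partition (again with placeholder values):

    #!/bin/bash -l
    # serial job in the shared s.eris partition (placeholder values)
    #SBATCH --job-name=serial_test
    #SBATCH --partition=s.eris
    #SBATCH --ntasks=1
    #SBATCH --mem=16G
    #SBATCH --time=04:00:00

    ./my_serial_prog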

Useful tips:

The default run time limit for jobs that don’t specify a value is 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, but not longer than 24 hours.
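
For example, in a batch script (the 8-hour value is an arbitrary placeholder):

    # request a total run time limit of 8 hours
    #SBATCH --time=08:00:00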

By default, jobs in the p.eris partition use all of the memory of the allocated nodes.

In the s.eris partition, the default memory allocation per job is 128 GB. To grant the job access to more or less memory on each node, use the --mem or --mem-per-cpu options of sbatch/srun.
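
For example (the values are arbitrary placeholders; --mem and --mem-per-cpu are alternatives, not to be combined):

    # request 32 GB for the whole job ...
    #SBATCH --mem=32G

    # ... or, alternatively, 4 GB per allocated CPU
    #SBATCH --mem-per-cpu=4G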

OpenMP codes require the variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (an example is on the help information page and sketched below).
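
A minimal sketch of the relevant lines in an sbatch script (the thread count of 16 is an arbitrary placeholder):

    # reserve 16 CPUs for one multi-threaded task
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=16

    # pass the allocated CPU count to the OpenMP runtime
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK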

To run code on nodes with a specific memory capacity (500 GB or 1 TB), use the --constraint=<list> option in an sbatch script: --constraint=mem500G or --constraint=mem1T

To use GPUs, add the --gres option to your Slurm scripts and choose how many GPUs and/or which model to use: #SBATCH --gres=gpu:rtx_5000:1 or #SBATCH --gres=gpu:1

The valid gres syntax is: gpu[[:type]:count]
where
type is the type of GPU (rtx_5000)
count is the number of GPUs (between 1 and 4)

GPU cards are in default compute mode.
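
A minimal sketch of a GPU batch script (the job name, executable ./my_gpu_prog and resource values are hypothetical placeholders; add further options such as --time or --mem as needed):

    #!/bin/bash -l
    # job requesting one RTX 5000 GPU (placeholder values)
    #SBATCH --job-name=gpu_test
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    #SBATCH --gres=gpu:rtx_5000:1

    ./my_gpu_prog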

To check node features use sinfo -O nodelist,features:30

Support:

For support, please create a trouble ticket at the MPCDF helpdesk.