Biophysics CRYO


Name of the cluster:

CRYO

Institution:

Max Planck Institute of Biophysics

Login nodes:

  • cryo101.bc.rzg.mpg.de

  • cryo102.bc.rzg.mpg.de

Hardware Configuration:

Login nodes cryo101 & cryo102:

  • CPU model: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz

  • 2 sockets per node

  • 8 cores per socket

  • hyper-threading is on (2 threads per core)

  • 188 GB RAM

  • 2 x GeForce GTX 1080 Ti

38 execution nodes cryo[103-140]:

  • CPU model: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz

  • 2 sockets per node

  • 8 cores per socket

  • hyper-threading is on (2 threads per core)

  • 188 GB RAM

  • 2 x GeForce GTX 1080 Ti

20 execution nodes cryo[201-220]:

  • CPU model: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz

  • 2 sockets per node

  • 20 cores per socket

  • hyper-threading is on (2 threads per core)

  • 755 GB RAM

  • 2 x GeForce GTX 1080 Ti

2 execution nodes cryog[801-802]:

  • CPU model: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz

  • 2 sockets per node

  • 26 cores per socket

  • hyper-threading is off (1 thread per core)

  • 188 GB RAM

  • 4 x Quadro RTX 8000

12 execution nodes cryo[901-912]:

  • CPU model: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz

  • 2 sockets per node

  • 24 cores per socket

  • hyper-threading is off (1 thread per core)

  • 188 GB RAM

  • node interconnect: Mellanox Technologies InfiniBand fabric (56 Gb/s for nodes cryo[101-140,201-220], 100 Gb/s for nodes cryo[901-912] and cryog[801-802])

Filesystems:

GPFS-based with total size of 1496 TB:

/u

shared home filesystem (214 TB) with user home directories in /u/<username>; GPFS-based; user quotas (currently 4 TB, 2M files) are enforced; the quota can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’ (see the example below).

/sbdata

shared scratch filesystem (1.3 PB); no quotas enforced. NO BACKUPS!
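
A minimal sketch of how quota and disk usage can be checked from a login node (the directory path is a placeholder):

  # show the GPFS quotas for your user account
  /usr/lpp/mmfs/bin/mmlsquota

  # check the disk usage of a directory with standard tools (my_project is a placeholder)
  du -sh /u/<username>/my_project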

Compilers and Libraries:

The “module” subsystem is implemented on CRYO. Please use ‘module available’ to see all available modules. A short compilation example is given after the list below.

  • Intel compilers (-> ‘module load intel’): icc, icpc, ifort

  • GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran

  • Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64

  • Intel MPI 2018.3 (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …

  • MATLAB (-> ‘module load matlab’): matlab, mcc, mex

  • Mathematica (-> ‘module load mathematica’)

  • CUDA (-> ‘module load cuda’)
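
As an illustration, loading a compiler and MPI environment and building a program might look like the following sketch (module selection, compiler flags and file names are example values, not a fixed recommendation):

  # load the Intel compiler, MKL and Intel MPI modules
  module purge
  module load intel mkl impi

  # build a Fortran MPI code and link against MKL (file names are placeholders)
  mpiifort -O2 -o my_prog my_prog.f90 -L$MKL_HOME/lib/intel64 -lmkl_rt

  # alternatively, build a serial C code with the GNU compiler
  module load gcc
  gcc -O2 -o my_c_prog my_c_prog.c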

Batch system based on Slurm:

The batch system on CRYO is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Cobra home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be adapted for the CRYO cluster; a minimal example is shown below.
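
A minimal CPU-only batch script for CRYO could look like the following sketch (job name, task count, run time and program name are placeholders to be adapted):

  #!/bin/bash -l
  #SBATCH -J my_job                  # job name (placeholder)
  #SBATCH -o ./job.out.%j            # standard output file
  #SBATCH -e ./job.err.%j            # standard error file
  #SBATCH -p p.cryo                  # default partition on CRYO
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=8        # example value, adapt to your code
  #SBATCH --time=24:00:00            # wallclock limit (max. 4 days)

  module purge
  module load intel impi

  srun ./my_prog                     # my_prog is a placeholder

The script is then submitted with ‘sbatch’ and can be monitored with ‘squeue -u $USER’.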

Current Slurm configuration on CRYO:

  • default turnaround time: 24 hours

  • current max. turnaround time (wallclock): 4 days

  • p.cryo partition includes all nodes and is the default

  • p.cpu1 partition includes the “Silver” CPU nodes and has 12 cores per node

  • p.cpu2 partition includes only the “Gold” CPU nodes with 36 cores per node

  • nodes cryo[103-104,201-202] are configured for interactive X sessions

Useful tips:

The default run time limit for jobs that don’t specify a value is 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, but not longer than 96 hours.

The default memory per node is 50G. To grant the job access to all of the memory on each node, use the --mem=0 option of sbatch/srun.
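
Both options can be set directly in an sbatch script, for example (the 72-hour limit is only an illustrative value within the 96-hour maximum):

  #SBATCH --time=72:00:00   # raise the run time limit from the 24-hour default
  #SBATCH --mem=0           # use all available memory on each allocated node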

OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (an example is shown below and on the help information page).
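
A sketch of the relevant lines for a pure OpenMP job (the thread count of 8 and the program name are example values):

  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=1
  #SBATCH --cpus-per-task=8          # number of OpenMP threads (example value)

  # pass the Slurm setting on to the OpenMP runtime
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

  srun ./my_openmp_prog              # my_openmp_prog is a placeholder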

To run code on nodes with different memory capacities (mem185G; mem750G), use the --constraint option in an sbatch script: #SBATCH --constraint=mem185G or #SBATCH --constraint=mem750G

To run code on nodes with either Gold or Silver Skylake CPUs, which have different numbers of cores (core40; core16), use the --constraint option in an sbatch script: #SBATCH --constraint=core40 or #SBATCH --constraint=core16. To use the Platinum Cascade Lake CPUs, add the constraint cascadelake or core48 to your sbatch script.
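
Illustrative uses of the constraints described above (only one --constraint line should be active per job; the others are commented out with a second #):

  #SBATCH --constraint=mem750G       # request a large-memory node
  ##SBATCH --constraint=core40       # or: request a 40-core Skylake Gold node
  ##SBATCH --constraint=cascadelake  # or: request a Cascade Lake Platinum node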

To use GPUs, add the --qos=gpu and --gres options to your Slurm scripts and choose how many GPUs and/or which model of them to use: #SBATCH --gres=gpu:gtx1080:1 or #SBATCH --gres=gpu:gtx1080:2

Valid gres options are: gpu[[:type]:count]
where
type is the type of GPU (gtx1080 or rtx8000)
count is the number of resources (>=0)
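
A possible GPU job sketch combining the options above (GPU type, count, run time and program name are example values):

  #!/bin/bash -l
  #SBATCH -J gpu_job                 # job name (placeholder)
  #SBATCH -p p.cryo
  #SBATCH --qos=gpu                  # required for GPU jobs
  #SBATCH --gres=gpu:gtx1080:2       # two GTX 1080 Ti cards on one node
  #SBATCH --time=24:00:00

  module purge
  module load cuda

  srun ./my_gpu_prog                 # my_gpu_prog is a placeholder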

GPU cards are in default compute mode.

To check node features, generic resources and the scheduling weight of nodes, use ‘sinfo -O nodelist,features,gres,weight’

Support:

For support, please create a trouble ticket at the MPCDF helpdesk.