Biophysics CRYO
- Name of the cluster:
CRYO
- Institution:
Max Planck Institute of Biophysics
Login nodes:
cryo101 & cryo102 (see the hardware configuration below)
Hardware Configuration:
2 login nodes cryo[101-102]:
CPUs Model: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
2 sockets per node
8 cores per socket
hyper-threading is on (2 threads per core)
188 GB RAM
2 x GeForce GTX 1080 Ti
38 execution nodes cryo[103-140]:
CPUs Model: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
2 sockets per node
8 cores per socket
hyper-threading is on (2 threads per core)
188 GB RAM
2 x GeForce GTX 1080 Ti
20 execution nodes cryo[201-220]:
CPUs Model: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
2 sockets per node
20 cores per socket
hyper-threading is on (2 threads per core)
755 GB RAM
2 x GeForce GTX 1080 Ti
2 execution nodes cryog[801-802]:
CPUs Model: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
2 sockets per node
26 cores per socket
hyper-threading is off (1 thread per core)
188 GB RAM
4 x Quadro RTX 8000
12 execution nodes cryo[901-912]:
CPUs Model: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
2 sockets per node
24 cores per socket
hyper-threading is off (1 thread per core)
188 GB RAM
The node interconnect is based on a Mellanox Technologies InfiniBand fabric (speed: 56 Gb/s for nodes cryo[101-140,201-220], 100 Gb/s for nodes cryo[901-912] and cryog[801-802]).
Filesystems:
GPFS-based with total size of 1496 TB:
- /u
shared home filesystem (214 TB) with the user home directories in /u/<username>; GPFS-based; user quotas (currently 4 TB, 2M files) are enforced; the quota can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’ (see the example after this list).
- /sbdata
shared scratch filesystem (1.3 PB); no quotas enforced. NO BACKUPS!
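A minimal sketch of a quota check on the home filesystem; the ‘--block-size auto’ flag only makes the sizes human-readable and is assumed to be supported by the installed GPFS version (omit it otherwise):

  # show quota and current usage for your user on the GPFS filesystems
  /usr/lpp/mmfs/bin/mmlsquota --block-size auto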
Compilers and Libraries:
The “module” subsystem is available on CRYO. Please use ‘module available’ to see all available modules (a typical load-and-compile sequence is sketched after this list).
Intel compilers (-> ‘module load intel’): icc, icpc, ifort
GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran
Intel MKL (- > ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64
Intel MPI 2018.3 (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …
MATLAB (-> ‘module load matlab’): matlab, mcc, mex
Mathematica (-> ‘module load mathematica’)
CUDA (-> ‘module load cuda’)
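As a sketch of a typical workflow, the following loads the Intel compiler, MKL and Intel MPI modules and builds an MPI Fortran program linked against MKL; the source file name and the single-dynamic-library link flag ‘-lmkl_rt’ are illustrative assumptions:

  # load compiler, MKL and MPI environments (versions are set by the module system)
  module load intel mkl impi

  # compile an MPI Fortran program and link MKL via the single dynamic library
  mpiifort -O2 -o my_prog my_prog.f90 -L$MKL_HOME/lib/intel64 -lmkl_rt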
Batch system based on Slurm:
The batch system on CRYO is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Cobra home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be adapted for the CRYO cluster; a minimal example is sketched below the configuration list.
Current Slurm configuration on CRYO:
default turnaround time: 24 hours
current max. turnaround time (wallclock): 4 days
the p.cryo partition includes all nodes and is the default
the p.cpu1 partition, which includes the “Silver” CPU nodes, provides 12 cores per node
the p.cpu2 partition includes only the “Gold” CPU nodes, with 36 cores per node
nodes cryo[103-104,201-202] are configured for interactive X sessions
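A minimal batch script for CRYO might look as follows; the job name, output file names and the executable are placeholders, and the resources must be adapted to the actual job:

  #!/bin/bash -l
  # minimal sketch of a CRYO batch script
  #SBATCH -J my_job                 # job name (placeholder)
  #SBATCH -o ./job.out.%j           # standard output file
  #SBATCH -e ./job.err.%j           # standard error file
  #SBATCH -p p.cryo                 # default partition (all nodes)
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=16
  #SBATCH --time=24:00:00           # run time limit (default 24 h, max. 96 h)

  module purge
  module load intel impi

  srun ./my_program                 # placeholder executable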
Useful tips:
The default run time limit for jobs that do not specify a value is 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, up to a maximum of 96 hours (see the example scripts after this list).
The default memory per node is 50 GB. To grant the job access to all of the memory on each node, use the --mem=0 option of sbatch/srun.
OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be taken from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (an example is given on the help information page and sketched after this list).
To select nodes by memory capacity (features mem185G or mem750G), use the --constraint option in an sbatch script: #SBATCH --constraint=mem185G or #SBATCH --constraint=mem750G
To select nodes with either Gold or Silver Skylake CPUs, which have different numbers of cores (features core40 and core16), use the --constraint option in an sbatch script: #SBATCH --constraint=core40 or #SBATCH --constraint=core16. To use the Platinum Cascade Lake CPUs, add the constraint cascadelake or core48 to your sbatch scripts (a constraint example is sketched after this list).
To use GPUs, add the --qos=gpu and --gres options to your Slurm scripts and choose how many GPUs (and which model) to request: #SBATCH --gres=gpu:gtx1080:1 or #SBATCH --gres=gpu:gtx1080:2 (see the GPU example after this list).
GPU cards are in default compute mode.
To check the node features, generic resources and scheduling weight of the nodes, use ‘sinfo -O nodelist,features,gres,weight’.
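Sketch of an sbatch script that derives OMP_NUM_THREADS from --cpus-per-task, as described in the tips above; the executable name is a placeholder:

  #!/bin/bash -l
  # OpenMP job: one task with 16 threads on a single node
  #SBATCH -J omp_job
  #SBATCH -o ./job.out.%j
  #SBATCH -e ./job.err.%j
  #SBATCH -p p.cryo
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=1
  #SBATCH --cpus-per-task=16        # exported by Slurm as SLURM_CPUS_PER_TASK
  #SBATCH --time=04:00:00

  # pass the Slurm core count on to the OpenMP runtime
  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

  srun ./my_openmp_program          # placeholder executable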
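Sketch of the node-selection directives from the constraint tips above; only one feature is requested per job here, and the alternative line is commented out with a double hash:

  # run on the large-memory Gold nodes (755 GB, feature mem750G)
  #SBATCH --constraint=mem750G

  # alternatively: run on the Platinum Cascade Lake nodes
  ##SBATCH --constraint=cascadelake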
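Sketch of a GPU job requesting both GTX 1080 Ti cards of a node, following the GPU tip above; the executable name is a placeholder:

  #!/bin/bash -l
  # GPU job: 2 GPUs on one node
  #SBATCH -J gpu_job
  #SBATCH -o ./job.out.%j
  #SBATCH -e ./job.err.%j
  #SBATCH -p p.cryo
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=1
  #SBATCH --cpus-per-task=8
  #SBATCH --qos=gpu                 # required for GPU jobs
  #SBATCH --gres=gpu:gtx1080:2      # request both GTX 1080 Ti cards
  #SBATCH --time=24:00:00

  module load cuda

  srun ./my_gpu_program             # placeholder executable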
Support:
For support, please create a trouble ticket at the MPCDF helpdesk.