Psychiatry PSYCL
- Name of the cluster: PSYCL
- Institution: Max Planck Institute of Psychiatry
- Login nodes: psycl01.bc.rzg.mpg.de
Hardware Configuration:
Login node psycl01:
CPU model: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
2 sockets
10 cores per socket
hyper-threading (2 threads per core)
128 GB RAM
2 x Tesla K20Xm
13 execution nodes psycl[02-14]:
CPU model: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
2 sockets
10 cores per socket
hyper-threading (2 threads per core)
128 GB RAM (psycl[02-05])
256 GB RAM (psycl[06-09])
512 GB RAM (psycl[10-13])
768 GB RAM (psycl14)
2 x Tesla K20Xm (psycl[02-04])
2 x GeForce GTX 980 (psycl[05-14])
The node interconnect is based on 10 Gb/s Ethernet.
Filesystems:
GPFS-based with a total size of 226 TB:
- /u
shared home filesystem with the user home directory in /u/<username>; GPFS-based; user quotas (currently 4 TB, 1M files) enforced; the quota can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’ (see the example below).
- /ptmp
shared scratch filesystem with the user directory in /ptmp/<username>; GPFS-based; no quotas enforced. NO BACKUPS!
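The commands below are a minimal sketch of how to check the quota and use the scratch space; they are run on the login node, and <username> and my_project are placeholders:

  # show the GPFS quotas of the current user
  /usr/lpp/mmfs/bin/mmlsquota

  # large temporary data without backup belongs in /ptmp
  mkdir -p /ptmp/<username>/my_project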
Compilers and Libraries:
The “module” subsystem is implemented on PSYCL. Please use ‘module available’ to see all available modules.
Python (-> ‘module load anaconda/3/5.1’): python, ipython
Intel compilers (-> ‘module load intel’): icc, icpc, ifort
GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran
Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64
Intel MPI 2018.3 (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …
MATLAB (-> ‘module load matlab’): matlab, mcc, mex
Mathematica (-> ‘module load mathematica’)
CUDA (-> ‘module load cuda’)
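As an illustration, a compile-and-link sequence for an MPI program that uses MKL might look like the following sketch (hello.c is a placeholder source file; the MKL link line is the standard sequential one and may need to be adapted):

  # load the Intel compiler, Intel MPI and MKL environments
  module purge
  module load intel impi mkl

  # compile an MPI C program and link the sequential MKL libraries
  mpiicc -O2 -o hello hello.c \
      -L$MKL_HOME/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl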
Batch system based on Slurm:
The batch system on PSYCL is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Cobra home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be adapted for the PSYCL cluster; a minimal example is sketched below.
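A minimal batch script for PSYCL could look like the following sketch (job name, file names, and resource values are examples only; ./my_program is a placeholder):

  #!/bin/bash -l
  #SBATCH -J mytest              # job name (example)
  #SBATCH -o ./job.out.%j        # standard output file
  #SBATCH -e ./job.err.%j        # standard error file
  #SBATCH --nodes=1              # number of nodes
  #SBATCH --ntasks-per-node=20   # one MPI task per physical core (20 cores per node)
  #SBATCH --time=24:00:00        # run time limit, here the default of 24 hours

  module load intel impi

  srun ./my_program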
Current Slurm configuration on PSYCL:
default turnaround time: 24 hours
current max. turnaround time (wallclock): 11 days
Useful tips:
The default run time limit for jobs that do not specify a value is 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, up to a maximum of 11 days (see the first sketch after this list).
The default memory per node is 60 GB. To grant the job access to all of the memory on each node, use the --mem=0 option of sbatch/srun.
OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (see the OpenMP sketch after this list).
To run a job on nodes with a specific memory capacity (features mem128G, mem256G, mem512G, mem768G), use the --constraint option in an sbatch script, e.g. #SBATCH --constraint=mem256G or #SBATCH --constraint=mem768G (see the constraint example after this list).
GPU cards are in default compute mode.
To check the node features, generic resources, and scheduling weights of the nodes, use ‘sinfo -O nodelist,features:25,gres,weight’.
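The first two tips translate into sbatch header lines like the following sketch (the 5-day limit is only an example; any value up to the 11-day maximum is possible):

  #SBATCH --time=5-00:00:00   # total run time limit in the format days-hours:minutes:seconds
  #SBATCH --mem=0             # give the job all of the memory available on each node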
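For an OpenMP job, the thread count can be taken over from Slurm as sketched here (20 threads correspond to the 20 physical cores of one node; ./my_openmp_program is a placeholder):

  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=1
  #SBATCH --cpus-per-task=20   # number of OpenMP threads per task

  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

  srun ./my_openmp_program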
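To restrict a job to, for example, the 256 GB nodes psycl[06-09], the constraint is simply added to the batch script; the other memory features work analogously:

  #SBATCH --constraint=mem256G   # run only on nodes with 256 GB RAM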
Support:
For support, please create a trouble ticket at the MPCDF helpdesk.