Brain Research


Name of the clusters:

GABA

Institution:

Max Planck Institute for Brain Research

Login nodes:

  • gaba.opt.rzg.mpg.de

  • gaba01.opt.rzg.mpg.de

  • gaba02.opt.rzg.mpg.de

Hardware Configuration:

Login node gaba:

  • CPU model : Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz

  • 2 sockets per node; 4 cores per socket; no hyper-threading

  • RAM : 64 GB

  • GPUs : 0

Login nodes gaba01 & gaba02:

  • CPU model : AMD Opteron(TM) Processor 6220 @ 3.0GHz

  • 2 sockets per node; 4 cores per socket; hyper-threading enabled (2 threads per core)

  • RAM : 256 GB

  • GPUs : 0

8 execution nodes gaba[004-012] for parallel GPU computing (160 CPU cores in total):

  • CPU model : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz

  • 2 sockets per node; 10 cores per socket; hyper-threading is off

  • RAM : 256 GB

  • GPUs : 2 x Nvidia Tesla K40m GPUs per node

12 execution nodes gaba[013-024] for parallel CPU computing (240 CPU cores in total):

  • CPU model : Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz

  • 2 sockets per node; 10 cores per socket; hyper-threading is off

  • RAM : 256 GB

  • GPUs : 0

60 execution nodes gaba[101-160] for parallel CPU computing (1920 CPU cores):

  • CPU model : Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz

  • 2 sockets per node; 16 cores per socket; hyper-threading is off

  • RAM : 384 GB

  • GPUs : 0

6 execution nodes gaba[201-206] for parallel CPU computing (288 CPU cores):

  • CPU model : Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz

  • 2 sockets per node; 24 cores per socket; hyper-threading is off

  • RAM : 755 GB

  • GPUs : 0

4 execution/login nodes gabag[01-04] for parallel GPGPU computing (128 CPU cores):

  • CPU model : Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz

  • 2 sockets per node; 16 cores per socket; hyper-threading is off

  • RAM : 1 TB

  • GPUs : 2 x Nvidia Tesla V100-PCIE-16GB GPUs per node

The node interconnect is based on 10 Gb/s Ethernet.

Filesystems:

/u

shared home filesystem (344 TB) with user home directories in /u/<username>; GPFS-based; no user quotas enforced

/gabaghi

archive filesystem (458 TB); GPFS-based; no quotas enforced.

/tmpscratch

shared scratch filesystem (687 TB); GPFS-based; no quotas enforced. NO BACKUPS!

/wKlive

filesystem for the live wKcubes project (226 TB); GPFS-based; no quotas enforced. NO BACKUPS!

Compilers and Libraries:

The “module” subsystem is implemented on GABA. Please use ‘module avail’ to list all available modules.

  • Intel compilers (-> ‘module load intel’): icc, icpc, ifort

  • GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran

  • Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64

  • Intel MPI 2017.4 (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …

  • MATLAB (-> ‘module load matlab’): matlab, mcc, mex

  • CUDA (-> ‘module load cuda’)

  • Python (-> ‘module load anaconda/3/5.1.0’): python, ipython
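
For example, a typical tool chain for building an MPI code with the Intel compilers might look as follows (a minimal sketch; the source file hello_mpi.c and the optional MKL link line are only placeholders):

module purge
module load intel impi mkl
# compile an MPI program with the Intel MPI wrapper for the Intel C compiler
mpiicc -O2 -o hello_mpi hello_mpi.c
# if needed, link against Intel MKL via $MKL_HOME, e.g.
# mpiicc -O2 -o hello_mpi hello_mpi.c -L$MKL_HOME/lib/intel64 -lmkl_rt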

Newer versions of some modules, as well as some additional modules, are accessible via the following command:

source /mpcdf/soft/distribution/obs_modules.sh

Executing this command also makes the find-module tool available, with which you can search for a specific module.

Batch system based on Slurm:

  • login nodes: gaba; batch nodes: gaba[003-024,101-160], gabag[01-04]

  • sbatch, srun, squeue, sinfo, scancel, scontrol, s*

  • the current default turnaround time (wallclock) is 24 hours; the maximum turnaround time is 100 hours

  • sample batch scripts can be found on the Cobra home page (they must be modified for GABA); a minimal example adapted for GABA is sketched below
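
The following is a minimal sketch of such a batch script for GABA; it assumes an MPI binary ./my_prog (a placeholder) and is sized for the 20-core CPU nodes gaba[013-024]:

#!/bin/bash -l
#SBATCH --job-name=test_mpi
#SBATCH --output=job.%j.out
#SBATCH --error=job.%j.err
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=20        # 20 physical cores per node on gaba[013-024]
#SBATCH --time=24:00:00             # wallclock limit (max. 100 hours)

module purge
module load intel impi

srun ./my_prog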

Useful tips (Slurm part of the cluster):

The default run time limit for jobs that do not specify a value is 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation; it may not exceed 100 hours.

The default memory per node is 50 GB. To grant the job access to all of the memory on each node, use the --mem=0 option of sbatch/srun.
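
For example, the two options above can be set together in the batch script header (the values here are placeholders):

#SBATCH --time=48:00:00      # total run time limit (must not exceed 100 hours)
#SBATCH --mem=0              # give the job access to all of the memory on each node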

OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be taken from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script, as in the sketch below.
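
A minimal sketch of a pure OpenMP batch script (the binary ./my_omp_prog is a placeholder):

#!/bin/bash -l
#SBATCH --job-name=test_omp
#SBATCH --output=job.%j.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20          # Slurm exports $SLURM_CPUS_PER_TASK accordingly
#SBATCH --time=10:00:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./my_omp_prog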

To run code on nodes with a specific memory capacity (mem256G, mem384G, mem768G, mem1TB), use the --constraint option in an sbatch script: #SBATCH --constraint=mem256G or #SBATCH --constraint=mem384G

To run code on nodes with a specific CPU architecture (ivybridge, broadwell, skylake, cascadelake), use the --constraint option in an sbatch script: --constraint=broadwell or --constraint=skylake
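
For example, to request Skylake nodes with 384 GB of memory (according to the hardware list above, gaba[101-160]), both constraint types can be combined with a logical AND (a sketch using standard Slurm constraint syntax):

#SBATCH --constraint="skylake&mem384G"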

To use GPUs, add the --gres option to your Slurm scripts and choose how many GPUs and/or which model to use: #SBATCH --gres=gpu:k40m:2
Valid gres specifications are: gpu[[:type]:count]
where
type is the type of GPU (e.g. k40m)
count is the number of resources (>=0)

To choose nodes without GPUs, use --gres=gpu:none:0 or simply --gres=none
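
A minimal sketch of a GPU batch script requesting both Tesla K40m cards of one node (the binary ./my_gpu_prog is a placeholder):

#!/bin/bash -l
#SBATCH --job-name=test_gpu
#SBATCH --output=job.%j.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:k40m:2           # two Tesla K40m GPUs on one node
#SBATCH --time=12:00:00

module purge
module load cuda

srun ./my_gpu_prog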

The GPU cards are operated in the default compute mode.

Nodes have a scheduling weight; by default, jobs are allocated to the nodes with the lowest weight values first.

To check node features, generic resources, and the scheduling weight of nodes, use: sinfo -O nodelist,features,gres,weight

For interactive jobs, please use the srun command, e.g.:
srun --time=90:00:00 --constraint=skylake --pty bash -i -l
srun --time=1-10 --mem=32G --gres=gpu:1 --pty bash -i -l
Keep in mind that the --pty option should be the last srun option.

Support:

For support, please create a trouble ticket at the MPCDF helpdesk.