Brain Research
- Name of the cluster:
GABA
- Institution:
Max Planck Institute for Brain Research
Login nodes:
gaba, gaba01, gaba02 (see also the GPU execution/login nodes gabag[01-04] below)
Hardware Configuration:
Login node gaba :
CPU model : Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
2 sockets per node; 4 cores per socket; no hyper-threading
RAM : 64 GB
GPUs : 0
Login nodes gaba01 & gaba02:
CPU model : AMD Opteron(TM) Processor 6220 @ 3.0GHz
2 sockets per node; 4 cores per socket; hyper-threading enabled (2 threads per core)
RAM : 256 GB
GPUs : 0
8 execution nodes gaba[004-012] for parallel GPU computing (160 CPU cores in total):
CPU model : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
2 sockets per node; 10 cores per socket; hyper-threading is off
RAM : 256 GB
GPUs : 2 x Nvidia Tesla K40m GPUs per node
12 execution nodes gaba[013-024] for parallel CPU computing (240 CPU cores in total):
CPU model : Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
2 sockets per node; 10 cores per socket; hyper-threading is off
RAM : 256 GB
GPUs : 0
60 execution nodes gaba[101-160] for parallel CPU computing (1920 CPU cores) :
CPU model : Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
2 sockets per node; 16 cores per socket; hyper-threading is off
RAM : 384 GB
GPUs : 0
6 execution nodes gaba[201-206] for parallel CPU computing (288 CPU cores) :
CPU model : Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
2 sockets per node; 24 cores per socket; hyper-threading is off
RAM : 755 GB
GPUs : 0
4 execution/login nodes gabag[01-04] for parallel GPGPU computing (128 CPU cores) :
CPU model : Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
2 sockets per node; 16 cores per socket; hyper-threading is off
RAM : 1 TB
GPUs : 2 x Nvidia Tesla V100-PCIE-16GB GPUs per node
The node interconnect is based on 10 Gb/s Ethernet.
Filesystems:
- /u
shared home filesystem (344 TB) with user home directories in /u/<username>; GPFS-based; no user quotas enforced
- /gabaghi
filesystem for archive (458 TB); GPFS-based; no quotas enforced.
- /tmpscratch
shared filesystem (687 TB); GPFS-based; no quotas enforced. NO BACKUPS!
- /wKlive
filesystem for live wKcubes project (226 TB); GPFS-based; no quotas enforced. NO BACKUPS!
Compilers and Libraries:
The “module” subsystem is implemented on GABA. Please use ‘module avail’ to see all available modules.
Intel compilers (-> ‘module load intel’): icc, icpc, ifort
GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran
Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64
Intel MPI 2017.4 (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …
MATLAB (-> ‘module load matlab’): matlab, mcc, mex
CUDA (-> ‘module load cuda’)
Python (-> ‘module load anaconda/3/5.1.0’): python, ipython
Newer versions of some modules, as well as some additional modules, are accessible via the following command:
source /mpcdf/soft/distribution/obs_modules.sh
Executing this command also makes the find-module tool available, which can be used to search for a specific module.
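As an example of using the compiler and MPI modules listed above, a minimal sketch of compiling an MPI program with the Intel toolchain (the source file hello.c is hypothetical):
module load intel impi
mpiicc -O2 hello.c -o hello    # Intel MPI compiler wrapper around icc
With the GNU toolchain, load gcc and impi instead and compile with mpigcc (or mpicc).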
Batch system based on Slurm:
login nodes: gaba; batch nodes: gaba[003-024,101-160],gabag[01-04]
sbatch, srun, squeue, sinfo, scancel, scontrol, s*
the current default run time limit (wall clock) is 24 hours; the maximum is 100 hours
sample batch scripts can be found on the Cobra home page (they must be modified for GABA); a minimal example is sketched below
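A minimal sketch of an MPI batch script for GABA, assuming the Intel MPI module; the job name and the executable ./my_prog are hypothetical:
#!/bin/bash -l
#SBATCH -J my_mpi_job            # job name (hypothetical)
#SBATCH -o ./job.out.%j          # standard output file
#SBATCH -e ./job.err.%j          # standard error file
#SBATCH --nodes=2                # number of nodes
#SBATCH --ntasks-per-node=20     # MPI tasks per node (e.g. the 20-core gaba[013-024] nodes)
#SBATCH --time=24:00:00          # wall clock limit (maximum 100 hours)

module purge
module load intel impi

srun ./my_prog                   # hypothetical MPI executable
Submit the script with ‘sbatch <script name>’ and monitor it with ‘squeue’.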
Useful tips (slurm part of cluster):
The default run time limit for jobs that do not specify a value is 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, up to a maximum of 100 hours
The default memory per node is 50 GB. To grant the job access to all of the memory on each node, use the --mem=0 option of sbatch/srun
OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be taken from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (see the sketch after this list)
To run code on nodes with different memory capacities (mem256G, mem384G, mem768G, mem1TB), use the --constraint option in an sbatch script, e.g. #SBATCH --constraint=mem256G or #SBATCH --constraint=mem384G
To run code on nodes with a specific CPU architecture (broadwell, ivybridge, skylake, cascadelake), use the --constraint option in an sbatch script, e.g. #SBATCH --constraint=broadwell or #SBATCH --constraint=skylake
To choose nodes without GPUs, use --gres=gpu:none:0 or simply --gres=none
GPU cards are in default compute mode
Nodes have a scheduling weight; by default, jobs are allocated to the nodes with the lowest weight first.
To check the node features, generic resources (GRES), and scheduling weights, use sinfo -O nodelist,features,gres,weight
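A sketch of an OpenMP batch script combining several of the tips above (--time, --mem, --cpus-per-task with OMP_NUM_THREADS, and a node-type --constraint); the job name and the executable ./my_omp_prog are hypothetical:
#!/bin/bash -l
#SBATCH -J omp_test               # job name (hypothetical)
#SBATCH -o ./job.out.%j           # standard output file
#SBATCH -e ./job.err.%j           # standard error file
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32        # all cores of a gaba[101-160] node
#SBATCH --constraint=skylake      # or e.g. mem384G
#SBATCH --time=10:00:00           # must not exceed 100 hours
#SBATCH --mem=0                   # grant access to all memory of the node

# derive the OpenMP thread count from the Slurm allocation
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./my_omp_prog                # hypothetical OpenMP executable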
Support:
For support please create a trouble ticket at the MPCDF helpdesk