Astrophysics FREYA


Name of the cluster:

FREYA

Institution:

Max Planck Institute for Astrophysics

Login nodes:

  • freya01.bc.rzg.mpg.de

  • freya02.bc.rzg.mpg.de

  • freya03.bc.rzg.mpg.de

  • freya04.bc.rzg.mpg.de

Access allowed only for selected users:

  • virgo.bc.rzg.mpg.de

Hardware Configuration:

  • login nodes freya[01-04] : 2 x Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz; 40 cores per node; 384 GB RAM

  • virgo: 4 x Intel(R) Xeon(R) CPU E5-4610 v2 @ 2.30GHz; 32 cores with 2 threads per core; 1 TB RAM

  • 172 execution nodes freya[001-104,109-176] for parallel computing with a total of 6880 CPU cores; 2 x Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz; 192 GB RAM

  • 4 execution nodes freya[105-108] for parallel computing with a total of 160 CPU cores; 2 x Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz; 384 GB RAM

  • 8 execution nodes freyag[01-08] for parallel GPU computing with a total of 320 CPU cores; 2 x Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz; 384 GB RAM; 2 x Nvidia Tesla P100-PCIE-16GB GPUs per node

  • 4 execution nodes freyag[09-12] for parallel GPU computing with a total of 160 CPU cores; 2 x Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz; 384 GB RAM; 2 x Nvidia Tesla V100-PCIE-32GB GPUs per node

  • 11 execution nodes freyag[201-211] for parallel GPU computing with a total of 528 CPU cores; 2 x Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz; 384 GB RAM; 4 x Nvidia A100-PCIE-40GB GPUs per node

  • node interconnect is based on Intel Omni-Path Fabric (Speed: 100Gb/s)

Filesystems:

/u

shared home filesystem; GPFS-based; user quotas (currently 900 GB, 1M files) enforced; the quota can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’ (see the example below).

/freya/ptmp

shared scratch filesystem (1.7 PB); GPFS-based; no quotas enforced. NO BACKUPS!

/virgo

shared scratch filesystem (3.9 PB); GPFS-based; no quotas enforced. NO BACKUPS!
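
For example, the home quota can be checked as follows (the exact output columns depend on the installed GPFS version):

  # show current usage and quota limits for your user on the GPFS filesystems,
  # including the home filesystem /u
  /usr/lpp/mmfs/bin/mmlsquota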

Compilers and Libraries:

The “module” subsystem is implemented on FREYA. Please use ‘module available’ to see all available modules.

  • Intel compilers (-> ‘module load intel’): icc, icpc, ifort

  • GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran

  • Intel MKL (‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64

  • Intel MPI 2017.4 (‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …

  • GPGPU computing (‘module load cuda’): nvcc, …

Similar to the other MPCDF HPC systems, the module tree is hierarchical.
To find a module, to see which versions are available, or to check which dependencies have to be loaded first, use the ‘find-module’ command.
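
For illustration, a typical session that loads the Intel toolchain and builds an MPI Fortran code against MKL might look like the sketch below (mycode.f90 is a placeholder source file, and the module versions installed on FREYA may differ):

  # load the hierarchical Intel toolchain: compiler first, then MPI and MKL
  module purge
  module load intel
  module load impi
  module load mkl

  # compile an MPI Fortran code and link against MKL via the single dynamic library
  mpiifort -O2 -o mycode mycode.f90 -L$MKL_HOME/lib/intel64 -lmkl_rt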

Batch system based on Slurm:

  • sbatch, srun, squeue, sinfo, scancel, scontrol, s*

  • current max. turnaround time (wallclock): 24 hours

  • max. number of nodes per user: 92

  • four partitions: p.24h (default), p.test, p.gpu & p.gpu.ampere

  • p.test partition: 4 nodes with 2 Nvidia Pascal GPUs each and a 30 min run-time limit

  • sample batch scripts can be found on the Cobra home page (they must be modified for FREYA); a minimal sketch is given below
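
A minimal MPI batch script for the default p.24h partition might look like the following sketch (job name, node count, and the executable ./mycode are placeholders to be adapted):

  #!/bin/bash -l
  #SBATCH -J mytest                 # job name (placeholder)
  #SBATCH -o ./job.out.%j           # standard output file
  #SBATCH -e ./job.err.%j           # standard error file
  #SBATCH -p p.24h                  # default partition
  #SBATCH --nodes=2                 # number of nodes (placeholder)
  #SBATCH --ntasks-per-node=40      # one MPI task per core on a 40-core node
  #SBATCH --time=24:00:00           # wallclock limit (max. 24 hours)

  module purge
  module load intel impi

  srun ./mycode                     # placeholder executable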

Useful tips:

Nodes in the p.test partition run in shared mode, and the default memory per job is set to 9500 MB. To allocate the necessary amount of memory, use the --mem parameter.
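
For example, a short test job that needs more than the default memory could be sketched as follows (task count, memory value, and executable are placeholders):

  #!/bin/bash -l
  #SBATCH -p p.test
  #SBATCH --time=00:30:00           # p.test run-time limit is 30 minutes
  #SBATCH --ntasks=4                # placeholder task count
  #SBATCH --mem=20000               # 20000 MB instead of the 9500 MB default (placeholder)

  srun ./mycode                     # placeholder executable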

Nvidia Pascal and Volta GPUs are available in the p.gpu partition. To use them, add #SBATCH -p p.gpu to your Slurm scripts and choose the number of GPUs with #SBATCH --gres=gpu:1 or #SBATCH --gres=gpu:2. To request Pascal or Volta GPUs specifically, add the GPU type to the --gres parameter: --gres=gpu:p100:1 or --gres=gpu:v100:2.
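
A job using both Volta GPUs of a node might be sketched as follows (run time and executable are placeholders; note that p.gpu nodes are allocated exclusively, see below):

  #!/bin/bash -l
  #SBATCH -p p.gpu
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=40      # the node is allocated exclusively anyway
  #SBATCH --gres=gpu:v100:2         # two Volta GPUs; use gpu:p100:2 for Pascal
  #SBATCH --time=12:00:00           # placeholder run time (max. 24 hours)

  module purge
  module load cuda

  srun ./my_gpu_code                # placeholder executable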

Nodes in the p.gpu partition run in exclusive mode, i.e. jobs allocate entire nodes.

Nvidia Ampere GPUs are available in the p.gpu.ampere partition. The GPU type must be set explicitly, i.e. --gres=gpu:a100:X, where X is between 1 and 4.

Nodes in the p.gpu.ampere partition run in shared mode, i.e. jobs allocate only the requested resources. The default memory per job is 95000 MB. Use the --mem parameter to set the amount of RAM your job needs.
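
A single-GPU job in this partition might be sketched as follows (CPU, memory, and run-time values are placeholders to be adapted):

  #!/bin/bash -l
  #SBATCH -p p.gpu.ampere
  #SBATCH --gres=gpu:a100:1         # GPU type must be given explicitly (1 to 4 per node)
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=12        # placeholder; only the requested cores are allocated
  #SBATCH --mem=95000               # in MB; adjust via --mem as needed (placeholder)
  #SBATCH --time=06:00:00           # placeholder run time

  module purge
  module load cuda

  srun ./my_gpu_code                # placeholder executable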

GPU cards are in default compute mode.

To run code on nodes with a specific memory capacity (mem192G or mem384G), use the --constraint option in an sbatch script: #SBATCH --constraint=mem192G or #SBATCH --constraint=mem384G

To check node features, generic resources, and the scheduling weight of the nodes, use sinfo -O nodelist,features:25,gres,weight

Support:

For support please create a trouble ticket at the MPCDF helpdesk