Neurobiology of Behavior – caesar


Name of the cluster:

AXON

Institution:

Neurobiology of Behavior – caesar

Access:

  • axon01.bc.rzg.mpg.de

  • axon02.bc.rzg.mpg.de

Hardware Configuration:

Login nodes: 2 x axon[01-02]

  • CPU: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
  • CPU(s): 40 (2 sockets x 20 cores per socket, 1 thread per core)
  • RAM: 377 GB

Compute nodes (CPU): 26 x axon[001-026]

  • CPU: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
  • CPU(s): 40 (2 sockets x 20 cores per socket, 1 thread per core)
  • RAM: 377 GB

Compute nodes (GPU): 28 x axong[001-028]

  • CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
  • CPU(s): 20 (2 sockets x 10 cores per socket, 1 thread per core)
  • RAM: 188 GB
  • GPU: 1 x GeForce RTX 2080 Ti

Node interconnect is based on Ethernet (speed: 25 Gb/s).

Filesystems:

/u

shared home filesystem; GPFS-based; user quotas (currently 600 GB, 1 M files) are enforced; the quota can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’ (see the example below). NO BACKUPS!

/axon/scratch

shared scratch filesystem (1.1 PB); GPFS-based; no quotas enforced. NO BACKUPS!
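
For example, the quota on /u can be checked as follows (the --block-size option may not be available in older GPFS releases):

    # show current usage and quota limits for your user on the GPFS filesystems
    /usr/lpp/mmfs/bin/mmlsquota
    # the same, with block counts reported in a human-readable unit
    /usr/lpp/mmfs/bin/mmlsquota --block-size auto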

Compilers and Libraries:

The “module” subsystem is implemented on AXON. Please use ‘module avail’ to list all available modules.

  • Intel compilers (-> ‘module load intel’): icc, icpc, ifort

  • GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran

  • Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64

  • Intel MPI (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …

  • Python (-> ‘module load anaconda’): python

  • CUDA (-> ‘module load cuda’)
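
A short sketch of a typical build workflow with these modules (file names, compiler flags and the MKL link variant are only illustrative):

    # list all available modules
    module avail
    # load the Intel compiler, Intel MPI and MKL
    module load intel impi mkl
    # compile an MPI program with the Intel compiler wrapper (file name is a placeholder)
    mpiicc -O2 -o hello_mpi hello_mpi.c
    # link a serial program against Intel MKL (one common sequential link line)
    icc -O2 -o prog prog.c -L$MKL_HOME/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm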

Batch system based on Slurm:

The batch system on AXON is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Cobra home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be adapted for the AXON cluster (the partition must be changed); a minimal example is given below.
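
A minimal CPU batch script for AXON could look as follows (job name, resource values and the executable are placeholders):

    #!/bin/bash -l
    #SBATCH -J my_job                 # job name (placeholder)
    #SBATCH -o ./job.out.%j           # standard output file
    #SBATCH -e ./job.err.%j           # standard error file
    #SBATCH --partition=p.axon        # default CPU partition on AXON
    #SBATCH --ntasks=40               # one MPI task per core of an axon node
    #SBATCH --time=02:00:00           # wall clock limit (maximum 24 hours)

    module load intel impi
    srun ./my_program                 # placeholder executable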

Current Slurm configuration on AXON:

  • default run time: 2 hours

  • current maximum run time (wall clock): 1 day (24 hours)

  • two partitions: p.axon (default) and p.gpu (see the commands below)

  • default memory per node for jobs: p.axon (9600 MB) and p.gpu (190000 MB)

  • nodes are allocated exclusively to a single user; only multiple jobs of the same user may share a node
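
The partitions and your own jobs can be inspected with the standard Slurm commands, for example:

    # overview of the two AXON partitions and their node states
    sinfo -p p.axon,p.gpu
    # list your own pending and running jobs
    squeue -u $USER
    # cancel a job by its job ID
    scancel <jobid>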

Useful tips:

The default run time limit for jobs that do not specify a value is 2 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, but not longer than 24 hours.
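
For example, to request 12 hours of wall clock time in a batch script:

    #SBATCH --time=12:00:00      # 12 hours; must not exceed 24:00:00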

By default, jobs in the p.gpu partition are allocated all the memory of a node. To request less memory per node, use the --mem option of sbatch/srun.
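
For example, to request only 32 GB per node (the value is given in MB and is just an illustration):

    #SBATCH --mem=32000          # request 32000 MB per node instead of the full node memory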

To use all resources on a node, add the --exclusive option to sbatch/srun.
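
In a batch script this reads:

    #SBATCH --exclusive          # allocate the node(s) exclusively to this job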

OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be taken from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (see the sketch below).
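
A sketch of an sbatch script for a pure OpenMP code (core count and program name are placeholders):

    #!/bin/bash -l
    #SBATCH -J omp_job                 # job name (placeholder)
    #SBATCH --partition=p.axon
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1        # a single process
    #SBATCH --cpus-per-task=20         # number of OpenMP threads (placeholder value)
    #SBATCH --time=02:00:00

    # pass the number of allocated cores to the OpenMP runtime
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    srun ./my_openmp_program           # placeholder executable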

To use GPUs, add --partition=p.gpu and the --gres option to your Slurm scripts and choose how many GPUs to allocate, e.g. #SBATCH --gres=gpu:1.
Valid gres specifications are gpu[[:type]:count], where type is the GPU type (rtx2080ti) and count is the number of GPUs to allocate (e.g. 1).
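
A sketch of a corresponding GPU batch script (program name and CPU count are placeholders):

    #!/bin/bash -l
    #SBATCH -J gpu_job                 # job name (placeholder)
    #SBATCH --partition=p.gpu          # GPU partition
    #SBATCH --gres=gpu:1               # one GPU; equivalently gpu:rtx2080ti:1
    #SBATCH --cpus-per-task=10         # placeholder: half the cores of a GPU node
    #SBATCH --time=02:00:00

    module load cuda
    srun ./my_gpu_program              # placeholder executable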

GPU cards are in default compute mode.

Support:

For support please create a trouble ticket at the MPCDF helpdesk.