Biophysics BIO


Name of the cluster:

BIO

Institution:

Max Planck Institute of Biophysics

Login nodes:

  • bio01.tbc.biophys.mpg.de

  • bio02.tbc.biophys.mpg.de

Hardware Configuration:

                         Login nodes                                 Compute nodes (2760 CPU cores)
  Nodes                  2 login nodes bio[01-02]                    69 compute nodes bio[001-069]
  CPU                    Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz    Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
  CPU(s)                 40                                          40
  Thread(s) per core     1                                           1
  Core(s) per socket     20                                          20
  Socket(s)              2                                           2
  RAM                    772 GB                                      192 GB
  GPU(s)                 2 x Quadro RTX 5000                         2 x Quadro RTX 5000

  • the node interconnect is a Mellanox Technologies InfiniBand fabric (speed: 56 Gb/s)

Filesystems:

GPFS-based with a total size of 641 TB:

  /u    shared home filesystem (641 TB) with user home directories in /u/<username>; GPFS-based; no quotas enforced. NO BACKUPS!

Compilers and Libraries:

The “module” subsystem is implemented on BIO. Please use ‘module available’ to see all available modules (a compile example is sketched after the list below).

  • Intel compilers (-> ‘module load intel’): icc, icpc, ifort

  • GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran

  • Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64

  • Intel MPI (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, … This module becomes visible and loadable only after a compiler module (Intel or GCC) has been loaded.

  • CUDA (-> ‘module load cuda’)

  • Python (-> ‘module load anaconda’): python
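
A typical compile session for an MPI code linked against MKL might look as follows. This is only a minimal sketch: the source file, executable names, and the exact module versions are placeholders; check ‘module available’ for what is actually installed.

    # load the Intel compiler, then Intel MPI and MKL (impi only becomes loadable after the compiler)
    module purge
    module load intel
    module load impi
    module load mkl

    # compile a hypothetical MPI program and link against MKL via the single dynamic library
    mpiicc   -O2 -o myprog_c.x myprog.c   -L$MKL_HOME/lib/intel64 -lmkl_rt
    mpiifort -O2 -o myprog_f.x myprog.f90 -L$MKL_HOME/lib/intel64 -lmkl_rt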

Batch system based on Slurm:

The batch system on BIO is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Cobra home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be adapted for the BIO cluster.

Current Slurm configuration on BIO:

  • default turnaround time: 2 hours

  • current max. turnaround time (wallclock): 24 hours

  • the p.bio partition includes all batch nodes in exclusive usage and is the default partition (a sample batch script is sketched after this list)

  • the s.bio partition can be used for serial jobs and can be shared by several jobs

  • the l.bio partition is shared and intended for long-running (up to 5 days) serial jobs (<=5 cores per job; <=160 cores in total)
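
A minimal sketch of an MPI batch script for the default, exclusive p.bio partition. The job name, output file names, and executable are placeholders; adjust the node and task counts to your application.

    #!/bin/bash -l
    # hypothetical MPI job on two exclusive nodes of the default p.bio partition
    #SBATCH -J my_mpi_job              # placeholder job name
    #SBATCH -o ./job.out.%j
    #SBATCH -e ./job.err.%j
    #SBATCH --partition=p.bio
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=40       # 40 cores per node
    #SBATCH --time=12:00:00            # must not exceed the 24-hour limit

    module purge
    module load intel impi

    srun ./myprog.x                    # placeholder executable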

Useful tips:

The run time limit used for jobs that don’t specify a value is 2 hours by default. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, but not longer than 24 hours.

The default memory per node is 9600 MB. To grant the job access to all of the memory on each node, use the --mem=0 option of sbatch/srun (see the sketch below).
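
A sketch of a serial job on the shared s.bio partition illustrating the two tips above; job name, output files, memory request, and executable are placeholders.

    #!/bin/bash -l
    # hypothetical serial job on the shared s.bio partition
    #SBATCH -J my_serial_job           # placeholder job name
    #SBATCH -o ./job.out.%j
    #SBATCH -e ./job.err.%j
    #SBATCH --partition=s.bio
    #SBATCH --ntasks=1
    #SBATCH --time=04:00:00            # overrides the 2-hour default (max. 24 hours)
    #SBATCH --mem=20000                # memory in MB; --mem=0 requests all memory of the node

    srun ./my_serial_prog.x            # placeholder executable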

OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (a sketch is shown below). Exporting OMP_PLACES=cores can also be useful.
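
A sketch of an OpenMP batch script deriving OMP_NUM_THREADS from SLURM_CPUS_PER_TASK; job name, output files, and executable are placeholders.

    #!/bin/bash -l
    # hypothetical OpenMP job using all 40 cores of one node
    #SBATCH -J my_omp_job              # placeholder job name
    #SBATCH -o ./job.out.%j
    #SBATCH -e ./job.err.%j
    #SBATCH --partition=p.bio
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=40         # makes SLURM_CPUS_PER_TASK available
    #SBATCH --time=02:00:00

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    export OMP_PLACES=cores

    srun ./my_omp_prog.x               # placeholder executable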

To use GPUs, add the --gres option to your Slurm scripts and choose how many GPUs and/or which GPU model to request: #SBATCH --gres=gpu:rtx5000:1 or #SBATCH --gres=gpu:rtx5000:2 (a sample script is sketched below the syntax description).

Valid gres options are: gpu[[:type]:count]
where
  type   is the type of GPU (rtx5000)
  count  is the number of requested GPUs (1 or 2)
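
A sketch of a GPU job requesting both Quadro RTX 5000 cards of a node; job name, output files, and executable are placeholders, and the cuda module version may differ.

    #!/bin/bash -l
    # hypothetical GPU job requesting both Quadro RTX 5000 cards of a node
    #SBATCH -J my_gpu_job              # placeholder job name
    #SBATCH -o ./job.out.%j
    #SBATCH -e ./job.err.%j
    #SBATCH --partition=p.bio
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --gres=gpu:rtx5000:2       # or gpu:rtx5000:1 for a single card
    #SBATCH --time=08:00:00

    module purge
    module load cuda

    srun ./my_gpu_prog.x               # placeholder executable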

The GPU cards are operated in the default compute mode.