Iron Research


Name of the cluster:

CMMC-CMFE-CMTI

Institution:

Max Planck Institute for Iron Research

Login nodes:

  • cmti001.bc.rzg.mpg.de

  • cmti002.bc.rzg.mpg.de

Hardware Configuration:

  • Login nodes: 2 x cmti[001-002]
      CPU: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
      CPU(s): 40 (2 sockets x 20 cores per socket, 1 thread per core)
      RAM: 192 GB

  • Compute nodes (14320 CPU cores): 358 x cmti[003-360]
      CPU: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
      CPU(s): 40 (2 sockets x 20 cores per socket, 1 thread per core)
      RAM: 192 GB

  • Compute nodes (8560 CPU cores): 214 x cmfe[001-214]
      CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
      CPU(s): 40 (2 sockets x 20 cores per socket, 1 thread per core)
      RAM: 128 GB

  • cmfe[001-214] nodes: interconnect based on Mellanox Technologies InfiniBand fabric (speed: 56 Gb/s)

  • cmti[001-360] nodes: interconnect based on Intel Omni-Path fabric (speed: 100 Gb/s)

Filesystems:

GPFS-based with total size of 420 TB and independent inode space for the following filesets:

/u

shared home filesystem; GPFS-based; user quotas (currently default is 1TB) enforced; quota can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’.

/cmmc/ptmp

shared scratch filesystem; GPFS-based; no quotas enforced; NO BACKUPS!

Compilers and Libraries:

The “module” subsystem is implemented on CMMC. Please use ‘module avail’ to see all available modules.

  • Intel compilers (-> ‘module load intel’): icc, icpc, ifort

  • GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran

  • Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64

  • Intel MPI (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, … This module becomes visible and loadable only after a compiler module (Intel or GCC) has been loaded

  • Python (-> ‘module load anaconda’): python
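As an illustration of the module order described above (the source file and program names are hypothetical), a typical compile step with the Intel toolchain might look like:

```shell
# Load an Intel compiler first; only then does the Intel MPI module
# become visible and loadable
module load intel
module load impi
module load mkl

# Compile a (hypothetical) MPI Fortran code, linking against MKL
# via the $MKL_HOME variable defined by the mkl module
mpiifort -O2 -o hello_mpi hello_mpi.f90 -L$MKL_HOME/lib/intel64 -lmkl_rt
```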

Batch system based on Slurm:

The batch system on CMMC is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Cobra home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be modified for the CMFE cluster.

Current Slurm configuration on CMFE:

  • default turnaround time: 24 hours

  • current max. turnaround time (wallclock): 96 hours

  • two partitions: p.cmfe and s.cmfe (default)

  • p.cmfe partition:  for parallel MPI or hybrid MPI/OpenMP jobs. Resources are exclusively allocated on nodes. Max. nodes per job is 44

  • s.cmfe partition: for serial or OpenMP jobs. Nodes are shared. Jobs are limited to use CPUs only on one node. Default RAM per CPU is 2 GB

  • cmfe nodes features: cmfe, broadwell, mem128G

  • cmti  nodes features: cmti, cascadelake, mem192G
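A minimal sketch of a batch script for the p.cmfe partition, based on the limits listed above (the job name, node count, and executable are placeholders; adjust them to your application):

```shell
#!/bin/bash -l
# Hypothetical MPI job on the p.cmfe partition
# (nodes are allocated exclusively; max. 44 nodes per job)
#SBATCH -J my_mpi_job
#SBATCH -p p.cmfe
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40      # 40 CPU cores per cmfe node
#SBATCH --time=24:00:00           # default; max. is 96:00:00

srun ./my_mpi_program
```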

Useful tips:

The default run time limit for jobs that don’t specify a value is 24 hours. Use the --time option for sbatch/srun to set a limit on the total run time of the job allocation, up to the maximum of 96 hours.

By default, jobs in the p.cmfe partition are allocated all memory on their nodes (nodes are allocated exclusively).

In the s.cmfe partition the default allocated memory per job is 2048 MB. To grant the job access to more (or less) memory on each node, use the --mem or --mem-per-cpu options for sbatch/srun.

Note that the maximum amount of memory available for Slurm jobs is less than the hardware specification: the real memory available is 125000 MB on cmfe nodes and 188000 MB on cmti nodes. If you ask for more memory, your job cannot be scheduled. If you need to exceed the fair share of memory (3 GB per core on cmfe, 4700 MB per core on cmti), please leave at least 2 GB per unused core so that other jobs can use those cores. Occasional exceptions are acceptable if a job requires a higher RAM/CPU-core ratio due to limitations in (efficient) parallelization, notably if using more cores would slow down the calculation or increase the total memory demand.
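For example (the memory value and program name are illustrative), a serial job on s.cmfe that needs more than the 2048 MB default could request memory explicitly:

```shell
#!/bin/bash -l
# Hypothetical serial job on the shared s.cmfe partition
#SBATCH -J my_serial_job
#SBATCH -p s.cmfe
#SBATCH --ntasks=1
#SBATCH --mem=8G                  # total memory for the job;
                                  # alternatively use --mem-per-cpu
#SBATCH --time=24:00:00

./my_serial_program
```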

OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in a sbatch script (an example can be found on the help information page).
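A sketch of the OpenMP case (thread count and program name are hypothetical):

```shell
#!/bin/bash -l
# Hypothetical OpenMP job on s.cmfe: one task, several CPUs on one node
#SBATCH -J my_omp_job
#SBATCH -p s.cmfe
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=24:00:00

# Derive the OpenMP thread count from the Slurm allocation
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_program
```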

  • To run code on nodes with a particular memory capacity (128 GB or 192 GB), use the --constraint=<list> option in a sbatch script: --constraint=mem128G or --constraint=mem192G

  • To run code on nodes with a specific CPU architecture, use --constraint=broadwell or --constraint=cascadelake

  • To run code on the CMFE part of the cluster, use the cmfe feature: --constraint=cmfe

  • To run code on the CMTI part of the cluster, use --constraint=cmti

  • To check node features, use ‘sinfo -O nodelist,features:30’
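Constraints can be combined with the other options on the command line as well; for example (the script name is a placeholder), to pin a job to the Cascade Lake cmti nodes:

```shell
# Submit to the CMTI part of the cluster (192 GB Cascade Lake nodes)
sbatch --constraint=cmti --time=48:00:00 my_job_script.sh

# Inspect which features each node advertises
sinfo -O nodelist,features:30
```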

Support:

For support please create a trouble ticket at the MPCDF helpdesk.