Biochemistry HPCL67


Name of the cluster:

HPCL67

Institution:

Max Planck Institute of Biochemistry

Login nodes:

  • hpcl[7001-7008].bc.rzg.mpg.de

Hardware Configuration:

Login nodes hpcl[7001-7008]:

  • CPUs Model: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz

  • 2 sockets

  • 20 cores per socket

  • no hyper-threading (1 thread per core)

  • 510 GB RAM

  • 2 x Tesla P100-PCIE-16GB GPUs (hpcl[7001,7005-7008] only)

91 execution nodes hpcl[6001-6067], hpcl[7009-7032]:

  • CPUs Model: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz

  • 2 sockets

  • 20 cores per socket

  • no hyper-threading (1 thread per core)

  • 510 GB RAM

  • Node interconnect is based on 10 Gb/s Ethernet

Compilers and Libraries:

The “module” subsystem is implemented on HPCL67. Please use ‘module available’ to see all available modules.

  • Intel compilers (-> ‘module load intel’): icc, icpc, ifort

  • GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran

  • Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64

  • Intel MPI (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …

  • OpenMPI (-> ‘module load openmpi’): mpicc, mpicxx, mpif77, mpif90, mpirun, mpiexec

  • Python (-> ‘module load anaconda’): python
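
A typical compile session first loads the required modules and then calls the corresponding compiler wrapper. The following is only a sketch, assuming the Intel compiler, Intel MPI and MKL modules listed above and a placeholder source file hello.c:

    # load compiler, MPI and MKL modules (names as listed above)
    module load intel impi mkl

    # compile an MPI C code with the Intel MPI wrapper and link against MKL
    # (the MKL libraries are located in $MKL_HOME/lib/intel64)
    mpiicc -O2 hello.c -o hello -L$MKL_HOME/lib/intel64 -lmkl_rt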

Batch system based on Slurm:

The batch system on HPCL67 is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Cobra home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be modified for the HPCL67 cluster (the partition must be changed); a sketch adapted to HPCL67 is given below.
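
A minimal batch script for HPCL67 could look like the following sketch; the job name, output files, resource values and the executable (./hello) are placeholders, only the partition name p.hpcl67 is specific to this cluster:

    #!/bin/bash -l
    #SBATCH -J my_job                 # job name (placeholder)
    #SBATCH -o ./job.out.%j           # standard output file
    #SBATCH -e ./job.err.%j           # standard error file
    #SBATCH -p p.hpcl67               # the only (and default) partition on HPCL67
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=40      # 2 sockets x 20 cores, no hyper-threading
    #SBATCH --time=24:00:00           # run time limit (default 24 h, max 21 days)

    module load intel impi

    srun ./hello                      # placeholder executable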

Current Slurm configuration on HPCL67:

  • default run time: 24 hours

  • current max. run time (wallclock): 21 days

  • only one partition (which is also the default): p.hpcl67

  • nodes are allocated exclusively to a single user; multiple jobs may share a node only if they belong to the same user

  • default memory size per job on node: 510000 MB

  • max submitted jobs per user: 2000

  • max running jobs per user at one time: 200
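
These settings can be verified on a login node with the standard Slurm query commands, for example:

    # node, state and time limit overview for the partition
    sinfo -p p.hpcl67

    # full partition configuration (MaxTime, DefMemPerNode, limits, ...)
    scontrol show partition p.hpcl67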

Useful tips:

The default run time limit for jobs that do not specify a value is 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation; this limit must not exceed 504 hours (21 days).
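
For example, to request 48 hours instead of the 24-hour default (the value and the script name are only illustrative):

    # on the command line
    sbatch --time=48:00:00 job.sh

    # or equivalently inside the batch script
    #SBATCH --time=48:00:00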

Memory is a consumable resource. To run several jobs on one node, use the --mem=<size[units]> or --mem-per-cpu=<size[units]> options of sbatch/srun, where the requested size must be less than the per-node default (510000 MB).
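
As a sketch, two jobs of the same user can share one node if each of them requests only part of the memory (the script names and the 250000 MB value are illustrative):

    # each job requests roughly half of the node memory,
    # so both can be scheduled onto the same node
    sbatch --mem=250000M job_a.sh
    sbatch --mem=250000M job_b.sh

Note that the CPU requests of both jobs must also fit on one node for them to run there simultaneously.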

OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be taken from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (an example can be found on the help information page).
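
A minimal sketch of the relevant lines of such an OpenMP batch script (the executable name ./my_openmp_program is a placeholder):

    #SBATCH -p p.hpcl67
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=40        # 40 cores per node, one thread per core

    # pass the CPU allocation from Slurm to OpenMP
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    srun ./my_openmp_program          # placeholder executable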

Support:

For support please create a trouble ticket at the MPCDF helpdesk.