Biochemistry HPCL67


Name of the cluster:

HPCL67

Institution:

Max Planck Institute of Biochemistry

Login nodes:

  • hpcl[7001-7008].bc.rzg.mpg.de

Hardware Configuration:

Login nodes hpcl[7001-7008]:

  • CPUs Model: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz

  • 2 sockets

  • 20 cores per socket

  • no hyper-threading (1 thread per core)

  • 510 GB RAM

  • 2 x Tesla P100-PCIE-16GB GPUs (hpcl[7001,7005-7008] only)

91 execution nodes hpcl[6001-6067], hpcl[7009-7032]:

  • CPUs Model: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz

  • 2 sockets

  • 20 cores per socket

  • no hyper-threading (1 thread per core)

  • 510 GB RAM

  • Node interconnect is based on 10 Gb/s Ethernet

Compilers and Libraries:

The “module” subsystem is implemented on HPCL67. Please use ‘module available’ to see all available modules.

  • Intel compilers (-> ‘module load intel’): icc, icpc, ifort

  • GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran

  • Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64

  • Intel MPI (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …

  • OpenMPI (-> ‘module load openmpi’): mpicc, mpicxx, mpif77, mpif90, mpirun, mpiexec

  • Python (-> ‘module load anaconda’): python
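
A typical compile session first loads the required modules and then calls the corresponding compiler wrapper. The following is only a sketch, assuming the Intel compiler, Intel MPI and MKL modules listed above and a placeholder source file hello.c:

    # load compiler, MPI and MKL modules (names as listed above)
    module load intel impi mkl

    # compile an MPI C code with the Intel MPI wrapper and link against MKL
    # (the MKL libraries are located in $MKL_HOME/lib/intel64)
    mpiicc -O2 hello.c -o hello -L$MKL_HOME/lib/intel64 -lmkl_rt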

Batch system based on Slurm:

The batch system on HPCL67 is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Cobra home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be modified for the HPCL67 cluster (the partition must be changed); a sketch adapted to HPCL67 is given below.
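
A minimal batch script for HPCL67 could look like the following sketch; the job name, output files, resource values and the executable (./hello) are placeholders, only the partition name p.hpcl67 is specific to this cluster:

    #!/bin/bash -l
    #SBATCH -J my_job                 # job name (placeholder)
    #SBATCH -o ./job.out.%j           # standard output file
    #SBATCH -e ./job.err.%j           # standard error file
    #SBATCH -p p.hpcl67               # the only (and default) partition on HPCL67
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=40      # 2 sockets x 20 cores, no hyper-threading
    #SBATCH --time=24:00:00           # run time limit (default 24 h, max 21 days)

    module load intel impi

    srun ./hello                      # placeholder executable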

Current Slurm configuration on HPCL67:

  • default run time: 24 hours

  • current max. run time (wallclock): 21 days

  • only one partition (which is also the default): p.hpcl67

  • nodes are allocated exclusively to a single user; multiple jobs may share a node only if they belong to the same user

  • default memory size per job on node: 510000 MB

  • max submitted jobs per user: 2000

  • max running jobs per user at one time: 200
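
These settings can be verified on a login node with the standard Slurm query commands, for example:

    # node, state and time limit overview for the partition
    sinfo -p p.hpcl67

    # full partition configuration (MaxTime, DefMemPerNode, limits, ...)
    scontrol show partition p.hpcl67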

Useful tips:

The default run time limit for jobs that do not specify a value is 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation; this limit must not exceed 504 hours (21 days).
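
For example, to request 48 hours instead of the 24-hour default (the value and the script name are only illustrative):

    # on the command line
    sbatch --time=48:00:00 job.sh

    # or equivalently inside the batch script
    #SBATCH --time=48:00:00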

Memory is a consumable resource. To run several jobs on one node, use the --mem=<size[units]> or --mem-per-cpu=<size[units]> options of sbatch/srun, where the requested size must be less than the per-node default (510000 MB).
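
As a sketch, two jobs of the same user can share one node if each of them requests only part of the memory (the script names and the 250000 MB value are illustrative):

    # each job requests roughly half of the node memory,
    # so both can be scheduled onto the same node
    sbatch --mem=250000M job_a.sh
    sbatch --mem=250000M job_b.sh

Note that the CPU requests of both jobs must also fit on one node for them to run there simultaneously.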

OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be taken from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (an example can be found on the help information page).
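
A minimal sketch of the relevant lines of such an OpenMP batch script (the executable name ./my_openmp_program is a placeholder):

    #SBATCH -p p.hpcl67
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=40        # 40 cores per node, one thread per core

    # pass the CPU allocation from Slurm to OpenMP
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    srun ./my_openmp_program          # placeholder executable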

Support:

For support please create a trouble ticket at the MPCDF helpdesk.