Biochemistry HPCL8
- Name of the cluster: HPCL8
- Institution: Max Planck Institute of Biochemistry
Login nodes:
- hpcl[8001-8004] & hpcl[8061-8063]
- hpcl[9001-9002]
- hpcl9301
Hardware Configuration:
- 7 login nodes hpcl[8001-8004] & hpcl[8061-8063]
  - 2 x Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
  - 24 cores per node
  - hyper-threading disabled - 1 thread per core
  - 377 GB RAM
  - 2 x RTX 5000 GPUs
  - node interconnect: based on 25 Gb/s ethernet
- 56 execution nodes hpcl[8005-8060] for parallel CPU/GPU computing
  - total of 1344 CPU cores
  - 2 x Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
  - 24 cores per node
  - hyper-threading disabled - 1 thread per core
  - 377 GB RAM
  - 2 x RTX 5000 GPUs
  - node interconnect: based on 25 Gb/s ethernet
- 2 login nodes hpcl[9001-9002]
  - 2 x Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
  - 72 cores per node
  - hyper-threading disabled - 1 thread per core
  - 1 TB RAM
  - 4 x NVIDIA A40 GPUs
  - node interconnect: based on 50 Gb/s ethernet
- 9 execution nodes hpcl[9003-9011] for parallel CPU/GPU computing
  - total of 648 CPU cores
  - 2 x Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
  - 72 cores per node
  - hyper-threading disabled - 1 thread per core
  - 1 TB RAM
  - 4 x NVIDIA A40 GPUs
  - node interconnect: based on 50 Gb/s ethernet
- 4 execution nodes hpcl[9101-9104] for parallel CPU/GPU computing
  - total of 304 CPU cores
  - Intel(R) Xeon(R) Platinum 8368 CPU @ 2.40GHz
  - 76 cores per node
  - hyper-threading enabled - 2 threads per core
  - 1 TB RAM
  - 4 x NVIDIA H100 GPUs
  - node interconnect: based on 50 Gb/s ethernet
- 3 execution nodes hpcl[9201-9203] for parallel CPU computing
  - total of 192 CPU cores
  - AMD EPYC 9374F 32-Core CPU @ 3.80GHz
  - 64 cores per node
  - hyper-threading enabled - 2 threads per core
  - 512 GB RAM
  - node interconnect: based on 50 Gb/s ethernet
- 1 login node hpcl9301
  - 2 x AMD EPYC 9534 64-Core CPU @ 3.7GHz
  - 128 cores per node
  - hyper-threading enabled - 2 threads per core
  - 755 GB RAM
  - 4 x NVIDIA L40S GPUs
  - node interconnect: based on 50 Gb/s ethernet
- 19 execution nodes hpcl[9302-9320] for parallel CPU/GPU computing
  - total of 2432 CPU cores
  - 2 x AMD EPYC 9534 64-Core CPU @ 3.7GHz
  - 128 cores per node
  - hyper-threading enabled - 2 threads per core
  - 755 GB RAM
  - 4 x NVIDIA L40S GPUs
  - node interconnect: based on 50 Gb/s ethernet
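To see how these nodes are mapped onto Slurm partitions at any time, the layout can be queried from a login node; a minimal sketch (the output fields shown here are just one possible selection):

    # list every node with its partition, CPU count, memory and generic resources (GPUs)
    sinfo -N -O "nodelist,partition,cpus,memory,gres"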
Compilers and Libraries:
The “module” subsystem is implemented on HPCL8. Please use ‘module avail’ to see all available modules.
Intel compilers (-> ‘module load intel’): icc, icpc, ifort
GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran
Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64
Intel MPI (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …
OpenMPI (-> ‘module load openmpi’): mpicc, mpicxx, mpif77, mpif90, mpirun, mpiexec
Python (-> ‘module load anaconda’): python
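As an illustration, the following commands load a compiler and an MPI module from the list above and build a small MPI program; the source file name is a placeholder:

    # start from a clean environment and load the GNU compilers plus OpenMPI
    module purge
    module load gcc openmpi
    # compile a C MPI program (hello.c is a placeholder for your own source file)
    mpicc -O2 -o hello hello.c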
Batch system based on Slurm:
The batch system on HPCL8 is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Raven home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be modified for the HPCL8 cluster (the partition must be changed).
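For instance, the basic workflow looks as follows (the script name and job ID are placeholders):

    sbatch job_script.sh     # submit a batch script to the queue
    squeue -u $USER          # list your pending and running jobs
    scancel 1234567          # cancel a job by its job ID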
Current Slurm configuration on HPCL8:
default run time: 24 hours
current max. run time (wallclock): 21 days
five partitions: p.hpcl8 (default), p.hpcl9 (Dept. Briggs only), p.hpcl91 (b_borgwardt group only), p.hpcl92 (b_mann & g_rz groups only) & p.hpcl93; an example of selecting a partition is sketched after this list
nodes in the p.hpcl8 & p.hpcl9 partitions are allocated exclusively to a single user; multiple jobs may run on the same node only if they belong to the same user
nodes in p.hpcl91, p.hpcl92 & p.hpcl93 can be shared by jobs
default memory size per job on node: 380000 MB (p.hpcl8 partition), 1000000 MB (p.hpcl9 partition), 40000 MB (p.hpcl91 partition), 32000 MB (p.hpcl92 partition) & 38000 MB (p.hpcl93 partition)
max submitted jobs per user: 2000
max running jobs per user at one time: 200
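For illustration, a specific partition and a run time within these limits are requested via #SBATCH directives in the batch script header; all values below are placeholders:

    #!/bin/bash -l
    #SBATCH --job-name=my_job          # placeholder job name
    #SBATCH --partition=p.hpcl8        # one of the partitions listed above
    #SBATCH --time=2-00:00:00          # 2 days; must not exceed 21 days (504 hours)
    #SBATCH --nodes=1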
Useful tips:
The default run time limit for jobs that do not specify a value is 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, but not longer than 504 hours
Memory is a consumable resource. To run several jobs on one node, use the --mem=<size[units]> or --mem-per-cpu=<size[units]> options of sbatch/srun, where the size should be less than the per-node default (380000 MB)
OpenMP codes require the variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in a sbatch script (an example is on the help information page)
To use GPUs, add the --gres option to your Slurm scripts and choose how many GPUs to allocate: #SBATCH --gres=gpu:1 or #SBATCH --gres=gpu:2 (a combined batch-script example covering these options is sketched after this list)
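A minimal sketch of a batch script combining these options (program name and resource values are placeholders and must be adapted to your job and to the limits above):

    #!/bin/bash -l
    #SBATCH --job-name=omp_gpu_job       # placeholder job name
    #SBATCH --partition=p.hpcl8
    #SBATCH --time=12:00:00              # wallclock limit, max. 504 hours
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8            # CPU cores reserved for OpenMP threads
    #SBATCH --mem=32000M                 # per-node memory, below the 380000 MB default
    #SBATCH --gres=gpu:1                 # request one GPU on the node

    # set the number of OpenMP threads from the Slurm allocation
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

    srun ./my_program                    # my_program is a placeholder executable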
Support:
For support please create a trouble ticket at the MPCDF helpdesk.