Radioastronomy
- Name of the Linux cluster:
HERCULES
- Institution:
Max Planck Institute for Radio Astronomy
Login nodes:
hercules11.bc.rzg.mpg.de
hercules12.bc.rzg.mpg.de
Hardware Configuration:
- 2 login nodes hercules[11-12]
- 2 x Intel(R) Xeon(R) Silver 4214R CPU @ 2.40 GHz, 24 cores per node, hyper-threading disabled (1 thread per core), 188 GB RAM
- 32 execution nodes hc[201-232] for parallel computing
- total of 1536 CPU cores: 2 x Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90 GHz, 48 cores per node, hyper-threading disabled (1 thread per core), 377 GB RAM
- 54 execution nodes hcg[001-054] for parallel GPU computing
- total of 2592 CPU cores: 2 x Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90 GHz, 48 cores per node, hyper-threading disabled (1 thread per core), 377 GB RAM, 3 x Quadro RTX 6000 GPUs per node
- node interconnect
based on 25 Gb/s Ethernet
Filesystems:
- /u
shared home filesystem; quota of 1 TB of data and 600K files; quota usage can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’ (see the example after this list)
- /mkfs
dedicated project area for selected users - NO BACKUPS!
- /hercules
dedicated project area - NO BACKUPS!
- /scratch
dedicated scratch area for all users - NO BACKUPS!
- /mandap
incoming data from Bonn (only available on login nodes) - NO BACKUPS!
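A minimal sketch of checking the /u quota with the GPFS tool named above; invoked without arguments it reports your usage on all filesystems on which you have quotas:

```bash
# Report block and inode (file-count) usage against the 1 TB / 600K-file quota on /u
/usr/lpp/mmfs/bin/mmlsquota

# Human-readable block sizes (assumes the installed GPFS version supports --block-size)
/usr/lpp/mmfs/bin/mmlsquota --block-size auto
```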
Software Configuration:
The “module” subsystem is implemented on the HERCULES cluster. Please use ‘module avail’ to see all available modules.
Intel compilers (-> ‘module load intel/19.1.3’): icc, icpc, ifort
GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran
Intel MKL (‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64
Intel MPI 2019.9 (‘module load impi/2019.9’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …
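A possible compile workflow with the modules listed above (a sketch; the source file name and the MKL link line are illustrative only):

```bash
# Load the Intel compiler, Intel MPI and MKL environments named above
module load intel/19.1.3 impi/2019.9 mkl

# Build an MPI C program with the Intel MPI compiler wrapper; MKL is linked
# through the path exported in $MKL_HOME (single dynamic library, as an example)
mpiicc -O2 -o hello_mpi hello_mpi.c -L$MKL_HOME/lib/intel64 -lmkl_rt
```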
Batch system based on Slurm:
sbatch, srun, squeue, sinfo, scancel, scontrol, s*
seven partitions:
short.q (default), long.q, gpu.q for serial jobs on shared nodes
parallel.q for multi-node parallel hybrid MPI/OpenMP jobs; nodes are allocated exclusively
gpu42cores.q for serial CPU-only jobs, with 42 cores and 50% of the RAM per node
gpu6cores.q for serial GPU jobs, with 6 cores and 50% of the RAM per node
interactive.q to debug serial/parallel jobs. Currently disabled.
sample batch scripts can be found on the Cobra home page (they must be modified for HERCULES); a minimal sketch follows below
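A minimal sketch of a serial job script for the default short.q partition (the job name and the ./my_program executable are placeholders; adapt to your application):

```bash
#!/bin/bash -l
# Serial job on a shared node in the default short.q partition (sketch)
#SBATCH --job-name=serial_test   # placeholder job name
#SBATCH --partition=short.q      # default serial partition
#SBATCH --ntasks=1               # one task
#SBATCH --cpus-per-task=1        # one core
#SBATCH --mem=8000M              # default memory per job in the serial partitions
#SBATCH --time=04:00:00          # short.q run time limit is 4 hours

srun ./my_program                # placeholder executable
```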
| Slurm partition | short.q (default) | long.q | gpu.q | parallel.q | gpu42cores.q | gpu6cores.q | interactive.q |
|---|---|---|---|---|---|---|---|
| number of nodes | 86 | 32 | 54 | 32 | 54 | 54 | 4 |
| hostnames | hc[201-232] hcg[001-054] | hc[201-232] | hcg[001-054] | hc[201-232] | hcg[001-054] | hcg[001-054] | hc*, hcg* |
| default run time limit | 4 hours | 24 hours | 24 hours | 48 hours | 24 hours | 24 hours | 2 hours |
| maximum run time limit | 4 hours | 240 hours | 240 hours | 240 hours | 240 hours | 240 hours | 12 hours |
| default memory per node | 8000 MB | 8000 MB | 120000 MB | 370000 MB | 4000 MB | 60000 MB | 8000 MB |
| maximum memory per node | 370000 MB | 370000 MB | 370000 MB | 370000 MB | 185000 MB | 185000 MB | 370000 MB |
| maximum nodes per job | 1 | 1 | 1 | 32 | 1 | 1 | 2 |
| maximum CPUs per node | 48 | 48 | 48 | 48 | 42 | 6 | 48 |
| execute more than 1 job at a time on each node | Yes | Yes | Yes, max. 3 jobs | No | Yes | Yes, max. 3 jobs | Yes |
| GPUs per node | -- | -- | 3 | -- | -- | 3 | 0 (hc*), 3 (hcg*) |
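The limits in the table can also be queried directly on the cluster with the Slurm commands listed above, for example:

```bash
# List all partitions with their time limits, node counts and node names
sinfo -o "%P %l %D %N"

# Show the full configuration of a single partition, e.g. parallel.q
scontrol show partition parallel.q
```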
Useful tips:
The default run time limit for jobs that specify neither a time limit nor a partition is 4 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, up to at most 10 days (240 hours) on the long.q, gpu.q and parallel.q partitions.
The default memory per job in the serial partitions is 8000 MB. To grant the job access to all of the memory on each node, use the --mem=0 option for sbatch/srun.
OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (an example is on the help information page; a sketch also follows below). Exporting OMP_PLACES=cores can also be useful.
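Putting these tips together, a hybrid MPI/OpenMP job on parallel.q might look like the following sketch (node and task counts and the ./hybrid_code executable are placeholders):

```bash
#!/bin/bash -l
# Hybrid MPI/OpenMP sketch for the exclusively allocated parallel.q partition
#SBATCH --partition=parallel.q
#SBATCH --nodes=2                # placeholder node count (up to 32 allowed)
#SBATCH --ntasks-per-node=2      # MPI ranks per node (placeholder)
#SBATCH --cpus-per-task=24       # OpenMP threads per rank (2 x 24 = 48 cores/node)
#SBATCH --mem=0                  # grant access to all of the memory on each node
#SBATCH --time=24:00:00          # explicit run time, at most 240 hours here

module load intel/19.1.3 impi/2019.9

# Derive the OpenMP thread count from the Slurm allocation, as described above
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores

srun ./hybrid_code               # placeholder executable
```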
Support:
For support please create a trouble ticket at the MPCDF helpdesk