Gravitational Physics - ACR
- Name of the cluster:
URANIA
- Institution:
Max Planck Institute for Gravitational Physics (Albert Einstein Institute), ACR department
Access:
Configuration:
Login nodes urania[01-02]:
CPU Model: Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
2 sockets
36 cores per socket
hyper-threading on (2 threads per core)
512 GB RAM
84 execution nodes urania[001-084]:
CPU Model: Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
2 sockets
36 cores per socket
hyper-threading on (2 threads per core)
256 GB RAM
The node interconnect is based on Mellanox/NVIDIA InfiniBand HDR100 technology (speed: 100 Gb/s)
Filesystems:
- /u
shared home filesystem; GPFS-based; user quotas (currently 100 GB, 1 M files) are enforced; the quota can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’ (see the example after this list)
- /urania/ptmp
shared scratch filesystem (1.1 PB); GPFS-based; no quotas enforced; NO BACKUPS!
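As a quick sketch (the exact output format depends on the installed GPFS version), the current usage and quota on /u can be displayed with:

    # Show the calling user's GPFS block and file quotas in human-readable units
    /usr/lpp/mmfs/bin/mmlsquota --block-size auto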
Compilers and Libraries:
The “module” subsystem is implemented on URANIA. Please use ‘module available’ to see all available modules. A short compilation example follows the list below.
Intel compilers (-> ‘module load intel’): icc, icpc, ifort
GNU compilers (-> ‘module load gcc’): gcc, g++, gfortran
Intel MKL (-> ‘module load mkl’): $MKL_HOME defined; libraries found in $MKL_HOME/lib/intel64
Intel MPI (-> ‘module load impi’): mpicc, mpigcc, mpiicc, mpiifort, mpiexec, …
Python (-> ‘module load anaconda’): python
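As a minimal sketch of how these modules are typically used (the unversioned module names and the source file hello_mpi.c are assumptions; check ‘module available’ for the exact versions to load), an MPI C code could be compiled and linked against MKL like this:

    module purge
    module load intel impi mkl
    # Compile an MPI C code with the Intel MPI wrapper and link MKL via the single dynamic library
    mpiicc -O2 -o hello_mpi hello_mpi.c -L$MKL_HOME/lib/intel64 -lmkl_rt -lpthread -lm -ldl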
Batch system based on Slurm:
The batch system on URANIA is the Slurm Workload Manager. A brief introduction to the basic commands (srun, sbatch, squeue, scancel, …) can be found on the Raven home page. For more detailed information, see the Slurm handbook. See also the sample batch scripts, which must be modified for the URANIA cluster (the partition must be changed); a minimal example adapted to URANIA is sketched below the configuration list.
Current Slurm configuration on URANIA:
two partitions: p.urania (default), p.debug (2 nodes)
default run time: 24 hours (p.urania), 12 hours (p.debug)
current max. run time (wallclock): 1 day
default memory per node for jobs: p.urania (240000 MB)
nodes are exclusively allocated to jobs in p.urania
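For illustration, a minimal MPI batch script matching this configuration might look as follows (the job name, node and task counts, module names, and the executable are placeholders, not URANIA defaults):

    #!/bin/bash -l
    # Minimal MPI job on URANIA (sketch): 2 full nodes, 72 MPI tasks per node
    #SBATCH -J my_job
    #SBATCH -o ./job.out.%j
    #SBATCH -e ./job.err.%j
    #SBATCH -D ./
    #SBATCH --partition=p.urania        # default partition; use p.debug for short tests
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=72        # 72 physical cores per node
    #SBATCH --time=12:00:00             # wallclock limit, must not exceed 24 hours

    module purge
    module load intel impi              # assumed (unversioned) module names

    srun ./my_mpi_executable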
Useful tips:
Jobs that do not specify a run time limit get the default time limit of their partition. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation, but not longer than 24 hours.
OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script (check the examples on the sample batch scripts page and the first sketch after this list).
To debug codes interactively on the login nodes with the native Intel MPI process managers (mpiexec/mpirun), load the ‘impi-interactive’ module after an ‘impi’ module has been loaded (see the second sketch after this list).
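A sketch of the OMP_NUM_THREADS tip above (the task and thread counts as well as the executable name are placeholders):

    #!/bin/bash -l
    # Hybrid MPI+OpenMP job on URANIA (sketch): 4 MPI tasks x 18 OpenMP threads = 72 cores
    #SBATCH -J hybrid_job
    #SBATCH -o ./job.out.%j
    #SBATCH -e ./job.err.%j
    #SBATCH --partition=p.urania
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=4
    #SBATCH --cpus-per-task=18
    #SBATCH --time=01:00:00

    module purge
    module load intel impi              # assumed (unversioned) module names

    # Propagate the Slurm CPU allocation to OpenMP
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    export OMP_PLACES=cores

    srun ./my_hybrid_executable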
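And a sketch of the interactive-debugging tip (the test binary ./my_test and the process count are placeholders):

    # On a login node: enable the native Intel MPI process managers for short interactive tests
    module load intel impi impi-interactive
    mpiexec -n 4 ./my_test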
Support:
For support, please create a trouble ticket at the MPCDF helpdesk.