Psychiatry PIROL
- Name of the cluster:
PIROL
- Institution:
Max Planck Institute of Psychiatry
Login nodes:
pirol01.hpccloud.mpcdf.mpg.de
Hardware Configuration:
Login node pirol01:
CPU Model: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
1 socket
18 cores per socket
no hyper-threading (1 thread per core)
120 GB RAM
6 CPU execution nodes pirolc[001-006]:
CPU Model: Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
1 socket
12 cores per socket
no hyper-threading (1 thread per core)
80 GB RAM
6 GPU execution nodes pirolg[001-006]:
CPU Model: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
1 socket
10 cores per socket
400 GB RAM
1 x Nvidia A40
The node interconnect is based on 10 Gb/s Ethernet.
Filesystems:
- /u
shared home filesystem with the user home directory in /u/<username>; user quotas (currently 200 GB, 250k files) are enforced. User quotas can be checked with the quota command, e.g. quota --show-mntpoint --hide-device -f /pirol/u (see the example below).
- /nexus/posix0/MPI-psych
shared scratch filesystem with the user directory in /nexus/posix0/MPI-psych/<username>
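For illustration, checking the quota and changing to the scratch area from the shell might look as follows; the use of $USER assumes that the scratch directory is named after the login name.

# Check usage and limits on the shared home filesystem /u:
quota --show-mntpoint --hide-device -f /pirol/u

# Keep large or temporary data on the shared scratch filesystem
# (directory name assumed to match the login name):
cd /nexus/posix0/MPI-psych/$USER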
Compilers and Libraries:
Hierarchical environment modules are used at MPCDF to provide software packages and to enable switching between different software versions. No modules are preloaded on PIROL: users have to specify the needed modules with explicit versions at login and during the startup of a batch job. Not all software modules are displayed immediately by the module avail command; for some, a compiler and/or MPI module has to be loaded first. The full hierarchy of the installed software modules can be searched with the find-module command (see the example below).
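A typical sequence to find and load modules might look as follows; the module names and versions (fftw, gcc/12, openmpi/4) are only examples and have to be adapted to what is actually installed on PIROL.

find-module fftw        # search the full module hierarchy for a package
module load gcc/12      # load a compiler first (version is an example) ...
module load openmpi/4   # ... then an MPI library (version is an example)
module avail            # now also lists compiler/MPI-dependent modules
module list             # show the currently loaded modules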
Batch system based on Slurm:
A brief introduction to the basic commands (srun, sbatch, squeue, scancel, sinfo, s*…) can be found on the Raven home page or in the Slurm handbook; a few common invocations are sketched below.
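These are generic Slurm commands, not specific to PIROL; job.sh and <jobid> are placeholders.

sbatch job.sh        # submit the batch script job.sh
squeue -u $USER      # list your own pending and running jobs
scancel <jobid>      # cancel a job
sinfo                # show partitions and node states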
Current Slurm configuration on PIROL:
two partitions: c.pirol (default) and g.pirol (for GPU jobs and high-memory CPU jobs)
maximum run time (wallclock): 11 days; the default run time is 24 hours
default memory per node for jobs: 10000 MB on c.pirol, 39600 MB on g.pirol
c.pirol, g.pirol: resources on the nodes may be shared between jobs
g.pirol partition: the --gres parameter must be set explicitly for jobs to access GPU resources (see the example below)
sample batch scripts can be found on the Raven home page (they must be modified for PIROL)
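A minimal sketch of a GPU batch script for the g.pirol partition; the gres string gpu:a40:1, the cuda module, the requested resources and the program name are assumptions and have to be adapted.

#!/bin/bash -l
#SBATCH -J gpu_job
#SBATCH --partition=g.pirol
#SBATCH --gres=gpu:a40:1      # assumed gres string for one A40; GPUs must be requested explicitly
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10    # up to 10 cores per GPU node
#SBATCH --time=24:00:00       # wallclock limit, at most 11 days
#SBATCH --mem=100000          # in MB; omit to get the partition default of 39600 MB

module load cuda              # module name is an example

srun ./my_gpu_program         # placeholder for your executable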
Useful tips:
By default, jobs that do not specify a run time limit get 24 hours. Use the --time option of sbatch/srun to set a limit on the total run time of the job allocation; it cannot be longer than 11 days.
The default memory per node is 10000 MB on c.pirol and 39600 MB on g.pirol (see above). To grant the job access to all of the memory on each node, use the --mem=0 option of sbatch/srun.
OpenMP codes require the environment variable OMP_NUM_THREADS to be set. Its value can be taken from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in a batch script (see the example after this list).
To run code with memory limits different from the defaults, choose the appropriate partition and request the required memory (c.pirol with a maximum of 72000 MB per node, g.pirol with a maximum of 396000 MB per node) using the --partition option in a batch script: #SBATCH --partition=c.pirol or #SBATCH --partition=g.pirol.
GPU cards are in default compute mode.
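As a sketch of the points above, an OpenMP job on c.pirol could be requested as follows; the resource values and the program name are placeholders.

#!/bin/bash -l
#SBATCH -J omp_job
#SBATCH --partition=c.pirol
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12    # all 12 cores of a CPU node
#SBATCH --mem=0               # grant access to all memory of the node (up to 72000 MB on c.pirol)
#SBATCH --time=1-00:00:00     # 1 day; the maximum is 11 days

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # set by Slurm because --cpus-per-task is given

srun ./my_openmp_program      # placeholder for your executable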
Support:
For support, please create a trouble ticket at the MPCDF helpdesk.