- Name of the cluster: ZEROPOINT
- Institution: Max Planck Institute for the Science of Light
Login nodes:
Hardware configuration and Slurm partitions:
(phase 1):

| partition | HighMem | HighFreq | DGX |
|---|---|---|---|
| # nodes | 4 | 8 | 1 |
| hostnames | zp[01-04] | zp[001-008] | zpx |
| Slurm partition | highmem | highfreq | dgx |
| CPU model | Xeon Gold 6130 | Xeon Gold 6144 | Xeon E5-2698 v4 |
| CPU architecture | x86_64 | x86_64 | x86_64 |
| CPU producer | Intel | Intel | Intel |
| CPU microarchitecture | Skylake-SP | Skylake-SP | Broadwell-EP |
| CPUs per node | 2 | 2 | 2 |
| cores per CPU | 16 | 8 | 20 |
| threads per core | 1 | 1 | 1 |
| clock rate (base/boost), GHz | 2.1 / 3.7 | 3.5 / 4.2 | 2.2 / 3.6 |
| cache size (L3) | 22 MB | 24.75 MB | 50 MB |
| SIMD instruction set | AVX-512 | AVX-512 | AVX2 |
| RAM size | zp[01-03]: 1 TiB, zp04: 960 GiB | 96 GiB | 500 GiB |
| GPU model | – | – | Tesla V100 |
| GPU producer | – | – | Nvidia |
| GPU architecture | – | – | Volta |
| GPUs per node | – | – | 8 |

Node interconnect: 1 Gb/s Ethernet
(phase 2):

| partition | Standard (default) | GPU |
|---|---|---|
| # nodes | 68 | 32 |
| hostnames | zp[101-168] | zpg[001-032] |
| Slurm partition | standard | gpu |
| CPU model | Xeon Gold 6130 | Xeon Gold 6130 |
| CPU architecture | x86_64 | x86_64 |
| CPU producer | Intel | Intel |
| CPU microarchitecture | Skylake-SP | Skylake-SP |
| CPUs per node | 2 | 2 |
| cores per CPU | 16 | 16 |
| threads per core | 1 | 1 |
| clock rate (base/boost), GHz | 2.1 / 3.7 | 2.1 / 3.7 |
| cache size (L3) | 22 MB | 22 MB |
| SIMD instruction set | AVX-512 | AVX-512 |
| RAM size | 187 GiB | 187 GiB |
| GPU model | – | Quadro RTX 6000 |
| GPU producer | – | Nvidia |
| GPU architecture | – | Turing |
| GPUs per node | – | 2 |

Node interconnect: 1 Gb/s Ethernet
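The partition names and limits listed above can be cross-checked directly on the cluster with standard Slurm commands; a minimal sketch (the output columns chosen here are only an illustration):

```bash
# List all partitions with node count, time limit, and node list
sinfo -o "%P %D %l %N"

# Show the full configuration of a single partition, e.g. the gpu partition
scontrol show partition gpu
```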
Filesystems:
GPFS-based with a total size of 27 TB:
- /u: shared home filesystem with the user home directory in /u/<username>; GPFS-based; user quotas (currently 600 GB, 1M files) enforced; the quota can be checked with ‘/usr/lpp/mmfs/bin/mmlsquota’.
- /ptmp: shared scratch filesystem with the user directory in /ptmp/<username>; GPFS-based; no quotas enforced. NO BACKUPS!
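For example, the current usage against the home quota can be inspected with the GPFS tool mentioned above; a minimal sketch (run without arguments, which reports the invoking user's quotas):

```bash
# Show the current user's GPFS block and file quota usage
/usr/lpp/mmfs/bin/mmlsquota
```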
Compilers and Libraries:
ZEROPOINT provides a hierarchical “module” subsystem. Please use ‘module available’ to list all available modules.
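A minimal sketch of a typical workflow with a hierarchical module tree (the module names and versions below are placeholders, not a guaranteed part of the ZEROPOINT software stack):

```bash
# List everything currently visible in the module tree
module available

# In a hierarchical setup, compiler and MPI modules usually have to be
# loaded first; only then do the dependent library modules become visible.
module load intel/19.1.3      # placeholder compiler version
module load impi/2019.9       # placeholder MPI version
module available              # now also shows libraries built for this toolchain

# Inspect what a module would set before loading it
module show fftw-mpi          # placeholder library module
```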
Batch system based on Slurm:
The batch system on ZEROPOINT is the Slurm Workload Manager.
Current Slurm configuration on ZEROPOINT:
- default turnaround time: 3 days
- current max. turnaround time (wallclock): 7 days
- default partition: standard
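A minimal batch script for the default partition might look as follows (job name, resource numbers, and the executable are placeholders; the requested time must stay within the 7-day maximum):

```bash
#!/bin/bash -l
#SBATCH -J my_job              # placeholder job name
#SBATCH -p standard            # default partition (may be omitted)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32   # 2 CPUs x 16 cores on a standard node
#SBATCH --time=3-00:00:00      # 3 days; must not exceed the 7-day maximum

srun ./my_program              # placeholder executable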
Useful tips:
To run GPU codes, use the gpu partition and add the option --gres=gpu:N, where N is the number of GPUs (max. 2), to your batch scripts: #SBATCH -p gpu --gres=gpu:1
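A minimal GPU batch script along these lines (job name, time limit, and executable are placeholders):

```bash
#!/bin/bash -l
#SBATCH -J gpu_job             # placeholder job name
#SBATCH -p gpu                 # GPU partition (phase 2 nodes, 2x Quadro RTX 6000)
#SBATCH --gres=gpu:2           # request both GPUs of a node (max. 2)
#SBATCH --nodes=1
#SBATCH --time=0-12:00:00      # placeholder time limit

srun ./my_gpu_program          # placeholder executable
```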
To run GPU codes on zpx, add the option --gres=gpu:N, where N is the number of GPUs (max. 8): srun -p dgx --gres=gpu:1 --pty bash -l
For parallel COMSOL runs on the cluster, please have a look at the sample batch scripts (a rough sketch is given below).
There, 10 subkernels are launched at a time; this value needs to be adapted depending on the network's performance and the timeout value.
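The sample scripts themselves are not reproduced here; as a hedged sketch only, a simple single-node (non-distributed) COMSOL batch job could look like this, assuming a COMSOL module is available (module name, file names, and resource numbers are placeholders; prefer the official sample scripts):

```bash
#!/bin/bash -l
#SBATCH -J comsol_job            # placeholder job name
#SBATCH -p standard
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16       # shared-memory threads on one node
#SBATCH --time=1-00:00:00        # placeholder time limit

module load comsol               # placeholder module name

# Run COMSOL in batch mode with the allocated number of threads
comsol batch -np $SLURM_CPUS_PER_TASK \
       -inputfile model_in.mph -outputfile model_out.mph \
       -batchlog comsol_job.log
```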
Support:
For support, please create a trouble ticket at the MPCDF helpdesk.