Cobra User Guide
Warning
Cobra Batch job processing will end on July 1st, 2024
Cobra Login nodes will be decommissioned on July 19th, 2024
System Overview
The supercomputer Cobra was installed in spring 2018 and was expanded with NVIDIA Tesla V100 GPUs in December 2018 and with NVIDIA Quadro RTX 5000 GPUs in July 2019.
All compute nodes contain two Intel Xeon Gold 6148 processors (Skylake (SKL), 20 cores @ 2.4 GHz) and are connected through a 100 Gb/s OmniPath interconnect. Each island (~636 nodes) has a non-blocking, full fat-tree network topology, while a blocking factor of 1:8 applies between islands; therefore, batch jobs are restricted to a single island. In addition, there are 6 login nodes and an I/O subsystem that serves 5 PB of disk storage with direct HSM access (via GHI).
Overall configuration
1284 compute nodes (2 × SKL), 96 GB RAM DDR4 each
1908 compute nodes (2 × SKL), 192 GB RAM DDR4 each
16 compute nodes (2 × SKL), 384 GB RAM DDR4 each
8 compute nodes (2 × SKL), 768 GB RAM DDR4 each
64 compute nodes (2 × SKL + 2 × NVIDIA Tesla V100-32)
120 compute nodes (2 × SKL + 2 × NVIDIA Quadro RTX 5000)
24 compute nodes (2 × SKL), 192 GB RAM DDR4 each (dedicated to MPSD)
Access
Login
For security reasons, direct login to the HPC cluster Cobra is allowed only from within the MPG networks. Users from other locations have to log in to one of our gateway systems first. Use ssh to connect to Cobra:
ssh cobra.mpcdf.mpg.de
You will be directed to one of the Cobra login nodes (cobra01i, cobra02i), where you have to provide your (Kerberos) password and a one-time password (OTP). SSH keys are not allowed.
Secure copy (scp) can be used to transfer data to or from cobra.mpcdf.mpg.de.
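For example, a file could be copied to your Cobra home directory as follows (the file name mydata.tar and the user ID USERID are placeholders):
scp ./mydata.tar USERID@cobra.mpcdf.mpg.de:~/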
Cobra’s (all login/interactive nodes) ssh key fingerprints (SHA256) are:
G45rl+n9MWi/TWQA3bYXoVxBI/wiOviJXe99H4SacWU (RSA)
KcGJxKBfrsVyexByJFgbuFDigfvGfrgZ5Urvmh/ZJLI (ED25519)
Using compute resources
The pool of login nodes cobra.mpcdf.mpg.de is mainly intended for editing, compiling and submitting your parallel programs. Running parallel programs interactively in production mode on the login nodes is not allowed. Jobs have to be submitted to the Slurm batch system which reserves and allocates the resources (e.g. compute nodes) required for your job. Further information on the batch system is provided below.
Interactive (debug) runs
If you need to test or debug your code, you may login to ‘cobra-i.mpcdf.mpg.de’ (cobra03i-cobra06i) and run your code interactively (2 hours at most) with the command:
srun -n NUMBER_OF_CORES -p interactive --time=TIME_LESS_THAN_2HOURS --mem=MEMORY_LESS_THAN_32G ./EXECUTABLE
Please take care that the machine does not become overloaded: do not use more than 8 cores in total and do not request more than 32 GB of main memory. Ignoring these recommendations may cause a system crash or hang!
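For example, a short interactive test on 4 cores with 8 GB of memory and a 30-minute time limit could look like this (my_test is a placeholder for your executable):
srun -n 4 -p interactive --time=00:30:00 --mem=8G ./my_test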
Internet access
Connections to the Internet are only permitted from the login nodes in outgoing direction; Internet access from within batch jobs is not possible. To download source code or other data, command line tools such as wget, curl, rsync, scp, pip, git, or similar may be used interactively on the login nodes. In case the transfer is expected to take a long time, it is useful to run it inside a screen or tmux session.
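A minimal sketch of such a long-running download inside a screen session (the session name and URL are placeholders):
screen -S transfer
wget https://example.org/large_dataset.tar.gz
# detach with Ctrl-a d, reattach later with: screen -r transfer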
Hardware configuration
Compute nodes
3240 compute nodes
Processor type: Intel Skylake 6148
Processor clock: 2.4 GHz
Theoretical peak performance per node: 2.4 GHz * 32 DP Flops/cycle * 40 cores = 3072 DP GFlop/s
Cores per node: 40 (each with 2 hyperthreads, thus 80 logical CPUs per node)
Node topology: 2 NUMA domains with 20 physical cores each
Main memory
standard nodes: 1284 × 96 GB
large memory nodes: 1932 × 192 GB
very large memory nodes: 16 × 384 GB, 8 × 768 GB
Accelerator part of Cobra:
64 nodes, each hosting 2 V100 GPUs (Tesla V100-PCIE-32GB: 32 GB HBM2, 5120 CUDA cores + 640 Tensor cores @ 1380 MHz, compute capability 7.0 / “Volta”)
120 nodes, each hosting 2 RTX5000 GPUs (Quadro RTX 5000: 16 GB GDDR6, 3072 CUDA cores + 384 Tensor cores + 48 RT units @ 1935 MHz, compute capability 7.5 / “Turing”)
Login and interactive nodes
2 nodes for login (Hostname cobra.mpcdf.mpg.de)
4 nodes for interactive program development and testing (Hostname cobra-i.mpcdf.mpg.de)
Main memory: 4 × 192 GB
Batch access is possible via the Slurm batch system from the login nodes cobra.mpcdf.mpg.de and cobra-i.mpcdf.mpg.de.
Interconnect
fast OmniPath (100 Gb/s) network connecting all the nodes
The compute nodes and GPU nodes are bundled into 6 domains (islands) of 636 nodes each (64 nodes in the case of the GPU island). Within one domain, the OmniPath network has a 'fat tree' topology for highly efficient communication. The OmniPath connection between the islands is much weaker, so batch jobs are restricted to a single island, i.e. to at most 636 nodes.
I/O subsystem
8 I/O nodes
5 PB of online disk space
File systems
AFS
AFS is only available on the login nodes cobra.mpcdf.mpg.de and on the interactive nodes cobra-i.mpcdf.mpg.de, in order to access software that is distributed via AFS. If you do not automatically get an AFS token during login, you can obtain one with the command
/usr/bin/klog.krb5
Note that there is no AFS on the compute nodes, so you have to avoid any dependencies on AFS in your jobs.
GPFS
There are two global, parallel file systems of type GPFS (/u and /ptmp), symmetrically accessible from all Cobra cluster nodes, plus the migrating file system /r interfacing to the HPSS archive system.
File system /u
The file system /u (a symbolic link to /cobra/u) is designed for permanent user data such as source files, config files, etc. The size of /u is 0.6 PB, mirrored (RAID 6). Note that no system backups are performed. Your home directory is in /u. The default disk quota in /u is 2.5 TB, the file quota is 2 million files. You can check your disk quota in /u with the command:
/usr/lpp/mmfs/bin/mmlsquota cobra_u
File system /ptmp
The file system /ptmp (a symbolic link to /cobra/ptmp) is designed for batch job I/O (4.5 PB, mirrored, RAID 6, no system backups). Files in /ptmp that have not been accessed for more than 12 weeks will be removed automatically. The period of 12 weeks may be reduced if necessary (with prior notification).
As a current policy, no quotas are applied on /ptmp. This gives users the freedom to manage their data according to their actual needs without administrative overhead. This liberal policy presumes fair usage of the common file space, so please do regular housekeeping of your data and archive or remove files that are no longer actually used.
Archiving data from the GPFS file systems to tape can be done using the migrating file system /r (see below).
File system /r
The /r file system (a symbolic link to /ghi/r) stages archive data. It is available only on the login nodes cobra.mpcdf.mpg.de and on the interactive nodes cobra-i.mpcdf.mpg.de.
Each user has a subdirectory /r/<initial>/<userid> to store data. For efficiency, files should be packed into tar files (with a size of about 1 GB to 1 TB) before archiving them in /r, i.e., please avoid archiving small files. When the file system /r is filled above a certain threshold, files are transferred from disk to tape, beginning with the largest files that have not been used for the longest time.
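A minimal sketch of packing a results directory from /ptmp into a single tar file directly in the archive (the directory name results_2021, the user ID USERID, and its initial letter U are placeholders):
tar cf /r/U/USERID/results_2021.tar -C /ptmp/USERID results_2021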
For documentation on how to use the MPCDF archive system, please see the backup and archive section.
/tmp
Please do not use the file system /tmp for scratch data. Instead, use /ptmp, which is accessible from all Cobra cluster nodes. In cases where an application really depends on node-local storage, you can use the variables JOB_TMPDIR and JOB_SHMTMPDIR, which are set individually for each job.
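A minimal sketch of using the node-local scratch directory from within a single-node job script (myprog and input.dat are placeholders):
cp ./input.dat ${JOB_TMPDIR}/
cd ${JOB_TMPDIR}
srun ${SLURM_SUBMIT_DIR}/myprog > myprog.out
cp ./myprog.out ${SLURM_SUBMIT_DIR}/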
Software
Access to software via environment modules
Environment modules are used at MPCDF to provide software packages and enable switching between different software versions.
Use the command
module avail
to list the available software packages on the HPC system. Note that you can search for a certain module by using the find-module tool (see below).
Use the command
module load package_name/version
to actually load a software package at a specific version.
Further information on the environment modules on Cobra and their hierarchical organization is given below.
Information on the software packages provided by the MPCDF is available here.
Recommended compiler and MPI stack on Cobra
We currently (as of 2021/07) recommend using the following versions on Cobra:
module load intel/19.1.3 impi/2019.9 mkl/2020.4
Hierarchical module environment
To manage the plethora of software packages resulting from all the relevant combinations of compilers and MPI libraries, the environment module system for accessing these packages is organized in a natural hierarchical manner. Compilers (gcc, intel) are located on the uppermost level, dependent libraries (e.g., MPI) on the second level, and further dependent libraries on a third level. This means that not all modules are visible initially: only after loading a compiler module do the modules depending on it become available. Similarly, loading an MPI module in addition makes the modules depending on that MPI library available.
Starting with the maintenance on Sep 22, 2021, no defaults are defined for the compiler and MPI modules, and no modules are loaded automatically at login. This requires users to specify explicit versions for these modules during compilation and in their batch scripts, ensuring that the same MPI library is loaded in both cases. It also means that users decide for themselves when to switch their code to newer compiler and MPI versions, which avoids the compatibility problems that can arise when defaults are changed centrally.
For example, the FFTW library compiled with the Intel compiler and the Intel MPI library can be loaded as follows:
First, load the Intel compiler module using the command
module load intel/19.1.3
second, the Intel MPI module with
module load impi/2019.9
and, finally, the FFTW module fitting exactly to the compiler and MPI library via
module load fftw-mpi
Using the command
module avail
you may check that after the first and second steps the dependent environment modules become visible, in the present example impi and fftw-mpi. Moreover, note that environment modules can be loaded via a single 'module load' statement as long as the order given by the hierarchy is respected, e.g.,
module load intel/19.1.3 impi/2019.9 fftw-mpi
It is important to point out that a large fraction of the available software is not affected by the hierarchy: certain HPC applications, tools such as git or cmake, mathematical software (maple, matlab, mathematica), and visualization software (visit, paraview, idl) are visible at the uppermost level of the hierarchy. Note that a hierarchy exists for dependent Python modules via the 'anaconda' module files on the top level, and similarly for CUDA via the 'cuda' module files. To start at the root of the environment modules hierarchy, run
module purge
Because of the hierarchy, some modules only appear after other modules (such as compiler and MPI) have been loaded. One can search all available combinations of a certain software (e.g. fftw-mpi) by using
find-module fftw-mpi
Further information on using environment modules is given here.
Transition to no-default Intel modules in September 2021
Please note that with the Cobra maintenance on Sep 22, 2021, the default-related configuration of the Intel modules was removed, as announced by email on Aug 02, 2021. After that maintenance, no defaults are defined for the Intel compiler and MPI modules, and no modules are loaded automatically at login.
The motivation for introducing these changes is to avoid the accidental use of different versions of Intel compilers and MPI libraries at compile time and at run time. Please note that this will align the configuration on Cobra with the configuration on Raven where users have to specify full versions and no default modules are loaded.
What kind of adaptations of user scripts are necessary? Please load a specific set of environment modules with explicit versions consistently when compiling and running your codes, e.g. use
module purge
module load intel/19.1.3 impi/2019.9 mkl/2020.4
in your job scripts as well as in interactive shell sessions. Note
that you must specify a full version for the ‘intel’ and the ‘impi’
modules, otherwise the command will fail.
Please note that, for your convenience, pre-compiled applications provided as modules such as 'vasp' or 'gromacs' will continue to load the necessary 'intel' and 'impi' modules automatically, i.e. no changes to the batch scripts are required for these applications. We do, however, recommend adding a module purge in those cases as well.
Slurm batch system
The batch system on the HPC cluster Cobra is the open-source workload manager Slurm (Simple Linux Utility for Resource Management). To run test or production jobs, submit a job script (see below) to Slurm, which will find and allocate the resources required for your job (e.g. the compute nodes to run your job on).
By default, the job run limit on Cobra is 8 and the job submit limit is 300. If your batch jobs cannot run independently of each other, please use job steps or contact the helpdesk on the MPCDF web page.
The Intel processors on Cobra support hyperthreading, which might increase the performance of your application by up to 20%. With hyperthreading, you have to increase the number of MPI tasks per node from 40 to 80 in your job script. Please be aware that with 80 MPI tasks per node each process gets only half of the memory by default. If you need more memory, you have to specify it in your job script (see the example batch scripts below).
If you want to test or debug your code interactively on
cobra-i.mpcdf.mpg.de
(cobra03i-cobra06i), you can use the command:
srun -n N_TASKS -p interactive ./EXECUTABLE
For detailed information about the Slurm batch system, please see Slurm Workload Manager.
Overview of batch queues (partitions) on Cobra:
Partition      Processor   Max. CPUs              Max. Memory per Node   Max. Nr.   Max. Run
               type        per Node               (std. | large)         of Nodes   Time
---------------------------------------------------------------------------------------------
tiny           Skylake     20                     42 GB                  0.5        24:00:00
express        Skylake     40 / 80 in HT mode     85 | 180 GB            32         30:00
medium         Skylake     40 / 80 in HT mode     85 | 180 GB            32         24:00:00
n0064          Skylake     40 / 80 in HT mode     85 | 180 GB            64         24:00:00
n0128          Skylake     40 / 80 in HT mode     85 | 180 GB            128        24:00:00
n0256          Skylake     40 / 80 in HT mode     85 | 180 GB            256        24:00:00
n0512          Skylake     40 / 80 in HT mode     85 | 180 GB            512        24:00:00
n0620          Skylake     40 / 80 in HT mode     85 | 180 GB            620        24:00:00
fat            Skylake     40 / 80 in HT mode     748 GB                 8          24:00:00
chubby         Skylake     40 / 80 in HT mode     368 GB                 16         24:00:00
gpu_v100       Skylake     40 / 80 (host cpus)    180 GB                 64         24:00:00
gpu1_v100      Skylake     40 / 80 (host cpus)    90 GB                  0.5        24:00:00
gpu_rtx5000    Skylake     40 / 80 (host cpus)    180 GB                 120        24:00:00
gpu1_rtx5000   Skylake     40 / 80 (host cpus)    90 GB                  0.5        24:00:00

Remote visualization:
rvs            Skylake     40 / 80 (host cpus)    180 GB                 2          24:00:00
The most important Slurm commands are
sbatch <job_script_name>
Submit a job script for execution
squeue
Check the status of your job(s)
scancel <job_id>
Cancel a job
sinfo
List the available batch queues (partitions)
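For example, assuming a job script named my_job.sh (the script name and the job ID 123456 are placeholders):
sbatch my_job.sh
squeue -u $USER
scancel 123456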
Sample Batch job scripts can be found below.
Notes on job scripts:
The directive
SBATCH --nodes=<nr. of nodes>
in your job script sets the number of compute nodes that your program will use.
The directive
SBATCH --ntasks-per-node=<nr. of cpus>
specifies the number of MPI processes for the job. The parameter ntasks-per-node cannot be greater than 80 because one compute node on Cobra has 40 cores with 2 hyperthreads each, i.e. 80 logical CPUs in hyperthreading mode.
The directive
SBATCH --cpus-per-task=<nr. of OMP threads per MPI task>
specifies the number of threads per MPI process if you are using OpenMP.
The expression
tasks-per-node * cpus-per-task
may not exceed 80.
The expression
nodes * tasks-per-node * cpus-per-task
gives the total number of CPUs that your job will use (see the worked example after these notes).
Jobs that need less than half a compute node have to specify a reasonable memory limit so that they can share a node!
A job submit filter will automatically choose the right partition/queue from the resource specification.
Please note that setting the environment variable 'SLURM_HINT' in job scripts is not necessary and is discouraged on Cobra.
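As a worked example of the resource arithmetic described in the notes above, the following (hypothetical) specification requests 2 nodes with 10 MPI tasks per node and 4 OpenMP threads per task, i.e. 10 * 4 = 40 CPUs per node (no hyperthreading) and 2 * 10 * 4 = 80 CPUs in total:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=10
#SBATCH --cpus-per-task=4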
Slurm example batch scripts
MPI and MPI/OpenMP batch scripts
MPI batch job without hyperthreading
#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob.out.%j
#SBATCH -e ./tjob.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=40
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock limit:
#SBATCH --time=24:00:00
# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9
# Run the program:
srun ./myprog > prog.out
Hybrid MPI/OpenMP batch job without hyperthreading
#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob_hybrid.out.%j
#SBATCH -e ./tjob_hybrid.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=4
# for OpenMP:
#SBATCH --cpus-per-task=10
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock limit:
#SBATCH --time=24:00:00
# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# For pinning threads correctly:
export OMP_PLACES=cores
# Run the program:
srun ./myprog > prog.out
Hybrid MPI/OpenMP batch job in hyperthreading mode
#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob_hybrid.out.%j
#SBATCH -e ./tjob_hybrid.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=4
# Enable Hyperthreading:
#SBATCH --ntasks-per-core=2
# for OpenMP:
#SBATCH --cpus-per-task=20
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock Limit:
#SBATCH --time=24:00:00
# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# For pinning threads correctly:
export OMP_PLACES=threads
# Run the program:
srun ./myprog > prog.out
MPI batch job in hyperthreading mode using 180 GB of memory per node
#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob.out.%j
#SBATCH -e ./tjob.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name :
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=80
# Enable Hyperthreading:
#SBATCH --ntasks-per-core=2
#
# Request 180 GB of main memory per node in units of MB:
#SBATCH --mem=185000
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock limit:
#SBATCH --time=24:00:00
# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9
# enable over-subscription of physical cores by MPI ranks
export PSM2_MULTI_EP=0
# Run the program:
srun ./myprog > prog.out
OpenMP batch job in hyperthreading mode using 180 GB of memory per node
#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob_hybrid.out.%j
#SBATCH -e ./tjob_hybrid.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# Enable Hyperthreading:
#SBATCH --ntasks-per-core=2
# for OpenMP:
#SBATCH --cpus-per-task=80
#
# Request 180 GB of main memory per node in units of MB:
#SBATCH --mem=185000
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock Limit:
#SBATCH --time=24:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# For pinning threads correctly
export OMP_PLACES=threads
# Run the program:
srun ./myprog > prog.out
Batch jobs using GPUs
MPI batch job on GPUs
#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob.out.%j
#SBATCH -e ./tjob.err.%j
# Initial working directory:
#SBATCH -D ./
#
#SBATCH -J test_slurm
#
# Node feature:
#SBATCH --constraint="gpu"
# Specify type and number of GPUs to use:
# GPU type can be v100 or rtx5000
#SBATCH --gres=gpu:v100:2 # If using both GPUs of a node
# #SBATCH --gres=gpu:v100:1 # If using only 1 GPU of a shared node
# #SBATCH --mem=92500 # Memory is necessary if using only 1 GPU
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40 # If using both GPUs of a node
# #SBATCH --ntasks-per-node=20 # If using only 1 GPU of a shared node
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# wall clock limit:
#SBATCH --time=24:00:00
# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9
module load cuda/11.2
# Run the program:
srun ./my_gpu_prog > prog.out
Batch jobs with dependencies
The following script generates a sequence of jobs, each job running the given
job script. The start of each individual job depends on its dependency, where
possible values for the --dependency
flag are, e.g.
afterany:job_id
This job starts after the previous job has terminated
afterok:job_id
This job starts after the previous job has successfully executed
#!/bin/bash
# Submit a sequence of batch jobs with dependencies
#
# Number of jobs to submit:
NR_OF_JOBS=6
# Batch job script:
JOB_SCRIPT=./my_batch_script
echo "Submitting job chain of ${NR_OF_JOBS} jobs for batch script ${JOB_SCRIPT}:"
JOBID=$(sbatch ${JOB_SCRIPT} 2>&1 | awk '{print $(NF)}')
echo " " ${JOBID}
I=1
while [ ${I} -lt ${NR_OF_JOBS} ]; do
    JOBID=$(sbatch --dependency=afterany:${JOBID} ${JOB_SCRIPT} 2>&1 | awk '{print $(NF)}')
    echo " " ${JOBID}
    let I=${I}+1
done
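This chain script could be saved, for example, as submit_chain.sh (a placeholder name) and run on a login node:
chmod +x submit_chain.sh
./submit_chain.sh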
Batch job using a job array
#!/bin/bash -l
#SBATCH --array=1-20 # specify the indexes of the job array elements
# Standard output and error:
#SBATCH -o job_%A_%a.out # Standard output, %A = job ID, %a = job array index
#SBATCH -e job_%A_%a.err # Standard error, %A = job ID, %a = job array index
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_array
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock limit:
#SBATCH --time=24:00:00
# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9
# The environment variable $SLURM_ARRAY_TASK_ID holds the index of the job array and
# can be used to discriminate between individual elements of the job array:
srun ./myprog $SLURM_ARRAY_TASK_ID >prog.out
Single-node example job scripts for sequential programs, plain-OpenMP cases, Python, Julia, Matlab
In the following, example job scripts are given for jobs that use at maximum one full node. Use cases are sequential programs, threaded programs using OpenMP or similar models, and programs written in languages such as Python, Julia, Matlab, etc.
The Python example programs referred to below are available for download.
Single-core job
#!/bin/bash -l
#
# Single-core example job script for MPCDF Cobra.
# In addition to the Python example shown here, the script
# is valid for any single-threaded program, including
# sequential Matlab, Mathematica, Julia, and similar cases.
#
#SBATCH -J PYTHON_SEQ
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH --ntasks=1 # launch job on a single core
#SBATCH --cpus-per-task=1 # on a shared node
#SBATCH --mem=2000MB # memory limit for the job
#SBATCH --time=0:10:00
module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05
# Set number of OMP threads to fit the number of available cpus, if applicable.
export OMP_NUM_THREADS=1
# Run single-core program
srun python3 ./python_sequential.py
Small job with multithreading, applicable to Python, Julia and Matlab, plain OpenMP, or any threaded application
#!/bin/bash -l
#
# Multithreading example job script for MPCDF Cobra.
# In addition to the Python example shown here, the script
# is valid for any multi-threaded program, including
# Matlab, Mathematica, Julia, and similar cases.
#
#SBATCH -J PYTHON_MT
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH --ntasks=1 # launch job on
#SBATCH --cpus-per-task=8 # 8 cores on a shared node
#SBATCH --mem=16000MB # memory limit for the job
#SBATCH --time=0:10:00
module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05
# Set number of OMP threads to fit the number of available cpus, if applicable.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun python3 ./python_multithreading.py
Python/NumPy multithreading, applicable to Julia and Matlab, plain OpenMP, or any threaded application
#!/bin/bash -l
#
# Multithreading example job script for MPCDF Cobra.
# In addition to the Python example shown here, the script
# is valid for any multi-threaded program, including
# plain OpenMP, parallel Matlab, Julia, and similar cases.
#
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH -J PY_MULTITHREADING
#SBATCH --nodes=1 # request a full node
#SBATCH --ntasks-per-node=1 # only start 1 task via srun because the threads are spawned within this single task
#SBATCH --cpus-per-task=40 # assign all the cores to that first task to make room for multithreading
#SBATCH --time=00:10:00
module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05
# set number of OMP threads *per process*
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun python3 ./python_multithreading.py
Python multiprocessing
#!/bin/bash -l
#
# Python multiprocessing example job script for MPCDF Cobra.
#
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH -J PYTHON_MP
#SBATCH --nodes=1 # request a full node
#SBATCH --ntasks-per-node=1 # only start 1 task via srun because Python multiprocessing starts more tasks internally
#SBATCH --cpus-per-task=40 # assign all the cores to that first task to make room for Python's multiprocessing tasks
#SBATCH --time=00:10:00
module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05
# Important:
# Set the number of OMP threads *per process* to avoid overloading of the node!
export OMP_NUM_THREADS=1
# Use the environment variable SLURM_CPUS_PER_TASK to have multiprocessing
# spawn exactly as many processes as you have CPUs available.
srun python3 ./python_multiprocessing.py $SLURM_CPUS_PER_TASK
Python mpi4py
#!/bin/bash -l
#
# Python MPI4PY example job script for MPCDF Cobra.
# Plain MPI. May use more than one node.
#
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH -J MPI4PY
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=00:10:00
module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05
module load mpi4py/3.0.3
# Important:
# Set the number of OMP threads *per process* to avoid overloading of the node!
export OMP_NUM_THREADS=1
srun python3 ./python_mpi4py.py