Cobra User Guide

System Overview

The supercomputer Cobra was installed in spring 2018 and was expanded with NVIDIA Tesla V100 GPUs in December 2018 and with NVIDIA Quadro RTX 5000 GPUs in July 2019.

All compute nodes contain two Intel Xeon Gold 6148 processors (Skylake (SKL), 20 cores @ 2.4 GHz) and are connected through a 100 Gb/s OmniPath interconnect. Each island (~636 nodes) has a non-blocking, full fat-tree network topology, while between islands a blocking factor of 1:8 applies; batch jobs are therefore restricted to a single island. In addition, there are 6 login nodes and an I/O subsystem that serves 5 PB of disk storage with direct HSM access (via GHI).

Overall configuration

  • 1284 compute nodes (2 × SKL), 96 GB RAM DDR4 each

  • 1908 compute nodes (2 × SKL), 192 GB RAM DDR4 each

  • 16 compute nodes (2 × SKL), 384 GB RAM DDR4 each

  • 8 compute nodes (2 × SKL), 768 GB RAM DDR4 each

  • 64 compute nodes (2 × SKL + 2 × NVIDIA Tesla V100-32)

  • 120 compute nodes (2 × SKL + 2 × NVIDIA Quadro RTX 5000)

  • 24 compute nodes (2 × SKL), 192 GB RAM DDR4 each (dedicated to MPSD)

Summary

3424 compute nodes, 136,960 CPU-cores, 128 Tesla V100-32 GPUs, 240 Quadro RTX 5000 GPUs, 529 TB RAM DDR4, 7.9 TB HBM2, 11.4 PFlop/s peak DP, 2.64 PFlop/s peak SP


Access

Login

For security reasons, direct login to the HPC cluster Cobra is allowed only from within the MPG networks. Users at other locations have to log in to one of our gateway systems first. Use ssh to connect to Cobra:

ssh cobra.mpcdf.mpg.de

You will be directed to one of the Cobra login nodes (cobra01i, cobra02i). You have to provide your (Kerberos) password and an OTP on the Cobra login nodes. SSH keys are not allowed.

Secure copy (scp) can be used to transfer data to or from cobra.mpcdf.mpg.de.
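
For example, a file can be copied to your Cobra home directory and results can be retrieved again as follows (the file names, the user ID and the /ptmp path are placeholders):

scp ./input_data.tar.gz userid@cobra.mpcdf.mpg.de:~/
scp userid@cobra.mpcdf.mpg.de:/ptmp/userid/results.tar.gz .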

The SSH key fingerprints (SHA256) of all Cobra login and interactive nodes are:

G45rl+n9MWi/TWQA3bYXoVxBI/wiOviJXe99H4SacWU (RSA)
KcGJxKBfrsVyexByJFgbuFDigfvGfrgZ5Urvmh/ZJLI (ED25519)

Using compute resources

The pool of login nodes cobra.mpcdf.mpg.de is mainly intended for editing, compiling and submitting your parallel programs. Running parallel programs interactively in production mode on the login nodes is not allowed. Jobs have to be submitted to the Slurm batch system which reserves and allocates the resources (e.g. compute nodes) required for your job. Further information on the batch system is provided below.

Interactive (debug) runs

If you need to test or debug your code, you may log in to cobra-i.mpcdf.mpg.de (cobra03i-cobra06i) and run your code interactively (for at most 2 hours) with the command:

srun -n NUMBER_OF_CORES -p interactive --time=TIME_LESS_THAN_2HOURS --mem=MEMORY_LESS_THAN_32G ./EXECUTABLE

Please take care that the machine does not become overloaded: do not use more than 8 cores in total and do not request more than 32 GB of main memory. Neglecting these recommendations may cause a system crash or hangup!

Internet access

Connections to the Internet are only permitted from the login nodes and only in the outgoing direction; Internet access from within batch jobs is not possible. To download source code or other data, command line tools such as wget, curl, rsync, scp, pip, git, or similar may be used interactively on the login nodes. In case a transfer is expected to take a long time, it is useful to run it inside a screen or tmux session.
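
A minimal sketch of such a session on a login node (the URL is a placeholder):

screen -S transfer                          # start a named screen session
wget https://example.org/dataset.tar.gz     # download; detach with Ctrl-a d
screen -r transfer                          # later: re-attach and check progress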

Hardware configuration

Compute nodes

  • 3424 compute nodes

  • Processor type: Intel Skylake 6148

  • Processor clock: 2.4 GHz

  • Theoretical peak performance per node: 2.4 GHz * 32 DP Flops/cycle/core * 40 cores = 3072 DP GFlop/s

  • Cores per node: 40 (each with 2 hyperthreads, thus 80 logical CPUs per node)

  • Node topology: 2 NUMA domains with 20 physical cores each

  • Main memory

    • standard nodes: 1284 × 96 GB

    • large memory nodes: 1932 × 192 GB

    • very large memory nodes: 16 × 384 GB, 8 × 768 GB

Accelerator part of Cobra:

  • 64 nodes, each hosting 2 V100 GPUs (Tesla V100-PCIE-32GB: 32 GB HBM2, 5120 CUDA cores + 640 Tensor cores @ 1380 MHz, compute capability 7.0 / “Volta”)

  • 120 nodes, each hosting 2 RTX5000 GPUs (Quadro RTX 5000: 16 GB GDDR6, 3072 CUDA cores + 384 Tensor cores + 48 RT units @ 1935 MHz, compute capability 7.5 / “Turing”)

Login and interactive nodes

  • 2 nodes for login (Hostname cobra.mpcdf.mpg.de)

  • 4 nodes for interactive program development and testing (Hostname cobra-i.mpcdf.mpg.de)

  • Main memory: 4 × 192 GB

Batch access is possible via the Slurm batch system from the login nodes cobra.mpcdf.mpg.de and cobra-i.mpcdf.mpg.de.

Interconnect

  • fast OmniPath (100 Gb/s) network connecting all the nodes

The compute nodes and GPU nodes are bundled into 6 domains (islands) of 636 nodes each (64 nodes in the case of the GPU island). Within one domain, the OmniPath network has a 'fat tree' topology for highly efficient communication. The OmniPath connection between the islands is much weaker, so batch jobs are restricted to a single island, i.e. at most 636 nodes.

I/O subsystem

  • 8 I/O nodes

  • 5 PB of online disk space

File systems

$HOME

Your home directory is in the GPFS file system /u (see below).

AFS

AFS is only available on the login nodes cobra.mpcdf.mpg.de and on the interactive nodes cobra-i.mpcdf.mpg.de in order to access software that is distributed via AFS. If you do not automatically get an AFS token during login, you can obtain one with the command /usr/bin/klog.krb5. Note that there is no AFS on the compute nodes, so you have to avoid any dependencies on AFS in your jobs.

GPFS

There are two global, parallel file systems of type GPFS (/u and /ptmp), symmetrically accessible from all Cobra cluster nodes, plus the migrating file system /r interfacing to the HPSS archive system.

File system /u

The file system /u (a symbolic link to /cobra/u) is designed for permanent user data such as source files, configuration files, etc. The size of /u is 0.6 PB, mirrored (RAID 6). Note that no system backups are performed. Your home directory is located in /u. The default disk quota in /u is 2.5 TB, the file quota is 2 million files. You can check your disk quota in /u with the command:

/usr/lpp/mmfs/bin/mmlsquota cobra_u

File system /ptmp

The file system /ptmp (a symbolic link to /cobra/ptmp) is designed for batch job I/O (4.5 PB mirrored, RAID 6, no system backups). Files in /ptmp that have not been accessed for more than 12 weeks will be removed automatically. The period of 12 weeks may be reduced if necessary (with prior notification).

As a current policy, no quotas are applied on /ptmp. This gives users the freedom to manage their data according to their actual needs without administrative overhead. This liberal policy presumes fair usage of the common file space, so please do regular housekeeping of your data and archive or remove files that are no longer needed.
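
As a sketch of such housekeeping, files that have not been accessed for more than, e.g., 70 days could be listed as follows (the directory layout /ptmp/$USER is an assumption; adjust the path to your own /ptmp directory):

# list files under your /ptmp directory not accessed for more than 70 days
find /ptmp/$USER -type f -atime +70 -ls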

Archiving data from the GPFS file systems to tape can be done using the migrating file system /r (see below).

File system /r

The /r file system (a symbolic link to /ghi/r) stages archive data. It is available only on the login nodes cobra.mpcdf.mpg.de and on the interactive nodes cobra-i.mpcdf.mpg.de.

Each user has a subdirectory /r/<initial>/<userid> to store data. For efficiency, files should be packed into tar files (with a size of about 1 GB to 1 TB) before archiving them in /r, i.e., please avoid archiving small files. When the file system /r is filled above a certain threshold, files are transferred from disk to tape, beginning with the largest files that have not been used for the longest time.
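
For example, a results directory in /ptmp could be packed and placed into the archive subdirectory as follows (the directory and file names are placeholders):

# pack a results directory from /ptmp into a single tar file in /r
tar -cf /r/<initial>/<userid>/results_2021.tar -C /ptmp/<userid> results_2021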

For documentation on how to use the MPCDF archive system, please see the backup and archive section.

/tmp

Please do not use the file system /tmp for scratch data. Instead, use /ptmp, which is accessible from all Cobra cluster nodes. In cases where an application really depends on node-local storage, you can use the environment variables JOB_TMPDIR and JOB_SHMTMPDIR, which are set individually for each job.
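
A minimal job-script sketch that stages data to node-local scratch (file and program names are placeholders; whether this pays off depends on the I/O pattern of your application):

# stage input to node-local scratch, run, copy results back
# (single-node job assumed, since the scratch directory is node-local)
cp /ptmp/<userid>/input.dat $JOB_TMPDIR/
srun ./myprog $JOB_TMPDIR/input.dat $JOB_TMPDIR/output.dat
cp $JOB_TMPDIR/output.dat /ptmp/<userid>/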

Software

Access to software via environment modules

Environment modules are used at MPCDF to provide software packages and enable switching between different software versions.

Use the command

module avail

to list the available software packages on the HPC system. Note that you can search for a certain module by using the find-module tool (see below).

Use the command

module load package_name/version

to actually load a software package at a specific version.

Further information on the environment modules on Cobra and their hierarchical organization is given below.

Information on the software packages provided by the MPCDF is available here.

Hierarchical module environment

To manage the plethora of software packages resulting from all the relevant combinations of compilers and MPI libraries, we organize the environment module system for accessing these packages in a natural hierarchical manner. Compilers (gcc, intel) are located on the uppermost level, dependent libraries (e.g., MPI) on the second level, and further dependent libraries on a third level. This means that not all modules are visible initially: only after loading a compiler module do the modules that depend on it become available. Similarly, loading an MPI module in addition makes the modules that depend on that MPI library available.

Starting with the maintenance on Sep 22, 2021, no defaults are defined for the compiler and MPI modules, and no modules are loaded automatically at login. This forces users to specify explicit versions for these modules during compilation and in their batch scripts, which ensures that the same MPI library is loaded in both cases. It also means that users can decide for themselves when to switch to newer compiler and MPI versions for their code, which avoids the compatibility problems that arise when defaults are changed centrally.

For example, the FFTW library compiled with the Intel compiler and the Intel MPI library can be loaded as follows:

First, load the Intel compiler module using the command

module load intel/19.1.3

second, the Intel MPI module with

module load impi/2019.9

and, finally, the FFTW module fitting exactly to the compiler and MPI library via

module load fftw-mpi

You may check by using the command

module avail

that after the first and second steps the dependent environment modules become visible, in the present example impi and fftw-mpi. Moreover, note that environment modules can be loaded via a single ‘module load’ statement as long as the order respects the hierarchy, e.g.,

module load intel/19.1.3 impi/2019.9 fftw-mpi

It is important to point out that a large fraction of the available software is not affected by the hierarchy: certain HPC applications, tools such as git or cmake, mathematical software (maple, matlab, mathematica), and visualization software (visit, paraview, idl) are visible at the uppermost level of the hierarchy. Note that a hierarchy exists for dependent Python modules via the ‘anaconda’ module files on the top level, and similarly for CUDA via the ‘cuda’ module files. To start at the root of the environment modules hierarchy, run ‘module purge’.
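
For instance, the Python MPI module ‘mpi4py’ used in the example scripts below can be loaded together with the compiler, MPI and anaconda modules in a single statement, as long as the order respects the hierarchy (versions taken from the examples later in this guide):

module purge
module load gcc/10 impi/2019.9 anaconda/3/2021.05 mpi4py/3.0.3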

Because of the hierarchy, some modules only appear after other modules (such as compiler and MPI) have been loaded. One can search all available combinations of a certain software (e.g. fftw-mpi) by using

find-module fftw-mpi

Further information on using environment modules is given here.

Transition to no-default Intel modules in September 2021

Please note that with the Cobra maintenance on Sep 22, 2021, the default-related configuration of the Intel modules was removed, as announced by email on Aug 02, 2021. After that maintenance, no defaults are defined for the Intel compiler and MPI modules, and no modules are loaded automatically at login.

The motivation for introducing these changes is to avoid the accidental use of different versions of Intel compilers and MPI libraries at compile time and at run time. Please note that this will align the configuration on Cobra with the configuration on Raven where users have to specify full versions and no default modules are loaded.

What kind of adaptations of user scripts are necessary? Please load a specific set of environment modules with explicit versions consistently when compiling and running your codes, e.g. use

module purge
module load intel/19.1.3 impi/2019.9 mkl/2020.4

in your job scripts as well as in interactive shell sessions. Note that you must specify a full version for the ‘intel’ and ‘impi’ modules, otherwise the command will fail. For your convenience, pre-compiled applications provided as modules, such as ‘vasp’ or ‘gromacs’, will continue to load the necessary ‘intel’ and ‘impi’ modules automatically, i.e. no changes to the batch scripts are required for these applications. We do, however, recommend adding a ‘module purge’ in those cases.

Slurm batch system

The batch system on the HPC cluster Cobra is the open-source workload manager Slurm (Simple Linux Utility for Resource Management). To run test or production jobs, submit a job script (see below) to Slurm, which will find and allocate the resources required for your job (e.g. the compute nodes to run your job on).

On Cobra, the default job run limit is 8 and the default job submit limit is 300. If your batch jobs cannot run independently of each other, please use job steps or contact the helpdesk on the MPCDF web page.
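
As a hedged sketch of job steps: within a single allocation, consecutive srun calls are executed as separate job steps, so several dependent runs can share one batch job (executable names and the wall-clock limit are placeholders):

#!/bin/bash -l
#SBATCH -J job_steps
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=04:00:00

module purge
module load intel/19.1.3 impi/2019.9

# each srun call below runs as its own job step; the second step
# starts only after the first has finished
srun ./prepare_run > step1.out
srun ./main_run > step2.out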

The Intel processors on Cobra support hyperthreading, which may increase the performance of your application by up to 20%. With hyperthreading, you have to increase the number of MPI tasks per node from 40 to 80 in your job script. Please be aware that with 80 MPI tasks per node, each process gets only half of the memory by default. If you need more memory, you have to specify it in your job script (see the example batch scripts).

If you want to test or debug your code interactively on cobra-i.mpcdf.mpg.de (cobra03i-cobra06i), you can use the command:

srun -n N_TASKS -p interactive ./EXECUTABLE

For detailed information about the Slurm batch system, please see Slurm Workload Manager.

Overview of batch queues (partitions) on Cobra:

    Partition     Processor  Max. CPUs              Max. memory per node  Max. nr.   Max. run
                  type       per node               (std. | large)        of nodes   time
    ------------------------------------------------------------------------------------------
    tiny          Skylake    20                     42 GB                    0.5     24:00:00
    express       Skylake    40 / 80 in HT mode     85 | 180 GB               32        30:00
    medium        Skylake    40 / 80 in HT mode     85 | 180 GB               32     24:00:00
    n0064         Skylake    40 / 80 in HT mode     85 | 180 GB               64     24:00:00
    n0128         Skylake    40 / 80 in HT mode     85 | 180 GB              128     24:00:00
    n0256         Skylake    40 / 80 in HT mode     85 | 180 GB              256     24:00:00
    n0512         Skylake    40 / 80 in HT mode     85 | 180 GB              512     24:00:00
    n0620         Skylake    40 / 80 in HT mode     85 | 180 GB              620     24:00:00
    fat           Skylake    40 / 80 in HT mode     748 GB                     8     24:00:00
    chubby        Skylake    40 / 80 in HT mode     368 GB                    16     24:00:00
    gpu_v100      Skylake    40 / 80 (host CPUs)    180 GB                    64     24:00:00
    gpu1_v100     Skylake    40 / 80 (host CPUs)    90 GB                    0.5     24:00:00
    gpu_rtx5000   Skylake    40 / 80 (host CPUs)    180 GB                   120     24:00:00
    gpu1_rtx5000  Skylake    40 / 80 (host CPUs)    90 GB                    0.5     24:00:00

    Remote visualization:

    rvs           Skylake    40 / 80 (host CPUs)    180 GB                     2     24:00:00

The most important Slurm commands are

  • sbatch <job_script_name> Submit a job script for execution

  • squeue Check the status of your job(s)

  • scancel <job_id> Cancel a job

  • sinfo List the available batch queues (partitions).
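
A typical sequence might look like this (the script name and the job ID are placeholders):

sbatch my_job.sh     # prints the ID of the submitted job
squeue -u $USER      # show your pending and running jobs
scancel 1234567      # cancel the job with this ID, if necessary
sinfo                # list the partitions and their state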

Sample Batch job scripts can be found below.

Notes on job scripts:

  • The directive

    #SBATCH --nodes=<nr. of nodes>
    

    in your job script sets the number of compute nodes that your program will use.

  • The directive

    #SBATCH --ntasks-per-node=<nr. of MPI tasks per node>
    

    specifies the number of MPI processes per node. The parameter ntasks-per-node cannot be greater than 80, because one compute node on Cobra has 40 cores with 2 threads each, i.e. 80 logical CPUs in hyperthreading mode.

  • The directive

    #SBATCH --cpus-per-task=<nr. of OMP threads per MPI task>
    

    specifies the number of threads per MPI process if you are using OpenMP.

  • The expression

    tasks-per-node * cpus-per-task
    

    may not exceed 80.

  • The expression

    nodes * tasks-per-node * cpus-per-task
    

    gives the total number of CPUs that your job will use (see the short example after this list).

  • Jobs that need less than half a compute node have to specify a reasonable memory limit so that they can share a node!

  • A job submit filter will automatically choose the right partition/queue from the resource specification.

  • Please note that setting the environment variable ‘SLURM_HINT’ in job scripts is not necessary on Cobra and is discouraged.
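
As a small worked illustration of these rules (the numbers are hypothetical), a job on 4 nodes with 8 MPI tasks per node and 5 OpenMP threads per task uses 8 * 5 = 40 logical CPUs per node (no hyperthreading) and 4 * 40 = 160 CPUs in total:

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=5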

Slurm example batch scripts

MPI and MPI/OpenMP batch scripts

MPI batch job without hyperthreading

#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob.out.%j
#SBATCH -e ./tjob.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=40
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock limit:
#SBATCH --time=24:00:00

# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9

# Run the program:
srun ./myprog > prog.out

Hybrid MPI/OpenMP batch job without hyperthreading

#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob_hybrid.out.%j
#SBATCH -e ./tjob_hybrid.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=4
# for OpenMP:
#SBATCH --cpus-per-task=10
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock limit:
#SBATCH --time=24:00:00

# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# For pinning threads correctly:
export OMP_PLACES=cores

# Run the program:
srun ./myprog > prog.out

Hybrid MPI/OpenMP batch job in hyperthreading mode

#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob_hybrid.out.%j
#SBATCH -e ./tjob_hybrid.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=4
# Enable Hyperthreading:
#SBATCH --ntasks-per-core=2
# for OpenMP:
#SBATCH --cpus-per-task=20
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock Limit:
#SBATCH --time=24:00:00

# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# For pinning threads correctly:
export OMP_PLACES=threads

# Run the program:
srun ./myprog > prog.out

MPI batch job in hyperthreading mode using 180 GB of memory per node

#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob.out.%j
#SBATCH -e ./tjob.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name :
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=80
# Enable Hyperthreading:
#SBATCH --ntasks-per-core=2
#
# Request 180 GB of main memory per node in units of MB:
#SBATCH --mem=185000
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock limit:
#SBATCH --time=24:00:00

# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9

# enable over-subscription of physical cores by MPI ranks
export PSM2_MULTI_EP=0

# Run the program:
srun ./myprog > prog.out

OpenMP batch job in hyperthreading mode using 180 GB of memory per node

#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob_hybrid.out.%j
#SBATCH -e ./tjob_hybrid.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_slurm
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# Enable Hyperthreading:
#SBATCH --ntasks-per-core=2
# for OpenMP:
#SBATCH --cpus-per-task=80
#
# Request 180 GB of main memory per node in units of MB:
#SBATCH --mem=185000
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock Limit:
#SBATCH --time=24:00:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# For pinning threads correctly
export OMP_PLACES=threads

# Run the program:
srun ./myprog > prog.out

Small MPI batch job on 1 - 20 cores (using a shared node)

#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob.out.%j
#SBATCH -e ./tjob.err.%j
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_slurm
#
# Number of MPI Tasks, e.g. 8:
#SBATCH --ntasks=8
#SBATCH --ntasks-per-core=1
# Memory [MB] required for the job, here 8 tasks * 2200 MB per task = 17600 MB:
#SBATCH --mem=17600
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock limit:
#SBATCH --time=24:00:00

# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9

# Run the program:
srun ./myprog > prog.out

Batch jobs using GPUs

MPI batch job on GPUs

#!/bin/bash -l
# Standard output and error:
#SBATCH -o ./tjob.out.%j
#SBATCH -e ./tjob.err.%j
# Initial working directory:
#SBATCH -D ./
#
#SBATCH -J test_slurm
#
# Node feature:
#SBATCH --constraint="gpu"
# Specify type and number of GPUs to use:
#   GPU type can be v100 or rtx5000
#SBATCH --gres=gpu:v100:2         # If using both GPUs of a node
# #SBATCH --gres=gpu:v100:1       # If using only 1 GPU of a shared node
# #SBATCH --mem=92500             # Memory is necessary if using only 1 GPU
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40      # If using both GPUs of a node
# #SBATCH --ntasks-per-node=20    # If using only 1 GPU of a shared node
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# wall clock limit:
#SBATCH --time=24:00:00

# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9
module load cuda/11.2

# Run the program:
srun ./my_gpu_prog > prog.out

Batch jobs with dependencies

The following script submits a sequence of jobs, each running the given job script. The start of each individual job depends on its predecessor; possible values for the --dependency flag are, e.g.

  • afterany:job_id This job starts after the previous job has terminated

  • afterok:job_id This job starts after the previous job has completed successfully

#!/bin/bash
# Submit a sequence of batch jobs with dependencies
#
# Number of jobs to submit:
NR_OF_JOBS=6
# Batch job script:
JOB_SCRIPT=./my_batch_script
echo "Submitting job chain of ${NR_OF_JOBS} jobs for batch script ${JOB_SCRIPT}:"
JOBID=$(sbatch ${JOB_SCRIPT} 2>&1 | awk '{print $(NF)}')
echo "  " ${JOBID}
I=1
while [ ${I} -lt ${NR_OF_JOBS} ]; do
  JOBID=$(sbatch --dependency=afterany:${JOBID} ${JOB_SCRIPT} 2>&1 | awk '{print $(NF)}')
  echo "  " ${JOBID}
  let I=${I}+1
done

Batch job using a job array

#!/bin/bash -l
#SBATCH --array=1-20            # specify the indexes of the job array elements
# Standard output and error:
#SBATCH -o job_%A_%a.out        # Standard output, %A = job ID, %a = job array index
#SBATCH -e job_%A_%a.err        # Standard error, %A = job ID, %a = job array index
# Initial working directory:
#SBATCH -D ./
# Job Name:
#SBATCH -J test_array
#
# Number of nodes and MPI tasks per node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#
# Wall clock limit:
#SBATCH --time=24:00:00

# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
module purge
module load intel/19.1.3 impi/2019.9

#  The environment variable $SLURM_ARRAY_TASK_ID holds the index of the job array and
#  can be used to discriminate between individual elements of the job array:
srun ./myprog $SLURM_ARRAY_TASK_ID >prog.out

Single-node example job scripts for sequential programs, plain-OpenMP cases, Python, Julia, Matlab

In the following, example job scripts are given for jobs that use at maximum one full node. Use cases are sequential programs, threaded programs using OpenMP or similar models, and programs written in languages such as Python, Julia, Matlab, etc.

The Python example programs referred to below are available for download.

Single-core job

#!/bin/bash -l
#
# Single-core example job script for MPCDF Cobra.
# In addition to the Python example shown here, the script
# is valid for any single-threaded program, including
# sequential Matlab, Mathematica, Julia, and similar cases.
#
#SBATCH -J PYTHON_SEQ
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH --ntasks=1         # launch job on a single core
#SBATCH --cpus-per-task=1  #   on a shared node
#SBATCH --mem=2000MB       # memory limit for the job
#SBATCH --time=0:10:00

module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05

# Set number of OMP threads to fit the number of available cpus, if applicable.
export OMP_NUM_THREADS=1

# Run single-core program
srun python3 ./python_sequential.py

Small job with multithreading, applicable to Python, Julia and Matlab, plain OpenMP, or any threaded application

#!/bin/bash -l
#
# Multithreading example job script for MPCDF Cobra.
# In addition to the Python example shown here, the script
# is valid for any multi-threaded program, including
# Matlab, Mathematica, Julia, and similar cases.
#
#SBATCH -J PYTHON_MT
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH --ntasks=1         # launch job on
#SBATCH --cpus-per-task=8  #   8 cores on a shared node
#SBATCH --mem=16000MB      # memory limit for the job
#SBATCH --time=0:10:00

module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05

# Set number of OMP threads to fit the number of available cpus, if applicable.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun python3 ./python_multithreading.py

Python/NumPy multithreading, applicable to Julia and Matlab, plain OpenMP, or any threaded application

#!/bin/bash -l
#
# Multithreading example job script for MPCDF Cobra.
# In addition to the Python example shown here, the script
# is valid for any multi-threaded program, including
# plain OpenMP, parallel Matlab, Julia, and similar cases.
#
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH -J PY_MULTITHREADING
#SBATCH --nodes=1             # request a full node
#SBATCH --ntasks-per-node=1   # only start 1 task via srun; the program spawns its threads internally
#SBATCH --cpus-per-task=40    # assign all the cores to that task to make room for multithreading
#SBATCH --time=00:10:00

module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05

# set number of OMP threads *per process*
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun python3 ./python_multithreading.py

Python multiprocessing

#!/bin/bash -l
#
# Python multiprocessing example job script for MPCDF Cobra.
#
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH -J PYTHON_MP
#SBATCH --nodes=1             # request a full node
#SBATCH --ntasks-per-node=1   # only start 1 task via srun because Python multiprocessing starts more tasks internally
#SBATCH --cpus-per-task=40    # assign all the cores to that first task to make room for Python's multiprocessing tasks
#SBATCH --time=00:10:00

module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05

# Important:
# Set the number of OMP threads *per process* to avoid overloading of the node!
export OMP_NUM_THREADS=1

# Use the environment variable SLURM_CPUS_PER_TASK to have multiprocessing
# spawn exactly as many processes as you have CPUs available.
srun python3 ./python_multiprocessing.py $SLURM_CPUS_PER_TASK

Python mpi4py

#!/bin/bash -l
#
# Python MPI4PY example job script for MPCDF Cobra.
# Plain MPI. May use more than one node.
#
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#SBATCH -D ./
#SBATCH -J MPI4PY
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=00:10:00

module purge
module load gcc/10 impi/2019.9
module load anaconda/3/2021.05
module load mpi4py/3.0.3

# Important:
# Set the number of OMP threads *per process* to avoid overloading of the node!
export OMP_NUM_THREADS=1

srun python3 ./python_mpi4py.py