HPC Systems and Services

Slurm Batch System

How do I submit a job to Slurm?

To submit a job, you first need to create a submission script (e.g., my_job.sh) that specifies the resources your job requires and the commands to be executed. You can find example scripts in the documentation for each HPC system.

Once you have a script, submit it with sbatch:

sbatch my_job.sh

sbatch will return a job ID that you can use to track your job.
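As a minimal sketch, a submission script might look as follows. The job name, resource values, module names, and executable are hypothetical placeholders; consult the example scripts in the documentation for your HPC system for the correct settings.

```shell
#!/bin/bash -l
# Minimal Slurm submission script sketch; all values below are
# illustrative placeholders, not recommended settings.
#SBATCH --job-name=my_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00

# Load the environment your application needs (module names vary
# between systems).
module purge
module load gcc openmpi

# Launch the application with srun.
srun ./my_application
```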

Can I submit jobs that run longer than 24 hours?

No, jobs on our HPC systems are limited to a 24-hour runtime. This policy ensures fair access and high system utilization.

If your application needs to run for longer, it must support checkpointing. This allows your application to save its state and be restarted in a subsequent job.
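A common pattern for checkpointed applications is job chaining with Slurm dependencies: each segment runs within the 24-hour limit, and the next segment starts only after the previous one completed successfully. The sketch below assumes a hypothetical my_job.sh that writes a checkpoint before the time limit and resumes from the latest checkpoint on restart.

```shell
# Submit the first segment; --parsable makes sbatch print only the
# job ID so it can be captured in a variable.
jobid=$(sbatch --parsable my_job.sh)

# Submit three follow-up segments, each starting only after the
# previous segment finished successfully (afterok).
for i in 1 2 3; do
    jobid=$(sbatch --parsable --dependency=afterok:${jobid} my_job.sh)
done
```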

How do I launch an MPI application?

To launch an MPI application, use srun in your Slurm submission script:

srun my_application

srun will automatically distribute the processes according to the resources you have requested.

What is the correct order of commands in a Slurm script?

All #SBATCH directives must come before any executable commands in your script. Any #SBATCH directives that appear after the first command will be ignored.

#!/bin/bash -l

# SBATCH directives
#SBATCH ...
#SBATCH ...

# Your commands
module load ...
srun ./my_executable ...

Can I run an interactive job for debugging?

Yes, you can use Slurm to run short, interactive jobs on the compute nodes for debugging and development.

For example, to request an interactive session with 2 tasks for 5 minutes, you can use:

srun --time=00:05:00 --mem=1G --ntasks=2 --pty /bin/bash

Once the resources are allocated, you will get a shell on a compute node where you can run your commands.

How can I check the estimated start time of my job?

Use squeue with the --start flag:

squeue --start -j <jobid>

How do I get detailed information about a running job?

Use scontrol show job:

scontrol show job -dd <jobid>

How do I get information about a finished job?

Use the sacct command to view information about your past jobs.

To see information about a specific job:

sacct -j <jobid>

To see information about all of your recent jobs with custom formatting:

sacct -u $USER --format=JobID,JobName,MaxRSS,Elapsed

What happens if a hardware failure occurs during my job?

In the rare event of a hardware failure, Slurm will interrupt your job and you will see an error message like srun: error: Node failure on....

By default, your job will be automatically resubmitted to the queue and will run on a different set of nodes. If you do not want your job to be automatically requeued, you can use the --no-requeue flag with sbatch.

How do I handle CPU pinning?

CPU pinning is handled automatically by Slurm. To ensure correct pinning, please use srun to launch your application and refer to our example job scripts for MPI, OpenMP, and hybrid jobs.
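As an illustration of what such a script can look like, here is a sketch of a hybrid MPI + OpenMP job in which srun handles the pinning. The node, rank, and thread counts are hypothetical; use the values from the official example scripts for your system.

```shell
#!/bin/bash -l
# Hypothetical hybrid MPI + OpenMP job: 4 MPI ranks per node, 18
# OpenMP threads per rank (adjust to your system's core counts).
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --time=01:00:00

# Let OpenMP use exactly the CPUs that Slurm assigned to each rank,
# one thread per core.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores

# srun inherits the allocation and pins ranks and threads accordingly.
srun ./my_hybrid_executable
```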

Parallel File Systems (GPFS)

Which file systems are available and how should I use them?

Each of our HPC systems has two main file systems:

  • /u/$USER (home directory): This file system is intended for source code, software installations, and smaller data files. Do not run I/O-intensive applications from your home directory.

  • /ptmp/$USER (temporary storage): This file system is optimized for large, streaming I/O, such as checkpoints and simulation output. I/O-intensive applications must use this file system. Please note that files on /ptmp are subject to a 12-week cleanup policy.

For more details on backups, quotas, and cleanup policies, please see the documentation for the specific HPC system you are using. Remember that the file systems are a shared resource; improper use can affect all users.

How can I improve my I/O performance?

For best performance, use large, sequential I/O operations. Avoid random access patterns and creating a large number of small files, especially in a single directory.

As the file systems are a shared resource, performance will vary depending on the overall system load.

How do I share files with other users?

You can use Access Control Lists (ACLs) to share files and directories with other users, even if they are not in your group. The primary tools for this are getfacl and setfacl.

Granting Read Access

To grant read access to a directory to user jane:

setfacl -R -m user:jane:rx /u/my/directory
setfacl -m user:jane:rx /u/my

The -R flag recursively applies the permissions. The x (execute) permission is required to traverse directories.

To grant access to a group (e.g., the support group):

setfacl -R -m group:support:rx /u/my/directory
setfacl -m group:support:rx /u/my

Revoking Access

To revoke access:

setfacl -R -x user:jane /u/my/directory

Viewing ACLs

To view the ACLs for a directory:

getfacl /u/my/directory

Removing All ACLs

To remove all ACLs from a directory and revert to standard Unix permissions:

setfacl -b /u/my/directory

How do I transfer files to and from the HPC systems?

There are several ways to transfer files:

  • MPCDF DataShare: Use the ds command-line client (ds put, ds get) for small to medium-sized files.

  • scp and rsync: Standard tools for transferring files and directories. rsync can resume interrupted transfers.

  • curl and wget: For downloading files from the web.

  • bbcp: For high-performance, parallel data transfers.

  • Globus: For large-scale, reliable data transfers.
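For example, a results directory can be copied from an HPC system to your local machine with rsync as sketched below. The hostname, username, and paths are placeholders; use the login node address of the system you are working on.

```shell
# Copy a directory from a remote HPC system to the current local
# directory; hostname and paths are placeholders.
# -a preserves permissions and timestamps, -v is verbose, and
# --partial keeps partially transferred files so an interrupted
# copy can be resumed by rerunning the same command.
rsync -av --partial username@hpc-login.example.org:/ptmp/username/results ./
```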

Performance Monitoring

How can I check the performance of my jobs?

We monitor all jobs and provide performance data as downloadable PDF reports at https://hpc-reports.mpcdf.mpg.de/. These reports can help you identify performance issues that may require further investigation with a profiler.

How do I disable the performance monitoring for my job?

Our performance monitoring system can sometimes interfere with other profiling tools such as Intel VTune or LIKWID. To temporarily suspend it for a single job, use the hpcmd_suspend wrapper:

srun hpcmd_suspend ./my_executable

The monitoring will be automatically re-enabled when your job finishes. Please do not suspend the monitoring unless you are performing your own measurements.

GPU Computing

How do I use the NVIDIA Multi-Process Service (MPS)?

NVIDIA MPS allows multiple MPI processes to share a single GPU. This can be useful if you are running more MPI ranks on a node than there are GPUs.

To enable MPS for your job on Raven, add the --nvmps flag to your sbatch command.
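For example, a GPU job script that requests more MPI ranks than GPUs can be submitted with MPS enabled like this (the script name is a placeholder):

```shell
# Enable NVIDIA MPS for this job on Raven; the ranks in my_gpu_job.sh
# will then share the allocated GPUs through MPS.
sbatch --nvmps my_gpu_job.sh
```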

How do I profile my GPU code?

We provide the NVIDIA Nsight tools for GPU profiling.

  • Nsight Systems: Use nsys to generate a timeline of your application and identify bottlenecks.

    module load cuda/XX.Y nsight_systems/ZZZZ
    srun nsys profile -t cuda,cudnn ./my_application

    You can view the resulting .nsys-rep file with the nsys-ui GUI.

  • Nsight Compute: Use ncu to perform an in-depth analysis of specific kernels.
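For example, a sketch of a kernel-level profiling run under srun (the output file name and executable are placeholders):

```shell
# Profile the GPU kernels of the application and write the results
# to my_profile.ncu-rep, which can be opened in the Nsight Compute GUI.
srun ncu -o my_profile ./my_application
```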

Are there dedicated resources for interactive GPU development?

Yes, the gpudev partition on Raven is available for short, interactive GPU jobs. To use it, add the following to your Slurm script:

#SBATCH --partition=gpudev
#SBATCH --gres=gpu:a100:1

The time limit for this partition is 15 minutes. You can also request an interactive session:

srun --time=00:10:00 --partition=gpudev --gres=gpu:a100:1 --pty /bin/bash

Containers

Can I run Docker containers?

For security reasons, Docker is not directly supported on our HPC systems. However, you can convert Docker containers to Singularity or Charliecloud, which are supported.
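For example, a Docker image from a public registry can typically be pulled and converted to a Singularity image file in one step. The image name below is a placeholder, and the exact module or command name may differ on your system.

```shell
# Pull a Docker image from Docker Hub and convert it to a Singularity
# image file (SIF) in the current directory.
singularity pull docker://ubuntu:22.04

# Run the converted container.
singularity run ubuntu_22.04.sif
```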

For more information, see our container documentation.

Remote Visualization

How do I run GUI applications that use OpenGL?

Applications that use OpenGL for hardware-accelerated rendering (e.g., VisIt, ParaView) must be run through our remote visualization service.

After launching a remote visualization session, open a terminal and prefix your command with vglrun:

module load visit
vglrun visit

How do I access the remote visualization service?

You can access the remote visualization service through your web browser at https://rvs.mpcdf.mpg.de/. No special client software is required.