HPC Software and Applications

General Questions

How can I install my own software?

You can install software in your home directory, where you have write permissions. Root privileges are not required for most software installations.

When installing software, be sure to specify an installation directory within your home directory. Here are some examples for common build systems:

GNU Build System:

./configure --prefix=$HOME/soft/my_package
make && make install

CMake:

cmake -DCMAKE_INSTALL_PREFIX:PATH=$HOME/soft/my_package .
make && make install

Python:

To install a Python package to your local user directory (~/.local):

# Using pip
pip install --user package_name

# Using a setup.py file (legacy; prefer pip where possible)
python setup.py install --user

For detailed instructions, please refer to the documentation provided with the software. If you encounter any issues, you can request support from our helpdesk.

Environment modules

The MPCDF uses environment modules to adapt the user environment for software installed at various locations in the file system and to switch between different software versions. Users no longer need to specify explicit paths to different executable versions or keep track of PATH, MANPATH, and related environment variables. Instead, they simply ‘load’ and ‘unload’ modules to control their environment. Note that since 2018, our HPC systems and the growing number of dedicated clusters all use hierarchical environment modules.

How do I use environment modules interactively?

Here are the most common module commands. For a complete reference, see the official documentation.

  • module help: Lists all module subcommands.

  • module avail: Lists all available software packages.

  • module help <package>/<version>: Shows help for a specific package.

  • module load <package>/<version>: Loads a package into your environment.

  • module unload <package>/<version>: Removes a package from your environment.

  • module list: Lists all currently loaded packages.
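Putting these commands together, a short interactive session might look as follows (the gcc package is only an example; use module avail to see what is actually installed):

```shell
module avail            # list the packages available at this point
module load gcc         # load the default gcc version
module list             # confirm it is loaded
module unload gcc       # remove it from the environment again
```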

How do I use environment modules in scripts?

Environment modules are not loaded by default in non-interactive shells (e.g., shell scripts or batch scripts). To use them, you must first source the appropriate profile script:

  • For bash or sh shells:

    source /etc/profile.d/modules.sh
    
  • For csh or tcsh shells:

    source /etc/profile.d/modules.csh
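For example, a minimal bash script that uses modules might look like this (the intel module is illustrative):

```shell
#!/bin/bash
# Initialize the module system in this non-interactive shell first:
source /etc/profile.d/modules.sh

module load intel
ifort --version
```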
    

How can I avoid using absolute paths in my scripts?

When you load a module, it sets environment variables that you can use in your scripts and makefiles. For example, most modules set a _HOME variable (e.g., MKL_HOME) that points to the package’s installation directory.

To see all environment variables set by a module, use module show <package>/<version>. For more information, use module help <package>/<version>.

Examples

Interactive Session

$ module load intel mkl
$ ifort -I$MKL_HOME/include example.F -L$MKL_HOME/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core

Makefile

FC = ifort

example: example.F
	$(FC) -I$(MKL_HOME)/include example.F -L$(MKL_HOME)/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core

Paging through module avail output

To view the output of module avail one page at a time, you can pipe it to less:

bash/sh:

module avail 2>&1 | less

csh/tcsh:

( module avail ) |& less

How do hierarchical environment modules work?

To manage the large number of software packages and their dependencies, we use a hierarchical module system. Compilers (e.g., gcc, intel) are at the top level, followed by MPI libraries, and then other libraries.

This means that you must load a compiler module before you can see the modules that depend on it. Similarly, you must load an MPI module to see the modules that depend on that MPI implementation.

To reset your environment and return to the root of the module hierarchy, use module purge.

For example, to load the FFTW library compiled with the Intel compiler and Intel MPI, you would load the modules in order:

module load intel
module load impi
module load fftw-mpi

After loading intel, module avail will show the available MPI libraries. After loading impi, you will see the available FFTW libraries.

You can also load all required modules in a single command, as long as you maintain the correct hierarchical order:

module load intel impi fftw-mpi

How do I quickly find a module?

If you know the name of a module but are unsure of its version or dependencies, you can use the find-module command:

find-module <module_name>

The output will show you the available versions and any required dependencies. You can then load the module and its dependencies in the correct order.

Example:

$ find-module horovod
horovod/cpu/0.13.11 (after loading anaconda/3/2019.03 tensorflow/cpu/1.14.0)

$ module load anaconda/3/2019.03 tensorflow/cpu/1.14.0 horovod/cpu/0.13.11

Note that many applications and tools (e.g., git, cmake, matlab) are not part of the hierarchical module system and are available at the top level.

How can I disable the “MPCDF specific note” for module avail?

The module avail command displays a note about our hierarchical module system. To disable this note, set the following environment variable in your ~/.bashrc file:

export MPCDF_DISABLE_MODULE_AVAIL_HINT=1

Why are there no BLAS/LAPACK modules?

We provide Intel’s Math Kernel Library (MKL), which includes highly optimized versions of BLAS, LAPACK, and other linear algebra libraries. For more information, please see our MKL guide.

Compiled Languages

CMake

Which CMake version should I use?

We recommend using the newest available version of CMake. CMake is backward-compatible, so a newer CMake can build projects whose CMakeLists.txt targets an older version (3.0 or later). Newer versions also provide better support for the latest compilers and libraries.

What does the “Policy CMPXXXX is not set” warning mean?

This warning indicates that the behavior of a CMake feature has changed between the version specified in your CMakeLists.txt file and the version you are currently using.

  • As a user: You can generally ignore this warning. CMake will use the behavior defined in the CMakeLists.txt file.

  • As a developer: You should review the policy change and update your CMakeLists.txt file to explicitly set the desired behavior. For more information, see the CMake policies documentation.

What if CMake cannot find a library?

If CMake cannot find a library, you can help it by setting the _ROOT environment variable (e.g., BOOST_ROOT) to the library’s installation directory.

We strive to set these variables automatically when you load a module, but not all modules currently do so. If you find a module that is missing a _ROOT variable, please let us know.
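For instance, if you have installed Boost under your home directory (the path below is hypothetical), you would set the variable before rerunning cmake:

```shell
# Point CMake's find_package at a non-standard install location:
export BOOST_ROOT=$HOME/soft/boost
```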

C/C++ and Fortran

Which compilers are supported?

We support the Intel and GNU compilers for C/C++, and Fortran. We provide MPI bindings and a wide range of libraries for these compilers through our hierarchical module system.

Other compilers, such as Clang, may be available through the module system but are not officially supported.

How do I ensure my executable finds shared libraries at runtime?

If your application depends on shared libraries (.so files) that are not in a standard system directory, you may encounter an error like cannot open shared object file: No such file or directory.

To resolve this, you can either set the rpath when you compile your application (recommended) or set the LD_LIBRARY_PATH environment variable at runtime.
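A sketch of the rpath approach (compiler, library name, and paths are placeholders):

```shell
# Embed the library's directory into the executable's rpath at link time,
# so the dynamic linker finds it without any environment variables:
gcc -o my_app my_app.c \
    -L$HOME/soft/my_lib/lib -lmylib \
    -Wl,-rpath,$HOME/soft/my_lib/lib
```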

Setting the LD_LIBRARY_PATH

If setting the rpath is not an option, you can set the LD_LIBRARY_PATH environment variable at runtime:

export LD_LIBRARY_PATH=/path/to/library:$LD_LIBRARY_PATH

However, we do not recommend this approach, as it can create dependencies on a specific environment and may cause your application to fail if the variable is not set correctly.

Why do I get C++ standard library errors with the Intel compiler?

The Intel C++ compiler supports modern C++ standards but relies on the system’s C++ standard library (libstdc++), which may not be up-to-date.

To resolve this, load a recent GCC module (e.g., module load gcc/13) after loading the Intel compiler and any other library modules. This will provide a modern libstdc++ while ensuring that other libraries are linked against the Intel compiler.
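For example (module versions are illustrative; check module avail for what is installed):

```shell
module purge
module load intel impi mkl   # compiler, MPI, and libraries first
module load gcc/13           # load last: only provides a recent libstdc++
```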

For more information, see our compilers documentation.

Debugging C/C++ and Fortran Codes

How can I use the Address Sanitizer (ASAN) with CUDA?

ASAN is a memory error detector for C/C++ codes, available in the GCC and Clang compilers (via the -fsanitize=address flag).

When using ASAN with CUDA code, you may encounter a cudaErrorNoDevice error due to an incompatibility with the NVIDIA driver. To work around this, set the following environment variable before running your application:

export ASAN_OPTIONS="protect_shadow_gap=0"

How do I debug a GPU memory error on MI300A (Viper-GPU)?

To debug a GPU memory error on Viper-GPU, follow these steps:

  1. Enable XNACK:

    export HSA_XNACK=1
    
  2. Build your HIP code with debug symbols:

    hipcc -g -ggdb -O0 ...
    
  3. Launch your application with rocgdb:

    rocgdb --args <executable> <arguments>
    
  4. Configure rocgdb:

    set pagination off
    set amdgpu precise-memory on
    b abort
    set non-stop on
    
  5. Set a breakpoint at your kernel. If the kernel name is mangled, you can demangle it with c++filt.

    b <demangled_kernel_name>
    
  6. Run the application and inspect the threads.

    r
    info threads
    thread <tid>
    set scheduler-locking step
    

This will allow you to step through the kernel code and inspect variables. To turn off scheduler locking, use set scheduler-locking off.
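The rocgdb settings from step 4 can also be collected in a command file so you do not have to retype them each session (file and executable names below are placeholders):

```shell
# Write the configuration commands to a file:
cat > rocgdb_init <<'EOF'
set pagination off
set amdgpu precise-memory on
b abort
set non-stop on
EOF

# Pass the command file to rocgdb at startup:
rocgdb -x rocgdb_init --args ./my_app my_args
```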

Interpreted Languages

Python

Update 2024: Due to license changes, we can no longer provide recent versions of Anaconda Python. For more details, please see our article in Bits and Bytes issue 216.

We provide Python software stacks for scientific computing that include optimized packages like NumPy, SciPy, and Numba. You can see the available Python modules with module avail python-waterboa and module avail anaconda (legacy).

A basic system Python is also available for simple scripting tasks.

How do I install Python packages?

If a package is not available in our default installations, you can install it in your home directory using pip.

First, load a Python module:

module load python-waterboa

Then, install the package with the --user flag:

pip install --user <package_name>

The --user flag tells pip to install the package in your local user directory (~/.local/). Without this flag, the installation will fail due to a lack of write permissions in the system directories.

For managing multiple projects, we strongly recommend using virtual environments.
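A typical virtual-environment workflow might look like this (the directory name is arbitrary):

```shell
module load python-waterboa

# Create and activate the environment (creation is a one-time step):
python -m venv $HOME/venvs/my_project
source $HOME/venvs/my_project/bin/activate

# Inside the environment, pip installs without the --user flag:
pip install <package_name>
```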

How do I use Conda environments?

Update 2024: Due to licensing restrictions, you may only use free channels like conda-forge and bioconda. The main, anaconda, r, and msys2 channels from repo.anaconda.com are not permitted.

To enforce this, add the following to your environment.yml file:

channels:
  - conda-forge
  - nodefaults

If you have a local .condarc file, you must update it to include these lines:

channels:
  - conda-forge
  - nodefaults
channel_priority: strict
custom_channels:
  main: null
  r: null
  anaconda: null
  msys2: null

Disclaimer: We do not provide support for user-created Conda environments. If your required packages are available via pip, we recommend using a pip-based virtual environment instead.

For a more robust solution, consider using our open-source tool, Condainer, which creates portable, compressed Conda environments.

Important: Do not use conda init. This command modifies your .bashrc file and can interfere with our module system. If you have already run conda init, you will need to manually clean up your .bashrc and .condarc files.

To use conda in a non-intrusive way, use eval instead:

module purge
module load python-waterboa/2024.06
eval "$(conda shell.bash hook)"

conda create -n my_env python=3.11
conda activate my_env

Be aware that Conda packages are not optimized for our systems and may have performance or compatibility issues. We recommend using our provided modules whenever possible.

How can I write parallel Python code?

There are several ways to parallelize your Python code on our systems. The best method depends on your specific needs.

Implicit Threading (NumPy/SciPy)

Many NumPy and SciPy functions are backed by Intel’s Math Kernel Library (MKL), which is automatically parallelized. If your code relies heavily on linear algebra operations, it may already be taking advantage of multiple cores.

You can control the number of threads used by MKL with the MKL_NUM_THREADS environment variable. To test the performance benefits, you can compare a run with export MKL_NUM_THREADS=1 to a run where the variable is unset (which will default to using all available cores).
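A minimal sketch of the single-thread setup in Python. Note that the variable must be set before NumPy is first imported; whether MKL is actually used depends on how your NumPy build is configured:

```python
import os

# Must be set before the first NumPy import, because MKL reads it at load time.
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np

a = np.random.rand(500, 500)
b = np.random.rand(500, 500)
c = a @ b  # with an MKL-backed NumPy, this matmul now runs on one thread
print(c.shape)
```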

multiprocessing

For tasks that are not automatically parallelized, you can use Python’s multiprocessing package to manually distribute work across multiple processes on a single node.

The Pool class is a convenient way to apply a function to a sequence of inputs in parallel:

from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    with Pool(10) as p:
        print(p.map(f, range(100)))

This example will distribute the work across 10 processes. Note that multiprocessing is limited to a single node.

mpi4py

For distributed-memory parallelism across multiple nodes, you can use mpi4py, the Python interface to MPI.

To use mpi4py, load the mpi4py module after loading a Python module:

module load python-waterboa
module load mpi4py

In your Slurm script, you can launch an mpi4py application with srun:

srun python my_application.py

srun will automatically handle the distribution of processes according to the resources you have requested in your Slurm script. For more information, see the mpi4py documentation.
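A complete Slurm script for a small mpi4py job might look like this (module names follow the examples above; adjust versions and resources to your needs):

```shell
#!/bin/bash -l
#SBATCH -J mpi4py_job
#SBATCH -o ./job.out.%j
#SBATCH --ntasks=4
#SBATCH --time=00:10:00

module purge
module load python-waterboa mpi4py
srun python my_application.py
```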

R

We provide R through the R environment module. Our R installations are built from source and linked against Intel MKL for improved performance.

How do I install R packages?

You can install packages from the CRAN repository directly from the R prompt.

To install a package, use the install.packages() function:

install.packages("my_package")

When prompted, confirm installation into a personal library; the package will then be installed in a directory under your home directory.

If you encounter an error that a package is not available for your version of R, try loading a newer R module.

Julia

We provide Julia through the julia environment module.

How do I install Julia packages?

You can install packages from the Julia package registry using the built-in package manager.

From the Julia prompt, use the Pkg manager to add a package:

using Pkg
Pkg.add("my_package")

Jupyter Notebooks

The Jupyter Notebook is an open-source web application for creating and sharing documents with live code, equations, and visualizations. For more information, see the Jupyter website.

How do I launch a Jupyter Notebook on an HPC system?

You can launch Jupyter Notebooks on our HPC systems through our remote visualization service at https://rvs.mpcdf.mpg.de/.

MATLAB

We provide recent versions of MATLAB through the matlab environment module. You can see the available versions with module avail matlab.

Starting with version R2024aU2, our MATLAB installations are containerized, but this does not affect how you use the software.

How do I run the MATLAB GUI?

There are several ways to run the MATLAB GUI on our systems.

VNC

You can run the MATLAB GUI in a VNC session on a login node. For instructions, see our VNC documentation.

X11 Forwarding

You can also use X11 forwarding, but this is the least efficient method.

  1. Connect to a gateway machine with X11 forwarding enabled:

    ssh -C -Y YOUR_USERNAME@gate1.mpcdf.mpg.de
    
  2. From the gateway, connect to a login node:

    ssh -C -Y raven.mpcdf.mpg.de
    
  3. Load the MATLAB module and launch the GUI:

    module load matlab
    matlab &
    

Note: macOS and Windows users will need to install an X server (e.g., XQuartz or Xming).

How do I run MATLAB code in a batch job?

To run a MATLAB script in a batch job, you must run it in non-graphical mode.

Here is an example Slurm script for a sequential MATLAB job:

#!/bin/bash -l
#SBATCH -J MATLAB
#SBATCH -o ./job.out.%j
#SBATCH --ntasks=1
#SBATCH --mem=2000MB
#SBATCH --time=00:10:00

module purge
module load matlab
srun matlab -singleCompThread -nodisplay -r "run('my_program.m')"

How do I run parallel MATLAB code?

You can run parallel MATLAB code using up to all cores of a single compute node. You must tell MATLAB explicitly how many cores to use.

Here is an example of a parfor loop that uses the number of cores requested from Slurm:

% my_parallel_program.m
ncpus = str2num(getenv('SLURM_CPUS_PER_TASK'));
parpool('local', ncpus);

n = 200;
A = 500;
a = zeros(1, n);

parfor i = 1:n
    a(i) = max(abs(eig(rand(A))));
end

disp(a(n));
disp('OK!');
exit

And the corresponding Slurm script:

#!/bin/bash -l
#SBATCH -J MATLAB_parallel
#SBATCH -o ./job.out.%j
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16000MB
#SBATCH --time=01:00:00

module purge
module load matlab
srun matlab -nodisplay -r "run('my_parallel_program.m')"

Please request only the resources your code can effectively use. For advanced use cases, you may need to use MATLAB’s Cluster Profile Manager.

Message Passing Interface (MPI)

Which MPI implementations are supported?

We support the Intel MPI library and OpenMPI.

What if CMake cannot find MPI?

If CMake has trouble finding your MPI installation (a common issue with Intel MPI), you can explicitly specify the MPI compiler wrappers.

For Intel compilers:

module load intel/...
module load impi/2021.x
...
cmake -DMPI_C_COMPILER=mpiicx \
      -DMPI_CXX_COMPILER=mpiicpx \
      -DMPI_Fortran_COMPILER=mpiifx \
      ...

For GNU compilers:

module load gcc/...
module load impi/2021.x
...
cmake -DMPI_C_COMPILER=mpicc \
      -DMPI_CXX_COMPILER=mpicxx \
      -DMPI_Fortran_COMPILER=mpifort \
      ...

Why can’t I use mpirun to launch my MPI code?

On our Slurm-based clusters, you must use srun to launch MPI applications.

For production jobs, you should always submit a batch script. For small, interactive tests, you can use srun on a login node:

srun --time=00:05:00 --mem=1G --ntasks=2 ./my_mpi_application

Visualization

How do I create a movie from a sequence of images?

You can use ffmpeg to create a movie from a sequence of images (e.g., input_0001.png, input_0002.png, etc.).

First, load the ffmpeg module:

module load ffmpeg

Then, run ffmpeg with your input files and desired options. This example creates a 30fps MP4 video:

ffmpeg -start_number 1 -i input_%04d.png -c:v libx264 -vf "fps=30,format=yuv420p" output.mp4

For more information on the available options, see the ffmpeg documentation.

How do I install additional TeX/LaTeX packages?

We provide comprehensive LaTeX environments through the texlive module. If you need a package that is not included, you can install it locally using the TeX Live Manager (tlmgr).

  1. Load the texlive module:

    module load texlive/2021
    
  2. Initialize a local user tree (you only need to do this once):

    tlmgr init-usertree
    
  3. Set the repository to match your texlive version:

    tlmgr --usermode option repository https://ftp.tu-chemnitz.de/pub/tug/historic/systems/texlive/2021/tlnet-final/
    
  4. Install the package:

    tlmgr --usermode install <package_name>
    

Be sure to use a repository that matches the version of texlive you have loaded.

GUI Applications

Why am I having trouble with VSCode on older HPC clusters?

As of version 1.86, VSCode requires a newer version of glibc than is available on some of our older systems (e.g., Cobra).

As a workaround, you can use VSCode version 1.85.2 and disable automatic updates.

Why are some GUI applications not working on the login nodes?

Some GUI applications (e.g., Firefox, Spyder, VSCode) use sandboxing features that are not compatible with our security policies.

As a workaround, you can disable sandboxing by loading the nosandbox module before launching the application:

module load nosandbox/1.0
spyder &

Alternatively, you can set the appropriate environment variables yourself:

  • For Qt-based applications (e.g., Spyder):

    export QTWEBENGINE_DISABLE_SANDBOX=1
    
  • For Electron-based applications (e.g., VSCode): Use the --no-sandbox flag.

  • For Firefox:

    export MOZ_DISABLE_CONTENT_SANDBOX=1
    export MOZ_DISABLE_GMP_SANDBOX=1
    ...
    

For better performance, we recommend running heavyweight GUI applications on your local machine and accessing remote files via sshfs or VSCode’s Remote-SSH extension.