Compilers and languages
Intel Compilers
Intel C/C++ Compiler for Linux
Usage
The name of the C compiler executable is icc, the name of the C++ compiler executable is icpc.
Compilation and linking of a C program (source file myprog.c) is done as follows:
icc -o myprog myprog.c
To get an overview of the available command line options, use the command icc --help.
More information is provided by the manual page man icc.
Extensive documentation, including information on code optimization strategies, is provided by the official Intel C/C++ Compiler Documentation. For details on compatibility, updates, and changes between versions, see the Intel Composer release notes.
To compile and link MPI codes, use the wrappers mpiicc and mpiicpc, respectively.
Compiling and linking against a more recent C++ standard library
The C++ standard library headers and shared objects installed in the default system folders are relatively dated. This can cause errors such as
error: namespace "std" has no member named …
when compiling and linking, or
GLIBC_2.33 not found
at runtime.
The recommended procedure to avoid these errors is the following:
1. Clean the currently loaded environment modules with module purge.
2. Load the compiler module you want to use and all dependent modules you need.
3. Export the environment variables CC and CXX with the compiler you want to use.
4. Lastly, load a recent gcc version, e.g. module load gcc/<version>. Do not load any other modules afterwards.
5. Set LDFLAGS to point to the recent GCC version of the standard library, e.g. export LDFLAGS="$LDFLAGS -L${GCC_HOME}/lib64 -Wl,-rpath,${GCC_HOME}/lib64".
6. Configure and build your application. Be aware that CMake checks LDFLAGS at the first invocation only; make sure to create a new build with CMake, if applicable.
Intel Fortran Compiler for Linux
Usage
The name of the Intel Fortran Compiler executable is ifort.
Compilation and linking of a Fortran program (source file myprog.f90) is done as follows:
ifort -o myprog myprog.f90
To get an overview of the available command line options, use the command ifort --help.
More information is provided by the manual page man ifort.
Extensive documentation, including information on code optimization strategies, is provided by the official Intel Fortran Compiler Documentation. For details on compatibility, updates, and changes between versions, see the Intel Composer release notes.
To compile and link MPI codes, use the wrapper mpiifort.
How to get access to the Intel Compilers
On the Raven and Viper supercomputers, users need to load and specify a version of the Intel compiler explicitly, and similarly for Intel MPI. No default versions exist for the Intel compiler and MPI modules, and none are loaded at login.
To get a list of all available Intel compilers, enter module avail intel.
To get access to a specific Intel compiler, load it with module load intel/<version>.
Intel Compiler for Linux Optimization Flags
Compiler optimization flags
Compiler optimization flags have a strong influence on the performance of the executable. Some important flags are given below. First, different optimization levels are available:
-O2: Standard optimization (default).
-O3: Aggressive optimization. Use it with care and check the results against a less optimized binary.
-O0: Disables all optimization. Useful for fast compilation and to check whether unexpected behavior results from a higher compiler optimization level.
-O1: Very conservative optimization.
In addition, vectorization is key to achieving good floating point performance on modern CPUs. For detailed information on how to specify the instruction set level during compilation, please consult the Intel Compiler Documentation.
In particular, the switches -x, -ax, and -m are relevant, for example:
-xCORE-AVX512 -qopt-zmm-usage=high: Enable AVX512 vectorization for Intel Skylake, CascadeLake, and IceLake processors. These flags are recommended on Raven.
-xCORE-AVX2: Enable AVX2 vectorization for Intel Haswell and Broadwell CPUs.
-ipo: Enable interprocedural optimizations beyond individual source files.
The meta switch -fast is not supported on MPCDF systems because it forces the static linking of all libraries (i.e. it implies the switch -static), which is not possible with certain system libraries.
To obtain information about the features of the Linux host CPU, issue the command cat /proc/cpuinfo | grep flags | head -1. Instruction-set-related keywords are, among others, avx, avx2, and avx512.
A complete list of supported switches and further extensive information can be found in the official Intel Compiler Documentation.
Floating point accuracy
Intel compilers tend to adopt increasingly aggressive defaults for the optimization of floating-point semantics. The default is -fp-model fast=1. We recommend double-checking the accuracy of simulation results by using more conservative settings (which might come at the expense of computational performance) such as -fp-model precise (recommended) or even -fp-model strict.
See the compiler man pages for more details.
GNU Compiler Collection
The GNU Compiler Collection provides – among others – front ends for C, C++, and Fortran.
A default version of GCC comes with the operating system. More recent versions suitable for HPC can be accessed via environment modules.
To compile and link MPI codes using the GNU compilers, use the commands mpigcc, mpig++, or mpigfortran in combination with Intel MPI.
Find the full documentation at https://gcc.gnu.org/.
GPU Programming
The following packages are provided on the HPC clusters to enable users to develop applications for NVIDIA GPUs.
NVIDIA CUDA Toolkit
The NVIDIA CUDA Toolkit provides a development environment for the programming of NVIDIA GPUs. It includes the CUDA C++ compiler (nvcc), optimized libraries, debuggers, and profilers, among others.
Issue module avail cuda to get an up-to-date list of the CUDA versions available on a system.
NVIDIA HPC SDK
The NVIDIA HPC SDK provides a C, C++, and Fortran compiler for the programming of NVIDIA GPUs and multi-core CPUs. It is the successor product of the PGI compiler suite. In addition, the NVIDIA HPC SDK comprises a copy of the CUDA toolkit and various libraries for numerical computation, deep learning and AI, and communication.
Issue module avail nvhpcsdk to get an up-to-date list of the versions available on a system.
Kokkos C++ Performance Portability Library
The Kokkos performance portability framework enables the development of applications that achieve consistently good performance across all relevant modern HPC platforms based on a single-source C++ implementation. It provides abstractions for parallel computation and data management, and supports several backends such as OpenMP and CUDA, among others.
Python
The high-level Python programming language can be extended with modules written in plain C++/CUDA to leverage GPU computing. In this case, the interfaces may be created with Cython or pybind11 in a comparatively easy way. Alternatively, the PyCUDA module offers a straightforward way to embed CUDA code into Python modules. Codes that make heavy use of NumPy may compile such costly expressions to CPU or GPU machine code using the Numba package. Note that Numba is non-intrusive as it only uses decorators. It is part of the Anaconda Python distribution.
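As an illustration, the following minimal sketch uses Numba's CUDA backend to run a simple element-wise kernel on a GPU. The kernel name, array size, and launch configuration are arbitrary example values, and running it requires a GPU node with a CUDA-enabled Numba installation.

import numpy as np
from numba import cuda

@cuda.jit
def axpy(a, x, y, out):
    # one GPU thread per array element: out[i] = a * x[i] + y[i]
    i = cuda.grid(1)
    if i < x.size:
        out[i] = a * x[i] + y[i]

n = 1_000_000
x = cuda.to_device(np.random.rand(n))   # copy input arrays to the GPU
y = cuda.to_device(np.random.rand(n))
out = cuda.device_array(n)              # allocate the result array on the GPU

threads_per_block = 128
blocks = (n + threads_per_block - 1) // threads_per_block
axpy[blocks, threads_per_block](2.0, x, y, out)

result = out.copy_to_host()             # copy the result back to the host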
NAG Fortran compiler
Usually we have the latest NAG Fortran compiler installed. To see all available NAG compilers on UNIX, enter module avail nagf95.
To use a specific NAG compiler, load the module with module load nagf95/<$version>.
The compiler command is nagfor.
To use the compiler on Windows, follow the instructions given in /afs/ipp-garching.mpg.de/common/soft/nag_f95/<$version>/windows/readme.txt for versions rel5.3 or later. For access, a valid AFS token for the cell ipp-garching.mpg.de is necessary.
More information about the NAG Fortran Compiler can be found in the official NAG Fortran Compiler Documentation.
Python
At the MPCDF, Python, including a plethora of scientific packages for numerical computing and data science (NumPy, SciPy, matplotlib, Cython, Numba, Pandas, etc.), used to be provided in an up-to-date fashion via the Anaconda Python Distribution.
Starting in 2024, a new Python basis based on free software sources is deployed. Run the commands
module avail python-waterboa
module help python-waterboa
to get information on what is available; the versioning is similar to that of Anaconda.
A list of the installed legacy Anaconda releases can be obtained via the following command:
module avail anaconda
Please note that new versions of Anaconda Python cannot be provided any more due to licensing restrictions.
Python for HPC
Being an interpreted and dynamically typed language, plain Python is not per se suitable for achieving high performance. Nevertheless, with the appropriate packages, tools, and techniques the Python programming language can be used to perform numerical computation in a very efficient manner, covering both aspects, the program’s efficiency and the programmer’s efficiency. The aim of this article is to provide some advice and orientation to the reader in order to use Python correctly on the HPC systems and to take first steps towards basic Python code optimization.
Performance
The key to achieving good performance with Python is to move expensive computation from the interpreted code layer down to a compiled layer, which may consist of compiled libraries, code written and compiled by the user, or just-in-time compiled code. Below, three packages are discussed for these use cases.
NumPy
NumPy is the Python module that provides arrays of native datatypes (float32, float64, int64, etc.) and mathematical operations and functions on them. Typically, mathematical equations (in particular, vector and matrix arithmetic) can be written with NumPy expressions in a very readable and elegant way, which brings several advantages: NumPy expressions avoid explicit, slow loops in Python. In addition, NumPy uses compiled code and optimized mathematical libraries internally, e.g. Intel MKL on MPCDF systems, which enables vectorization and other optimizations. Parts of these libraries use thread-parallelization in a very efficient way by default, e.g. to perform matrix multiplications. In summary, NumPy provides the de-facto standard for numerical array-based computations and serves as the basis for a multitude of additional packages.
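As a simple, hypothetical illustration (the array size and the operation are arbitrary), the following sketch contrasts an explicit Python loop with an equivalent vectorized NumPy expression:

import numpy as np

n = 10_000_000
x = np.random.rand(n)
y = np.random.rand(n)

# slow: explicit Python loop, interpreted element by element
total = 0.0
for i in range(n):
    total += 2.0 * x[i] + y[i]

# fast: vectorized NumPy expression, executed by compiled library code
total_np = np.sum(2.0 * x + y)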
Cython
Cython is a Python language extension that makes it relatively easy to create compiled Python modules written in Cython, C or C++. It integrates well with NumPy arrays and can be used to implement time-critical parts of an algorithm. Moreover, Cython is very useful to create interfaces to C or C++ code, such as legacy libraries or native CUDA code. Technically, the Cython source code is translated by the Cython compiler to intermediate C code which is then compiled to machine code by a regular C compiler like GCC or ICC.
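For orientation, a minimal, hypothetical Cython kernel might look as follows (file name, function name, and compiler directives are just examples); it would typically be compiled with the cythonize command or via a small setup.py before being imported from Python:

# example.pyx -- hypothetical minimal Cython module
# cython: boundscheck=False, wraparound=False

def scaled_sum(double[::1] x, double alpha):
    """Return alpha * sum(x), computed in a typed, compiled loop."""
    cdef Py_ssize_t i
    cdef double s = 0.0
    for i in range(x.shape[0]):
        s += x[i]
    return alpha * s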
Numba
Numba is a just-in-time compiler based on the LLVM framework. It compiles Python functions at runtime for the datatypes these functions are being called with. Moreover, Numba implements a subset of NumPy’s functions, i.e. it is able to compile NumPy expressions. Functions are declared suitable for jit-compilation via a simple decorator syntax, hence Numba is only minimally intrusive on existing code bases.
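A minimal sketch of this decorator-based usage (the function and array names are arbitrary examples) could look like this:

import numpy as np
from numba import njit

@njit  # compiled to machine code at the first call, for the actual argument types
def pairwise_l2(a, b):
    """Sum of squared differences between two 1-D arrays."""
    s = 0.0
    for i in range(a.shape[0]):
        d = a[i] - b[i]
        s += d * d
    return s

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
print(pairwise_l2(a, b))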
Parallelization
While Python does implement threads as part of the standard library, these cannot be used to accelerate computation on more than one core in parallel due to CPython’s global interpreter lock. Nevertheless, Python is suitable for parallel computation. In the following, two important packages for intra-node and inter-node parallelism are addressed.
multiprocessing
The multiprocessing package is part of the Python standard library. It implements building blocks such as pools of workers and communication queues that can be used to parallelize data-parallel workloads. Technically, multiprocessing forks subprocesses from the main Python process that can run in parallel on multiple cores of a shared-memory machine. Note that some overhead is associated with the inter-process communication. It is, however, possible to access shared memory from several processes simultaneously. A typical use case would be large NumPy arrays.
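The following hypothetical sketch (the task function and the numbers of workers and tasks are arbitrary) shows the typical pool-of-workers pattern for a data-parallel workload:

import numpy as np
from multiprocessing import Pool

def work(seed):
    # an independent, data-parallel task
    rng = np.random.default_rng(seed)
    return rng.random(100_000).mean()

if __name__ == "__main__":
    with Pool(processes=4) as pool:           # 4 worker processes, forked from the main process
        results = pool.map(work, range(16))   # distribute 16 tasks over the workers
    print(sum(results) / len(results))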
mpi4py
Access to the Message Passing Interface (MPI) is available via the module mpi4py. It enables parallel computation on distributed-memory computers where the processes communicate via messages with each other. In particular, the mpi4py package supports the communication of NumPy arrays without additional overhead. On MPCDF systems, the environment module mpi4py provides an optimized build based on the default Intel MPI library.
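As an illustration (the script name and the launch command depend on the system and batch setup), the following sketch sums a value across all MPI ranks using buffer-based communication of NumPy arrays:

# launched with an MPI starter, e.g. 'srun -n 4 python demo.py' (details depend on the system)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# each rank holds a local partial result as a NumPy array ...
local = np.array([float(rank)])
total = np.empty(1, dtype=np.float64)

# ... and the partial results are combined across all ranks
comm.Allreduce(local, total, op=MPI.SUM)

if rank == 0:
    print("sum over all ranks:", total[0])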
IO
NumPy implements efficient binary IO for array data that is useful, e.g., for temporary files. A better choice with respect to portability and long-term compatibility are HDF5 files. HDF5 is accessible via the h5py Python package and offers an easy-to-use dictionary-style interface. For parallel codes, a special build of h5py with support for MPI-parallel IO is provided via the environment module h5py-mpi.
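A minimal, hypothetical example of the dictionary-style h5py interface (the file and dataset names are arbitrary) is given below:

import numpy as np
import h5py

data = np.random.rand(1000, 1000)

# write an array and some metadata to an HDF5 file
with h5py.File("results.h5", "w") as f:
    f["simulation/temperature"] = data
    f["simulation"].attrs["timestep"] = 42

# read the dataset back
with h5py.File("results.h5", "r") as f:
    restored = f["simulation/temperature"][...]

assert np.allclose(data, restored)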
The Python software ecosystem
In addition to the packages discussed up to now, there is a plethora of solid and well-proven packages for scientific computation and data science available, covering, e.g., numerical libraries (SciPy), visualization (matplotlib, seaborn), data analysis (pandas), and machine learning (TensorFlow, pytorch), to name only a few.
Software installation
Often, users need to install special Python packages for their
scientific domain. In most cases, the easiest and quickest way is to
create an installation local to the user’s home directory. After loading
the Anaconda environment module, the command
pip install --user PACKAGE_NAME
would download and install a package
from the Python package index (PyPI), or similarly, the command
python setup.py install --user
would install a package from an
unpacked source tarball. In both cases, the resulting installation is
located below “~/.local” where Python will find it by default.
Summary
The software recommended in this article is available via the Anaconda Python Distribution (environment module “anaconda/3”) on MPCDF systems. Note that for some packages (mpi4py, h5py-mpi), the hierarchical environment modules matter, i.e., it is necessary to load a compiler (gcc, intel) and an MPI module (impi) in addition to Anaconda in order to get access to these dependent environment modules.
The application group at the MPCDF has developed an in-depth course on “Python for HPC” which covers all the topics touched upon in this article in more detail over two days. It is taught once or twice per year and announced via the MPCDF web page.
Finally, it should be pointed out that Python 2 reaches its official end-of-life on January 1, 2020. Consequently, new Python modules and updates to existing ones will not take Python 2 compatibility into account in the future. Users still running legacy code are strongly encouraged to migrate to Python 3.