No.214, December 2023
HPC Software News
CUDA-aware OpenMPI on Raven
MPCDF provides CUDA-aware OpenMPI on Raven based on different compilers and CUDA versions. The complete list can be inspected by running find-module openmpi_gpu. Below, we highlight some relevant combinations of compiler and CUDA modules that can be used with the openmpi_gpu/4.1 module.
GCC-based CUDA-aware OpenMPI builds are available after loading gcc/11 cuda/11.6 or gcc/12 cuda/12.1. Recently, a CUDA-aware OpenMPI module has been added which works with the CUDA version and the compilers provided by the Nvidia SDK. To access it, the modules nvhpcsdk/23 cuda/11.8-nvhpcsdk must be loaded.
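For example, one of the GCC-based combinations can be selected as follows (a minimal sketch; adapt the versions to your needs):
# load a GCC-based CUDA-aware OpenMPI build on Raven
module purge
module load gcc/12 cuda/12.1 openmpi_gpu/4.1
# compile and link an MPI code with the CUDA-aware wrappers
mpicc -o my_prog my_prog.c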
Tobias Melson, Tilman Dannert
GPU-accelerated VASP
With the deployment of CUDA-aware OpenMPI for the Nvidia compilers (nvhpcsdk, see above), MPCDF provides GPU-accelerated builds of the VASP software package for atomic-scale materials modelling from first principles. Currently, a vasp-gpu/6.4.2 module is available on Raven and selected institute clusters.
Note that MPCDF does not hold a license for VASP. Individual users have to bring their own license (via the MPCDF helpdesk) in order to be granted access to VASP at MPCDF.
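Once enabled for VASP, loading the module could look like the following sketch (the prerequisite modules are an assumption based on the nvhpcsdk setup described above):
# hedged sketch: loading GPU-accelerated VASP on Raven
# prerequisite modules are an assumption, check the module system on Raven
module load nvhpcsdk/23 cuda/11.8-nvhpcsdk openmpi_gpu/4.1
module load vasp-gpu/6.4.2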
Markus Rampp
Intel oneAPI: transition from ifort to ifx
The transition to the new LLVM-based compilers in the Intel oneAPI package is progressing. Already in the currently installed module intel/2023.1.0.x, icx and icpx are the default compilers for C and C++, respectively, replacing the “classic” compilers icc and icpc. As the next step, MPCDF will follow Intel’s recommendation and set ifx, together with its MPI wrapper mpiifx, as the default Fortran compiler in the upcoming intel module corresponding to the oneAPI release 2024.0. The “classic” Fortran compiler ifort will still be present for some time, but should be considered deprecated, because its development was effectively frozen some time ago.
Users are advised to adjust all Fortran builds to use the new ifx compiler. A porting guide with detailed information on this transition is available. Further support is provided via the MPCDF Helpdesk.
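For build systems that honour the usual compiler variables, the switch can be as simple as the following sketch (variable names and flags are illustrative):
# switch Fortran builds from the classic to the LLVM-based compilers
export FC=ifx        # instead of ifort
export MPIFC=mpiifx  # instead of mpiifort
ifx -O2 -o my_prog my_prog.f90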
Tobias Melson, Markus Rampp
Compressed Portable Conda Environments for HPC Systems
Introduction and Motivation
The Conda package manager and the related workflows have become an accepted standard when it comes to distributing scientific software for easy installation by end users. Using conda, complex software environments can be defined by means of simple descriptive environment.yml files. On MPCDF systems, users may use Conda environments, but without support from MPCDF for the software therein.
Once installed, large Conda environments can easily amount to several hundred thousand individual (small) files. On the local file systems of a laptop or PC this is typically not an issue. On the large shared parallel file systems of HPC systems, however, the vast number of small files can cause problems, as these file systems are optimized for other access patterns. Two examples are inode exhaustion and the heavy load caused by the (millions of) file opens, short reads, and closes happening during the startup phase of the Python jobs of the various users on the system.
Move Conda environments into compressed image files
MPCDF has developed the new open-source tool Condainer, which addresses these issues by moving Conda environments into compressed squashfs images, reducing the number of files stored directly on the host file system by orders of magnitude. Condainer images are standalone and portable: they can be copied between different systems, improving the reproducibility and reusability of proven-to-work software environments. In particular, they sidestep the integration of a specific conda executable into the user’s .bashrc file, which often causes issues and is orthogonal to the module-based software environments provided on HPC systems.
Technically, Condainer uses a Python basis from Miniforge (which is a free alternative to Miniconda) and then installs the user-defined software stack from the usual environment.yml file. Package dependency resolution and installation are extremely fast thanks to the mamba package manager (an optimized replacement for conda). As a second step, Condainer creates a compressed squashfs image file from the staging installation, before it deletes the latter to save disk space. Subsequently, the compressed image is mounted (using squashfuse) at the very same directory, providing the full Conda environment to the user, who can activate or deactivate it, just as usual. Moreover, Condainer provides functionality to run executables from the Conda environment directly and transparently, without the need to explicitly mount and unmount the image.
Please note that the squashfs images used by Condainer are not “containers” in the strict terminology of Docker, Apptainer, and the like: there is no process isolation or similar. Rather, Condainer is an easy-to-use and highly efficient wrapper around the building, compressing, mounting, and unmounting of Conda environments on top of compressed image files.
Basic usage examples
Build a compressed environment
Run the following commands once in order to build a compressed image of a Conda environment that is defined in ‘environment.yml’:
# on MPCDF systems, e.g. Raven:
module load condainer
# create specific project directory:
mkdir my_cnd_env && cd my_cnd_env
# initialize project directory with a skeleton:
cnd init
ls
# edit the 'environment.yml' example file,
# or copy your own file here
# build the environment and compressed image:
cnd build
ls
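As a reference for the editing step above, a minimal environment.yml could be created as follows (the package selection is purely illustrative):
# write a minimal 'environment.yml' (packages chosen only as an example)
cat > environment.yml << 'EOF'
name: my_cnd_env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - scipy
EOF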
Activate a compressed environment
After building, you can activate the environment for your current shell session, similar to plain Conda or a Python virtual environment:
source activate
Please note that source activate will only work with Bourne-compatible shells (e.g. bash or zsh), not with the older C shells or Korn shells.
Alternatively, run an executable from a compressed environment directly
In case you do not want to activate the environment, you can run individual executables from the environment directly, e.g.
cnd exec -- python3
The cnd command supports the flag --directory to specify a certain Condainer project directory, allowing for arbitrary current working directories.
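For instance, to run a script using the environment built in the project directory my_cnd_env from an arbitrary working directory (the exact flag placement is an assumption; please consult the Condainer documentation):
# run a script from a Condainer project located elsewhere
# (flag placement is an assumption, see the Condainer documentation)
cnd --directory ~/my_cnd_env exec -- python3 my_script.py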
Limitations
As the squashfs fuse mounts are specific to an individual compute node, Condainer currently (v0.1.8) does not support multi-node batch jobs.
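A single-node Slurm batch job using a Condainer environment could therefore look like the following sketch (resources, paths, and the script name are placeholders):
#!/bin/bash -l
#SBATCH --nodes=1              # Condainer: single-node jobs only
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=72
#SBATCH --time=02:00:00
module load condainer
cd ~/my_cnd_env                # the Condainer project directory created above
cnd exec -- python3 my_script.py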
Availability
The software including its documentation is freely available via the MPCDF GitLab. Moreover, it is provided via the environment module condainer on the Raven HPC system, and will be offered on more systems in the near future.
Klaus Reuter
New Features in the HPC-Cloud
After commissioning the initial set of compute and storage resources in 2021, deploying the GPU- and NVMe-focused extension earlier this year, and rolling out integrated object storage, MPCDF has recently deployed several new features to better support the diverse technical requirements of current and future projects:
SSD-based block volumes
There is now an SSD-based block volume type, CephSSD, representing a middle ground between highly performant local SSDs and the highly flexible and scalable network-based HDD storage associated with the default volume type. While I/O performance cannot be guaranteed in a shared-resource environment, one can expect roughly a 2x improvement in both small I/O operations per second and large-transfer bandwidth, as well as a significant reduction in latency.
To evaluate whether the new volume type is a good option for your project, please contact the cloud enabling team via the helpdesk. As a tip, existing block volumes can be migrated between types online, making it relatively simple to test an already-deployed application.
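For projects managing their resources with the OpenStack CLI, requesting the new type at creation time could look like this (volume name and size are placeholders):
# create a 100 GB block volume with the SSD-backed type
openstack volume create --type CephSSD --size 100 my-ssd-volume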
In addition to the new volume type, all Ceph-based system disks, i.e. the OS root of the VMs not hosted on local SSDs, have been transparently migrated to an SSD pool “for free”, so that routine tasks such as software installation and updates complete more quickly.
Automated domain name service
Hostnames are now automatically generated for most devices attached to the public or local cloud networks, including virtual machines and floating IP addresses. The system works like this:
Each virtual machine is assigned a hostname of the following form:
VM_NAME.PROJECT_NAME.hpccloud.mpg.de
If the name of the virtual machine is invalid according to the requirements of DNS, then a unique hostname based on the fixed IP address will be substituted automatically.
Each floating IP is assigned a hostname of the following form:
FIP_DESCRIPTION.PROJECT_NAME.hpccloud.mpg.de
If the description field is empty or invalid, then a unique hostname based on the floating IP address will be substituted automatically.
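As an illustration, a virtual machine named analysis01 in a project named biophys (both names hypothetical) would be resolvable as:
# hypothetical example: look up the generated hostname
host analysis01.biophys.hpccloud.mpg.de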
Hostnames are synchronized with the MPCDF DNS servers every five minutes. For devices on the public cloud network, both forward and reverse entries are propagated to the global DNS, whereas on local networks only the forward (i.e. hostname->IP address) entries are published.
Thus, within the framework described above it is possible to deploy and configure many applications on the HPC-Cloud without tracking individual IP addresses.
The Robin cluster
The Remote Visualization Service at MPCDF has recently been expanded with a new cluster called Robin. Robin is the first compute cluster of MPCDF in the HPC-Cloud, and its resources are available to all users with access to the HPC systems (i.e. Cobra and Raven).
One of the main advantages of the Robin cluster is its flexibility: new nodes can easily be deployed in the HPC-Cloud, automatically configured, and added to the Slurm cluster, allowing the compute resources of Robin to be scaled conveniently according to the current demand.
Robin uses Slurm as its job scheduler and can currently host up to 20 CPU sessions and 12 GPU sessions concurrently. Access to the cluster is possible only via our Remote Visualization Service web interface; users are not allowed to connect directly via ssh to the login or compute nodes of Robin.
Each session on Robin provides 12 virtual CPUs and 64 GB of RAM, with GPU sessions having access to a shared Nvidia A30 GPU (up to 4 sessions can share a single GPU). Robin mounts the Raven file systems, providing access to all the software and data available on the Raven cluster, including the user’s home directory and ptmp folder. A runtime of up to 7 days is currently allowed, with a planned increase of the maximum runtime to 28 days in the future. Nevertheless, users are encouraged to stop their sessions once their calculations are completed, and should be aware that long-running jobs can be killed in case of cluster maintenance.
Users requesting GPU sessions are encouraged to limit the memory used by their code to roughly 1/4 of the available GPU memory (~6 GB out of the 24 GB available), in order to avoid disrupting the calculations of other users sharing the same GPU. This is particularly important for machine-learning software (e.g. TensorFlow, PyTorch) that may allocate the entire available GPU memory for a single process.
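As one concrete measure, TensorFlow can be instructed not to reserve the entire GPU memory at startup via an environment variable; note that this enables on-demand allocation rather than enforcing a hard 6 GB cap, and PyTorch users need to limit memory within their code instead:
# prevent TensorFlow from reserving the entire GPU memory at startup
# (memory is then allocated on demand; this is not a hard 6 GB limit)
export TF_FORCE_GPU_ALLOW_GROWTH=true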
Robin is designed to provide a single solution for the remote visualization needs of future HPC clusters at MPCDF: the file systems of new clusters (like the upcoming Viper) can be made available on Robin, providing easy access to software and data without the need for a dedicated installation of the Remote Visualization Service on each cluster.
Users interested in using the Remote Visualization Service on Robin are reminded to initialize their sessions on the cluster once (before submitting their first session), as described in our documentation.
Michele Compostella
News & Events
AMD-GPU development workshop
In preparation for Viper, the new supercomputer of the MPG with AMD MI300A GPUs to be installed in 2024, MPCDF, in collaboration with AMD, offered an online course on the AMD Instinct GPU architecture and the corresponding ROCm software ecosystem, including the tools to develop or port HPC and AI applications to AMD GPUs. The workshop spanned three afternoons on November 28-30, 2023. The material can be found on the MPCDF training website.
Tilman Dannert
Meet MPCDF
The monthly online-seminar series “Meet MPCDF” skips the talk in January 2024. The next edition will take place on February 1st, 2024 with a talk on “ScaLAPACK and ELPA: how to diagonalize really large dense matrices” given by Petr Karpov from the MPCDF. Subsequent dates will be March 7th and April 4th (topics to be announced). All announcements and material can be found on our training webpage.
We encourage our users to propose further topics of interest, e.g. in the fields of high-performance computing, data management, artificial intelligence or high-performance data analytics. Please send an e-mail to training@mpcdf.mpg.de.
Tilman Dannert
RDA Deutschland Tagung 2024
The German chapter of the Research Data Alliance will have its next conference in Potsdam, February 20-21, 2024. This year’s focus is on legal, administrative and organizational topics concerning research data management in Germany and Europe. The early registration deadline is January 12, 2024. Further details including the program are available from https://indico.desy.de/event/42727/. As in previous years, MPCDF is contributing to the organization of the event.
Raphael Ritz