
No. 217, December 2024
High-performance Computing
AlphaFold3 available on Raven
Recently, Google DeepMind released the source code of AlphaFold3, shortly after Demis Hassabis and John Jumper from the same company had received half of the Nobel Prize in Chemistry 2024 for the development of AlphaFold2. AlphaFold3 extends the capabilities of AlphaFold2 by predicting interactions between biomolecules in addition to inferring their structures.
AlphaFold3 is now available on Raven, complementing the installations of AlphaFold2 that have been provided and regularly updated since summer 2021. To get started, execute the command ‘module help alphafold/3.0.0’ on Raven and follow the instructions.
An important difference from AlphaFold2 is that the AI model behind AlphaFold3 is not publicly available. Users must register with Google DeepMind and download their personal copy after approval. By default, for the software installation provided by the MPCDF, the weights file ‘af3.bin’ has to be placed in the directory ‘~/alphafold_3_0_0/model/’ within the user's home directory. It is the responsibility of the user to comply with the terms of use of the AI model.
The MPCDF is interested in user feedback on the usability and performance of AlphaFold3 on the A100 GPUs of Raven. The memory requirements of AlphaFold3 are higher than those of AlphaFold2, and out-of-memory conditions are therefore more likely to occur. The scripts provided by our installation try to mitigate this by enabling CUDA unified memory for the inference step, logically extending the GPU memory with host memory.
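For orientation, a minimal Slurm job sketch for a single-GPU inference run on Raven is given below. The resource requests, the entry point ‘run_alphafold.py’ and its options are assumptions for illustration only; the authoritative usage is printed by ‘module help alphafold/3.0.0’ and may differ.

```bash
#!/bin/bash -l
# Hypothetical single-GPU AlphaFold3 job on Raven. Resource names and the run
# command are assumptions; see 'module help alphafold/3.0.0' for the actual usage.
#SBATCH --job-name=af3_example
#SBATCH --constraint="gpu"        # GPU nodes of Raven (assumed constraint)
#SBATCH --gres=gpu:a100:1         # one A100 GPU
#SBATCH --cpus-per-task=18
#SBATCH --mem=125000
#SBATCH --time=04:00:00

module purge
module load alphafold/3.0.0

# The weights obtained from Google DeepMind are expected at
# ~/alphafold_3_0_0/model/af3.bin (see above).
run_alphafold.py \
    --json_path=./fold_input.json \
    --model_dir="$HOME/alphafold_3_0_0/model" \
    --output_dir=./af3_output
```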
Klaus Reuter
Resource limits on the HPC machines
In order to maintain the responsiveness of the login nodes of the HPC machines, per-user resource limits were introduced on Raven, and also on Viper, earlier this year. The per-user limit is currently 2 cores on raven01/02 and viper01/02, and 6 cores on raven03/04 and viper03/04. In addition, a hard memory limit is enforced, amounting to 10% of the available memory on the first two login nodes of Raven and to 50% of the available memory on login nodes 3 and 4 of Raven as well as on all login nodes of Viper. The following table summarizes these limits.
|        | raven01/02 | raven03/04 | viper01/02 | viper03/04 |
|--------|------------|------------|------------|------------|
| cores  | 2          | 6          | 2          | 6          |
| memory | 50 GB      | 256 GB     | 256 GB     | 256 GB     |
As a consequence, this limits running multi-threaded or distributed jobs on the login nodes (which is the intention), but it may also affect the (parallel) performance of the build procedures of large HPC codes. Usually, builds are run in parallel, with the build system spawning multiple threads (‘make -j’ or ‘cmake --build . --parallel’ are typical examples). The number of threads spawned by the build procedure should therefore be limited to the number of cores available (2 or 6) by passing this number to the build command (‘make -j 6’ or ‘cmake --build . --parallel 6’).
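If the build is driven indirectly, e.g. by a wrapper script or by ‘pip install’, and the degree of parallelism cannot be passed on the command line, it can alternatively be capped via environment variables, as in the following sketch:

```bash
# Cap the build parallelism to 6 threads via environment variables:
export MAKEFLAGS=-j6                  # honored by GNU make
export CMAKE_BUILD_PARALLEL_LEVEL=6   # used by 'cmake --build' when --parallel is not given
```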
It is important to note that these resource limits also apply to CI jobs executed by a GitLab runner which has been launched by the user on the login nodes. Hence, such runners should also take the above-mentioned resource limits into account when launching parallel builds; otherwise, the build jobs may slow down significantly.
The following options are available to increase the build performance of such CI jobs:
- Move your GitLab runners which use local builds from raven01/02 and viper01/02 to raven03/04 and viper03/04 and restrict the parallelism of the build procedure to 6.
- Keep your runners on the first two Raven nodes, but submit the build job into the “interactive” partition via the Slurm system. There you can use up to 8 cores for your job and hence 8 threads for the build procedure (a more concrete example is sketched after this list):
  salloc --partition=interactive -n 1 --cpus-per-task=8 --time=00:20:00 --mem=32G srun <build command>
- Change your build tests to use the shared runners of our GitLab instance.
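As an illustration of the second option, the build step of a CI job running on a login-node runner could offload the actual compilation to the “interactive” partition as follows; the CMake project layout and options are placeholders:

```bash
# Hypothetical CI build step: compile with 8 threads in the "interactive"
# partition instead of on the login node itself.
salloc --partition=interactive -n 1 --cpus-per-task=8 --time=00:20:00 --mem=32G \
    srun bash -c "cmake -S . -B build && cmake --build build --parallel 8"
```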
Tilman Dannert
HPC monitoring on Viper
The MPCDF is running a comprehensive performance monitoring system on the HPC systems that allows support staff as well as users to check on a plethora of performance metrics of compute jobs. Recently, the system was deployed to the Viper supercomputer. Unlike Raven and previous HPC systems, Viper features AMD EPYC Genoa CPUs with Zen4 cores that have somewhat less-capable Performance Monitoring Units (PMUs). For instance, while on Intel-based CPUs, GFLOP rates can be obtained individually for each precision and vector width, only total GFLOP rates independent of the precision can be measured on the AMD Zen4 processors. Similarly, the support for uncore events such as the memory bandwidth is still limited, but expected to continuously improve with more recent kernel versions. Users can access their performance data under these limitations. We’re working on improving the support for the Viper system over time.
Klaus Reuter
Routine transition to a new set of CI module images in 2025
Since late 2023, the MPCDF has been providing Docker images with software stacks that are installed in essentially the same fashion as on the HPC systems, enabling users to test their software consistently with various compiler and library toolchains on the GitLab shared CI runners. Interested readers can find the full announcement in Bits & Bytes No. 214, December 2023.
We would like to remind CI users of the strategy we employ to tag and manage these CI images. Starting with the year 2025, the images tagged ‘2024’ will no longer receive updates and will hence stay unchanged. At the same time, we will start a new set of images tagged ‘2025’ (then identical to ‘latest’) that contain more recent software. Up-to-date lists of the available images and of the software they contain are provided in the MPCDF documentation. Please note that users do not need to take any action unless they want to access more recent software stacks for their CI tests.
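For illustration, pinning a containerized workflow to the frozen ‘2024’ images versus following ‘latest’ could look as shown below; the registry path and image name are placeholders, the actual names are listed in the MPCDF documentation. The same tags can equally be referenced via the ‘image:’ keyword of a GitLab CI configuration.

```bash
# Placeholder registry path and image name, for illustration only:
docker pull registry.example.org/mpcdf-ci/intel-impi:2024    # frozen 2024 software stack
docker pull registry.example.org/mpcdf-ci/intel-impi:latest  # will track the new 2025 stack
```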
Klaus Reuter, Tobias Melson
Checks for uninitialized variables disabled in latest Intel Fortran compiler
Many Fortran developers rely on the correctness-checking capabilities of the compiler, for example in their CI pipelines or other non-regression-checking and debugging strategies. The Intel Fortran compilers, for example, can instrument an executable with various runtime checks via the option ‘-check’, including the commonly used array-bounds check (‘-check bounds’) and the check for uninitialized variables (‘-check uninit’). With the latter option, however, the new Intel compiler, ifx, produces false positives, in particular in combination with MPI, which is why Intel decided to disable the ‘-check uninit’ option.
Note also that ‘-check all’ effectively translates to ‘-check all,nouninit’ in the latest ifx (2025) version, which may be perceived as a silent relaxation of the overall strictness.
In general, we recommend the following sets of options for enabling a restrictive set of runtime checks with the Intel and GNU Fortran compilers, respectively. They include an effective check for uninitialized variables by pre-setting variables to a signaling NaN and catching the resulting floating-point exceptions.
ifx -g -traceback -check all -fpe0 -init=arrays -init=snan
gfortran -g -Wall -fcheck=all -finit-real=snan -ffpe-trap=invalid,zero,overflow
Note, however, that some of these options can significantly increase the execution time of the generated executable; they should therefore only be used for debugging and correctness checking.
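As a minimal illustration (a hypothetical toy program, compiled here with the GNU options from above; the Intel options work analogously), a real variable that is pre-set to a signaling NaN triggers a floating-point exception as soon as it is used:

```bash
# Toy example: the uninitialized variable 'x' is pre-set to a signaling NaN,
# so the first arithmetic operation on it raises a floating-point exception.
cat > uninit_demo.f90 << 'EOF'
program uninit_demo
  implicit none
  real :: x, y     ! x is never assigned
  y = 2.0 * x      ! aborts here with -finit-real=snan and -ffpe-trap=invalid
  print *, y
end program uninit_demo
EOF

gfortran -g -Wall -fcheck=all -finit-real=snan -ffpe-trap=invalid,zero,overflow \
    uninit_demo.f90 -o uninit_demo
./uninit_demo   # expected to abort with a floating-point exception at the marked line
```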
Markus Rampp
Events
International HPC Summer School 2025
The International HPC Summer School (IHPCSS) 2025 will take place from July 6th to July 11th in Lisbon, Portugal. This series of annual events started in 2010 in Sicily, Italy, and provides advanced HPC knowledge to computational scientists, focusing on postdocs and PhD students. Through the participation of Canada, the USA, South Africa, Japan, Australia and Europe, a truly international group of highly motivated students meets each year.
Interested students and postdoctoral fellows should monitor the school’s website (https://ss25.ihpcss.org), where registration opens on December 16th. School fees, travel, meals and housing will be covered for all accepted applicants through funds from the European Union and EuroHPC. For further information and application, please visit the website of the summer school.
Erwin Laure
Introduction to MPCDF services
The next edition of our semi-annual seminar series introducing the MPCDF services will be given online on May 15th, 14:00-16:30. Topics comprise login, file systems, HPC systems, the Slurm batch system, and the MPCDF services remote visualization, Jupyter notebooks and DataShare, followed by a concluding question & answer session. No registration is required; just connect at the time of the workshop via the Zoom link given on our webpage.
Meet MPCDF
The next editions of our monthly online seminar series “Meet MPCDF” are scheduled as follows:
- February 6th, 15:30: “Introducing the Viper GPU system with AMD MI300A APUs”
- March 6th, 15:30: topic to be announced
- April 3rd, 15:30: topic to be announced
All announcements and material can be found on our training webpage, and the “Meet MPCDF” invitations will be sent to the all-users mailing list.
We encourage our users to propose further topics of interest to them, e.g. in the domains of high-performance computing, data management, artificial intelligence or high-performance data analytics. Please send an e-mail to training@mpcdf.mpg.de.
Tilman Dannert