GitLab Runners for CI/CD

Introduction

If a GitLab repository contains a continuous integration pipeline, its jobs are executed by a GitLab Runner. A GitLab Runner is a daemon running on a separate server that polls the central GitLab server for pending CI jobs and executes them.

You can find more detailed information about GitLab Runners in the GitLab Documentation.

There are two different ways of using a GitLab runner:

individual runners, installed on local machines, remote clusters, or cloud-based systems by individual users (without support from MPCDF)
shared runners, which are offered by MPCDF

If you don’t want to install and configure an individual GitLab runner, you can execute your CI pipelines on the shared runners offered by MPCDF. If you don’t add any tags to your CI file .gitlab-ci.yml, your pipeline will automatically be executed by a shared runner.
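The following is a minimal sketch of an untagged job, assuming a project with no further CI configuration. Because no tags are set, a shared runner picks up the job. Because no image is set, the job runs in the runner's default image (python:3.12 for the MPCDF shared runners). The job name check_python is hypothetical.

```yaml
# .gitlab-ci.yml
# No "tags:" keyword: a shared runner that accepts untagged jobs runs this job.
check_python:
  # No "image:" keyword either: the runner's default image is used.
  script:
    - python --version
```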

Tags

Sometimes, your pipeline needs to be executed on a runner with specific capabilities. Using tags, you can specify which runner should execute your CI pipeline:

default:
  tags:
    - nvidia-cc80

Important tags for using the shared runners are:

The tags for the Nvidia GPU runners are cloud-gpu and nvidia-cc80
The tag for the AMD GPU runners is amd-mi200
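Tags can be set for the whole pipeline under default: or for a single job. A job-level tags: list replaces the default list for that job, and a runner only picks up a job if it carries all of the job's tags. The sketch below uses hypothetical job names and placeholder echo scripts. It keeps CPU jobs on the cloud runners and sends one job each to the Nvidia and AMD GPU runners.

```yaml
# .gitlab-ci.yml
default:
  tags:
    - cloud            # jobs without their own tags go to the shared cloud runners

build:
  script:
    - echo "running on a shared cloud runner"

test_nvidia:
  tags:                # replaces the default tags for this job only
    - cloud-gpu
    - nvidia-cc80
  script:
    - echo "running on an Nvidia GPU runner"

test_amd:
  tags:
    - amd-mi200
  script:
    - echo "running on an AMD GPU runner"
```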

The tag system of the shared runners is currently being overhauled. Once the new tagging system is in place, detailed information about how to use it will be provided here.

Docker images for CI with MPCDF environment modules

To provide developers of HPC applications with a familiar and comprehensive software environment within GitLab-based continuous integration (CI) pipelines as well, MPCDF offers special Docker images. These images use environment modules to make software accessible, very much like software is managed on the HPC systems. Hence, build scripts, for example, work consistently on both the HPC systems and the CI cloud runners.

Images and tags

The new software infrastructure consists of various Docker images, each of which provides a software stack based on a single combination of a compiler and, optionally, an MPI library. Users access the software via environment modules. Currently, the images are based on openSUSE Leap 15.5, which is largely compatible with the SLES 15 operating system used on many HPC clusters at MPCDF. Up-to-date lists of the images, together with the software they contain, are documented in GitLab. Please note that you need to opt in and log in to GitLab before you can access this page.

As indicated by its name, each image contains only a single toolchain: a single compiler, optionally a single MPI library, and a selection of widely used additional libraries. The list of software may be extended upon request. If necessary, users can install further software from the official openSUSE repositories by building their own images derived from the MPCDF images.
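A job can list the modules that an image actually provides. The sketch below reuses the Intel image from the case study as an example; the job name list_modules is hypothetical. module avail is the standard environment-modules command for listing available modules.

```yaml
# .gitlab-ci.yml
list_modules:
  image: gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/intel_2023_1_0_x:2024
  script:
    # print all software modules available in this image
    - module avail
```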

Image tagging-and-purging strategy

Tagging using `latest` and the calendar year

To limit the growth of the individual Docker images over time, we have put the following tagging-and-purging strategy in place:

Essentially, all images are tagged with latest and/or the calendar year. During a year, say 2024, the images tagged latest and 2024 are identical and receive regular updates and additions of software. At the beginning of the new year, all images tagged with the previous year remain unchanged (frozen). The newly created images for 2025, say, start out in a slim state again in early January and are tagged latest. Users can then choose to migrate to the more recent images (tagged 2025 and latest in our example) or stick with the older, but static, images (tagged 2024) for a while.
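In a .gitlab-ci.yml file, the image tag alone decides between a frozen and a continuously updated environment. The sketch below uses the Intel image from the case study; the job names are hypothetical. It also assumes that this particular image is published under latest as well as under the year.

```yaml
# .gitlab-ci.yml
build_pinned:
  # identical to latest during 2024, frozen from the start of 2025 on
  image: gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/intel_2023_1_0_x:2024
  script:
    - module load intel/2023.1.0.x
    - icpx --version

build_rolling:
  # assumed tag: follows the current year's image and may change every January
  image: gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/intel_2023_1_0_x:latest
  script:
    - module load intel/2023.1.0.x
    - icpx --version
```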

If you opt for the tag latest, please be aware that the software environment will likely change at the beginning of each year.

Non-versioned images tagged `latest`

Moreover, we provide special images without an explicit version number that point to the most recent compiler and MPI in the MPCDF software stack. For the compilers, for example, we offer the images gcc:latest and intel:latest, and similarly for the derived images containing Intel MPI or Open MPI. See the page on the latest image tag for an overview. A typical use case for these images is code that should be built and tested with the newest available compiler. Only the tag latest exists for these non-versioned images. To receive updates seamlessly, the module load command should then also omit the compiler and MPI version numbers (for example, module load gcc instead of module load gcc/13).
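The following is a sketch of such a job. It assumes that the non-versioned GCC image lives under the same registry path as the versioned images; please check the exact name on the image overview page. The job name build_newest_gcc is hypothetical. Note that the module is loaded without a version number.

```yaml
# .gitlab-ci.yml
build_newest_gcc:
  # assumed registry path of the non-versioned image tracking the newest GCC
  image: gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/gcc:latest
  script:
    # no version number, so this line keeps working when the image moves to a newer GCC
    - module load gcc
    - gcc --version
```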

Case Study: Set up a CI job using a recent Intel C++ compiler

This section shows how to set up a CI job to compile and test some C++ software using the Intel compiler. The required steps are:

Identify the Docker image that provides the required software by checking the CI Docker image website. Copy the image name including its tag; in our example, we use gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/intel_2023_1_0_x:2024.
Edit your .gitlab-ci.yml file and paste the image name after image:.
Select appropriate tags for the shared runners you intend to use.
The resulting .gitlab-ci.yml file might look as follows:

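# .gitlab-ci.yml
build_intel:
  # CI module image with the Intel 2023.1.0.x toolchain (2024 edition)
  image: gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/intel_2023_1_0_x:2024
  # run on a shared cloud runner that offers the "1core" tag
  tags:
    - cloud
    - 1core
  script:
    # load compiler and tools via environment modules, as on the HPC systems
    - module load intel/2023.1.0.x
    - module load gcc/13
    - module load cmake
    - icpx --version
    # compile C++ code and perform tests ...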
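The build-and-test step left open above depends on the project. The sketch below assumes a CMake-based project whose tests are registered with CTest; the build directory name build is hypothetical.

```yaml
# .gitlab-ci.yml
build_intel:
  image: gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/intel_2023_1_0_x:2024
  tags:
    - cloud
    - 1core
  script:
    - module load intel/2023.1.0.x
    - module load gcc/13
    - module load cmake
    # configure the (assumed) CMake project with the Intel oneAPI C++ compiler
    - cmake -S . -B build -DCMAKE_CXX_COMPILER=icpx
    - cmake --build build
    # run the tests registered with CTest
    - cd build && ctest --output-on-failure
```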

Migrating from the legacy `module-image` to the new CI module images

Users of the legacy module-image are encouraged to migrate to the new CI images now and, if necessary, to report potential issues and request additional software modules via the helpdesk. As shown in the previous section, it is sufficient to replace the module-image in the .gitlab-ci.yml file with one of the new images that provides the desired software stack for the respective CI job.

The new CI module images were first introduced in the December 2023 edition of the Bits and Bytes.

Shared runners offered on MPCDF GitLab

The following paragraphs describe the shared runners which are currently available on MPCDF GitLab. Some parameters are valid for all of them:

The executor is Docker, i.e. all jobs run inside Docker containers. For security reasons, it is not possible to execute shell scripts directly on the runner hosts.
The default Docker image is “python:3.12”
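Every job runs in a Docker container. A job that needs something other than python:3.12 therefore has to name its image explicitly, either per job or for the whole pipeline under default:. The sketch below uses the Intel CI module image as an example; the job name compile is hypothetical.

```yaml
# .gitlab-ci.yml
default:
  # replaces the runner's default image (python:3.12) for all jobs of the pipeline
  image: gitlab-registry.mpcdf.mpg.de/mpcdf/ci-module-image/intel_2023_1_0_x:2024

compile:
  script:
    - module load intel/2023.1.0.x
    - icpx --version
```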

MPCDF Cloud Runner 01

4 vCPUs, 32 GB RAM
Tags: shared, docker, cloud, modules, avx, avx2, avx512, distributedcache

MPCDF Cloud Runner 02

4 vCPUs, 32 GB RAM
Tags: shared, docker, cloud, modules, avx, avx2, avx512, distributedcache

MPCDF Cloud Runner 04

4 vCPUs, 32 GB RAM
Tags: shared, docker, cloud, modules, avx, avx2, avx512, distributedcache

MPCDF Cloud Runner 05

4 vCPUs, 32 GB RAM
Tags: shared, docker, cloud, modules, avx, avx2, avx512, distributedcache, 1core, version1650

MPCDF Cloud Runner 06

4 vCPUs, 32 GB RAM
Tags: shared, docker, cloud, modules, avx, avx2, avx512, distributedcache, 1core

MPCDF Cloud Runner 07

8 vCPUs, 16 GB RAM
Tags: 8cores

MPCDF Cloud Runner 08

4 vCPUs, 32 GB RAM
Tags: docker, shared, cloud

MPCDF Cloud Runner 09

4 vCPUs, 32 GB RAM
Tags: docker, shared, cloud
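A runner only accepts a job if it carries all of the job's tags, so the tag lists above determine where a job can run. Among the runners listed above, a job tagged 8cores can only go to Cloud Runner 07. A job requiring modules and avx512 can run on Cloud Runners 01, 02, 04, 05, and 06. In the sketch below, the job names are hypothetical and make -j 8 is a placeholder for the project's own build command.

```yaml
# .gitlab-ci.yml
parallel_build:
  tags:
    - 8cores           # only MPCDF Cloud Runner 07 carries this tag
  script:
    - make -j 8        # placeholder build command using all 8 vCPUs

avx512_check:
  tags:
    - modules
    - avx512           # Cloud Runners 01, 02, 04, 05 and 06
  script:
    # fails if the CPU does not report AVX-512 Foundation support
    - grep -m1 -o avx512f /proc/cpuinfo
```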

MPCDF GPU Runners

Nvidia

MPCDF-GPU-01
MPCDF-GPU-02
MPCDF-GPU-03
MPCDF-GPU-04

Resources:

4 vCPUs, 16 GB RAM each
Tags: cloud-gpu, modules, nvidia-cc80, distributedcache

Each of these four GPU runners has access to a MIG (Multi-Instance GPU) partition that exposes about 50% of one Nvidia A30 GPU. Running the command nvidia-smi in a CI job shows the available GPU resources:

(...)
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03    Driver Version: 535.129.03    CUDA Version: 11.6   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A30          On   | 00000000:00:05.0 Off |                   On |
| N/A   36C    P0    60W / 165W |                  N/A |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    2   0   0  |     13MiB / 11968MiB | 28      0 |  2   0    2    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
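A job that produces output like the above could look as follows. This is a sketch with a hypothetical job name; it assumes nvidia-smi is available inside the job's container, as the output above indicates. In addition to the full status table, nvidia-smi -L prints one line per GPU and MIG device.

```yaml
# .gitlab-ci.yml
gpu_info:
  tags:
    - cloud-gpu
    - nvidia-cc80
  script:
    - nvidia-smi       # full status table, including the MIG device
    - nvidia-smi -L    # short list of GPUs and MIG devices
```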

AMD

MPCDF-GPU-05