Staging Files to HPC systems via Globus Online

Large datasets can be staged to and from the HPC systems or Linux clusters using Globus Online.

Globus Online is a third-party data transfer service which can be used to transfer large data volumes quickly and reliably (for more information, see www.globus.org).

General information about Globus Online registration and the usage of the MPCDF Globus services and license can be found here: MPCDF DataHub and Globus Online

The best method for transferring data via Globus to/from MPCDF depends on the configuration of the external site.

  1. If the external site has a Globus Online endpoint, then deploying a Globus Connect Personal Client on an MPCDF cluster and directly moving data is often the simplest and best option.

    Server (External Institute) <------> Client (MPCDF HPC).
    
  2. If the external site does not have a Globus Connect Server endpoint, then data can be transferred by deploying a Globus Connect Personal Client on both the external site and the MPCDF cluster and performing a client-to-client transfer.

    Client (External Institute) <------> Client (MPCDF HPC).
    

    Access to parts of the filesystem other than the user's home directory requires the client config file to be edited (see below).

    Note: client-to-client transfers require that the user is part of the Max Planck Computing and Data Facility Globus Connect Plus group. For more details see the following link

  3. If the external site does not have a Globus Connect Server endpoint and client-to-client transfers are not possible, then data can be staged via the MPCDF DataHub server.

    To use the MPCDF DataHub server as a staging server, install Globus Connect Personal clients on both the source and target systems and perform the transfer in two steps.

    Client (External Institute) <------> DataHub <------> Client (MPCDF HPC).
    

    Access to parts of the filesystem other than the user's home directory requires the client config file to be edited (see below).

    Note: the storage on the DataHub server is scratch-based and your data will be cleaned regularly; please do not use this service for permanent data storage.

Options 1 and 2 are the simplest, while option 3 provides an alternative in the event of problems with 1 or 2. (Client-to-client transfers generally work well, but if both clients are deployed behind firewalls, transfers between them will not be possible.)
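The two-step staging of option 3 can also be driven from the command line with the globus-cli tool, if it is installed. The following is only a sketch: all endpoint UUIDs and paths below are placeholders, not real MPCDF identifiers.

```shell
# Sketch of a two-step staging transfer via the DataHub using globus-cli.
# All endpoint UUIDs and paths are placeholders -- look up the real ones
# in the Globus web interface. Run "globus login" first to authenticate.
EXTERNAL_EP="aaaaaaaa-0000-0000-0000-000000000001"   # external institute client
DATAHUB_EP="bbbbbbbb-0000-0000-0000-000000000002"    # MPCDF DataHub endpoint
HPC_EP="cccccccc-0000-0000-0000-000000000003"        # MPCDF HPC client

# Step 1: external site -> DataHub staging area
globus transfer --recursive \
    "$EXTERNAL_EP:/data/myproject/" "$DATAHUB_EP:/staging/myproject/"

# Step 2 (once step 1 has completed): DataHub -> /ptmp on the HPC system
globus transfer --recursive \
    "$DATAHUB_EP:/staging/myproject/" "$HPC_EP:/ptmp/$USER/myproject/"
```

Each `globus transfer` call prints a task ID, which can be monitored in the Globus web interface before starting the second step.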

Configuring and using Globus Connect Personal clients:

Setting up clients is easy, and you can set up numerous clients (on HPC systems, desktops, laptops). For more information see the Globus Online How To

For large transfers (to/from our HPC systems) we recommend that you start a screen session on the HPC system and run the Globus client within it.
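For example, assuming the client was unpacked into ~/globusconnectpersonal (the installation path is an assumption), it could be run inside a detached screen session like this:

```shell
# Start a detached screen session named "globus" running the
# Globus Connect Personal client (install path is an assumption).
screen -dmS globus ~/globusconnectpersonal/globusconnectpersonal -start

# Re-attach later to check on the client (detach again with Ctrl-a d):
screen -r globus
```

Running the client under screen means it keeps serving transfers even after you log out of the login node.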

By default, Globus Connect Personal clients only provide access to the user's home directory. To allow access to the large data filesystems (/ptmp on our HPC systems) you will need to edit the globus-connect-personal config file as follows.

Edit the config file

vim ~/.globusonline/lta/config-paths 

to allow access to other filesystems.

For example, to add access to /ptmp on an HPC cluster, edit the config file as follows:

<user>@login02:~/.globusonline/lta> cat config-paths
~/,0,1
/ptmp/,0,1 

The config-paths file contains entries with the following elements (see the Globus Online docs for more information).

<path>,<sharing flag>,<R/W flag>
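To make the field meanings concrete, here is a small self-contained sketch that writes a sample config-paths file into a scratch directory (not the real ~/.globusonline/lta path) and splits each entry into its three fields:

```shell
# Demonstrate the <path>,<sharing flag>,<R/W flag> format on a scratch
# copy of config-paths (the real file lives at ~/.globusonline/lta/config-paths).
cfg="$(mktemp -d)/config-paths"
printf '%s\n' '~/,0,1' '/ptmp/,0,1' > "$cfg"

# Split each entry into its fields:
# path, sharing flag (0 = sharing off), R/W flag (1 = writable)
awk -F, '{ printf "path=%s sharing=%s rw=%s\n", $1, $2, $3 }' "$cfg"
```

In this example both the home directory and /ptmp are exported with sharing disabled (0) and read/write access enabled (1).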

A restart of the client is needed if the config file is changed while the client is running.
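Assuming the client executable is in the current directory (the location is an assumption; adjust the path to your installation), a restart amounts to:

```shell
# Stop and restart a running Globus Connect Personal client so that
# it re-reads config-paths (executable location is an assumption).
./globusconnectpersonal -stop
./globusconnectpersonal -start &
```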

For more information see the docs.