How to archive data

Overview

The MPCDF operates a migrating filesystem on its archive server, which is called “archive”. Data written to this filesystem is automatically moved from disk to tape when disk space needs to be freed, and moved back from tape to disk when the user needs it again.

This service is open to all users of the MPCDF.

Accessing the archive-server

You can log in to the server with your <userid> and Kerberos password using ssh:

ssh <userid>@archive.mpcdf.mpg.de

The ssh key fingerprints are:

0M7PnQy8+R9baOQM3zrpykQJrby0eqIKGkfbm2XBXj8 (RSA)

m4SvenGFe4J45oOfCDPjfBXtxpgpO8GEkyVnuXGVs5Q (ED25519)
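
One possible way to compare the server's host keys with the values above before the first login is sketched below. It assumes that the published fingerprints are SHA256 hashes in OpenSSH's base64 notation and that your OpenSSH version can read keys from standard input with `ssh-keygen -lf -`. The function name check_fingerprints is made up for this example.

```shell
# Fingerprints published on this page (assumed to be SHA256, base64 without padding).
KNOWN_FPS="0M7PnQy8+R9baOQM3zrpykQJrby0eqIKGkfbm2XBXj8
m4SvenGFe4J45oOfCDPjfBXtxpgpO8GEkyVnuXGVs5Q"

# check_fingerprints: read "ssh-keygen -l" output on stdin. Succeed only if
# at least one line was read and every fingerprint is in the published list.
check_fingerprints() {
    seen=0
    while read -r _bits fp _rest; do
        seen=1
        # Strip the "SHA256:" prefix and look for an exact, literal match.
        printf '%s\n' "$KNOWN_FPS" | grep -qxF -e "${fp#SHA256:}" || return 1
    done
    [ "$seen" -eq 1 ]
}

# Usage (needs network access to the archive server):
#   ssh-keyscan -t rsa,ed25519 archive.mpcdf.mpg.de 2>/dev/null \
#       | ssh-keygen -lf - | check_fingerprints && echo "host keys match"
```

If the check fails, do not accept the host key that ssh offers; compare the output of ssh-keygen by eye first.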

The AFS filesystem is not mounted on the archive server. Your home directory is located at /ghi/r/<initial>/<userid>

All data within these user home directories is automatically archived to tape. For further information see the section “Information about the workings and usage of this filesystem” below.

There is also a symbolic link /r pointing to /ghi/r, so in practice a user with the ID smith would work in /r/s/smith (or /ghi/r/s/smith).
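
The path convention can be written down as a small shell helper. This is only a sketch: it assumes that <initial> is always the first character of the user ID, as in the smith example, and the function name archive_home is hypothetical.

```shell
# archive_home USERID: print the home directory path on the archive server.
# Assumption: <initial> is the first character of the user ID,
# as in the smith -> /r/s/smith example.
archive_home() {
    user=$1
    initial=$(printf '%s' "$user" | cut -c1)
    printf '/r/%s/%s\n' "$initial" "$user"
}

archive_home smith    # prints /r/s/smith
```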

Archiving project directories

In addition to the user home directories described above, there are also project-specific directories whose files are automatically archived to tape. The main project directories are located under /r2ghi/proj, and a symbolic link /p points to that directory.

If you have a new project that should use the archive server, please contact Manuel Panea.

Information about the workings and usage of this filesystem

Additional basic information for any user

  • The system constantly monitors the fill level of the filesystem. When usage rises above a certain threshold, files are transferred from disk to tape, beginning with the largest files that have gone unused for the longest time.

  • If you access a file which has been migrated to tape (with any program or command), the file is automatically transferred back from tape to disk. This implies a certain delay: the command will appear to hang, but it is only waiting until the data is online and will then continue.

  • Every file being migrated gets simultaneously written to two different tapes. In this way, in case of a tape failure while reading back the data from the first tape, the file can probably still be read from the second tape.

  • The system can only migrate files which are bigger than the disk block size, which for this filesystem is 1 MB (one megabyte). Files smaller than 1 MB stay resident on disk, where they permanently occupy disk space and, worse, make the total number of files grow so large that operations like scanning the filesystem for backups become increasingly slow.
    In addition, while files larger than 1 MB can be migrated, the system works efficiently only for file sizes larger than about 1 GB (one gigabyte). The reason is that reading data from tape or writing data to tape means waiting for a tape drive to become available, then waiting for a tape to be mounted in the drive, and then waiting for the tape to be rewound and positioned. This typically takes several minutes. Once a tape is mounted and in position, the system can read or write data very fast: a 1 GB file can be read in under 10 seconds. Contrast this with reading 1 GB of data spread across 1000 files of 1 MB each, which would need at the very least 1000 tape-positioning operations and might also require mounting several tapes (possibly hundreds!).
    For these reasons, all users are kindly asked to keep the size of files stored on the ‘/r’ and ‘/p’ filesystems within a range of about 1 GB (one gigabyte) to about 1 TB (one terabyte).

  • Disk quotas limiting the number of stored files are enabled on the ‘/r’ and ‘/p’ filesystems.
    On the ‘/r’ filesystem, a per-user quota is enabled, usually 100,000 files. You can check your user quota with the /usr/lpp/mmfs/bin/mmlsquota command:

    archive:~ $ /usr/lpp/mmfs/bin/mmlsquota -u <YOUR_USER_NAME> hpss_ghi_r1ghi
                             Block Limits                                               |     File Limits
    Filesystem Fileset    type             KB      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
    hpss_ghi   root       USR             384          0          0          0     none |       21  100000   120000        0     none r1ghi.rzg.mpg.de
    hpss_ghi   MPIN       USR         no limits                    r1ghi.rzg.mpg.de
    

    where “hpss_ghi_r1ghi” is the name of the ‘/r’ filesystem, which you can look up with the command ‘df -h’.
    On the ‘/p’ filesystem, a per-project quota is enabled. You can check the project quota with the /usr/lpp/mmfs/bin/mmlsquota command:

    archive:~ $ /usr/lpp/mmfs/bin/mmlsquota -j <YOUR_PROJECT_NAME> hpss_ghi_r2ghi
                             Block Limits                                    |     File Limits
    Filesystem type             KB      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
    hpss_r2ghi FILESET           0          0          0          0     none |        1  100000   101000        0     none r2ghi.rzg.mpg.de
    

    where “hpss_ghi_r2ghi” is the name of the ‘/p’ filesystem, which you can look up with the command ‘df -h’, and <YOUR_PROJECT_NAME> is the name of your project. For example, if your project directory is /p/NAME, your project name is “NAME”; if your project directory is /p/NAME/SUB, your project name is “NAME_SUB”.
    When your quota is exceeded, you get the following message when trying to create a file or a directory:

    mkdir: cannot create directory ‘mydir’: Disk quota exceeded
    

    If you have a lot of small files, please pack them into a tar, zip or similar archive. The archive should be larger than 1 GB and smaller than 1 TB. If all your files are larger than 1 GB and you still exceed the quota, please send an email to MPCDF Support so that we can increase your quota.
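
To find out whether small files are what fills up your file quota, you can count the files below the 1 MB migration limit. The following is a minimal sketch, assuming that “1 MB” means 1048576 bytes; the function name count_small_files is made up for this example. find only looks at file metadata here, so it should not trigger recalls from tape, although this has not been verified for this filesystem.

```shell
# count_small_files DIR: print the number of regular files below DIR that
# are smaller than 1 MiB (1048576 bytes; an assumption about "1 MB").
# The limit is given in bytes ("c") because GNU find rounds sizes up to
# whole units, so "-size -1M" would match only empty files.
# File names containing newlines would be counted more than once.
count_small_files() {
    find "$1" -type f -size -1048576c | wc -l | tr -d ' '
}

# Usage: compare the result with the "files" column of mmlsquota.
#   count_small_files /r/s/smith
```

Directories with a high count are the best candidates for packing into a single archive.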

Additional information for users directly logging onto the server using ssh

  • You can manually force the recall of a migrated file by using any command which opens the file. You can recall in advance all files needed by some job with a command like

    file myfiles/*
    

    or you can use the ghi_stage command:

    ghi_stage myfiles/*
    
  • You can see which files are resident on disk and which ones have been migrated to tape with the command ghi_ls (located in /usr/local/bin), optionally with the option -l. Here is a sample output:

    archive% ghi_ls -l
    G -rw-r--r--   1  ifw    rzs             22 Nov 21 15:12 a1
    H -rw-------   1  ifw    rzs   138958551040 Sep 18 22:22 abc.tar
    H -rw-r--r--   1  ifw    rzs     1073741312 May 06 2009  core
    G -rw-r--r--   1  ifw    rzs              0 Jun 20 2008  dsmerror.log
    B -rw-r--r--   1  ifw    rzs     1079040000 Aug 03 2010  dummyz3
    

    The first column states where the file resides: a ‘G’ means the file is resident on the GPFS disk; an ‘H’ means the file has been transferred to the underlying HPSS archiving system, probably on tape; a ‘B’ means ‘both’: the file has already been copied to HPSS but is still present on disk, and the disk copy can be removed immediately if the system needs to free disk space.

  • If you have many small files, please first pack them into one large file with a suitable tool such as tar, cpio, ar or zip. Please try to keep the size of files stored on the ‘/r’ and ‘/p’ filesystems within a range of about 1 GB (one gigabyte) to about 1 TB (one terabyte).

    Here is a simple example of how to use tar to pack some small files small000, small001, etc. into one big file big.tar:

    tar cvf big.tar small*
    

    Additionally, if you want, you can write a “Table of Contents” file with a command like

    tar tvf big.tar > big.tar.toc
    

    Files with a ‘.toc’ extension will stay on-line, provided they are smaller than 1 MB, so you can read them any time without having to wait for a tape to be mounted. Likewise, files with a ‘.md5’ extension also stay on-line if they are smaller than 1 MB.

  • Please pay attention when working with sparse files. Sparse files contain runs of zeros which are not stored on disk. The disk usage of such files (obtained with the du -sh command) is therefore smaller than their apparent size (obtained with the ls -l command).
    When these files are packed into a tar archive, the disk usage of the resulting tar file will be bigger than the disk usage of the source files, because tar by default writes all zeros explicitly to disk. With the -S (--sparse) option, tar handles sparse files efficiently, and the disk usage of the tar archive is as small as that of the source files. The -S option is not needed to extract the files; the extracted files will be sparse again. This is in contrast to zip and other archiving tools with compression: when sparse files are compressed with zip, the resulting archive is small, but the extracted files are no longer sparse.
    A problem can arise when using tar -S to create a tar archive from source files stored on the ‘/r’ or ‘/p’ filesystems. If your source files have been migrated to tape and purged from disk (ghi_ls -l shows ‘H’), tar without the -S option will recall the files from tape before creating the archive, whereas tar with the -S option will not. In the latter case, an empty tar archive is created and no error message is shown! To avoid data loss, always check that the source files are on disk (ghi_ls -l shows ‘G’ or ‘B’). If the source files are on tape only (ghi_ls -l shows ‘H’), recall them first, for example with the ghi_stage command.
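
The check described in the last item can be partly automated. The sketch below filters the output of ghi_ls -l for entries marked ‘H’; it assumes the output format of the sample shown earlier and file names without spaces, and the function name tape_only_files is made up for this example.

```shell
# tape_only_files: read "ghi_ls -l" output on stdin and print the names of
# all entries whose first column is H (on tape only, purged from disk).
# Assumption: the file name is the last whitespace-separated field.
tape_only_files() {
    awk '$1 == "H" { print $NF }'
}

# Usage on the archive server, in the directory holding the source files:
#   ghi_ls -l | tape_only_files     # names listed here must be recalled first
#   ghi_stage small*                # recall them as described above
#   ghi_ls -l | tape_only_files     # re-check: none of the source files should appear
#   tar -cSvf big.tar small*        # only now pack with -S
```

Running the filter a second time before calling tar -S is a cheap safeguard, since this page does not say whether ghi_stage waits until all files are back on disk.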