How to archive data
Overview
The MPCDF has installed a migrating filesystem on the archive-server called “archive”. Data written to this filesystem will automatically be moved from disk to tape to free space on disk when necessary or back from tape to disk when needed again by the user.
This service is open to all users of MPCDF.
Accessing the archive-server
You may log on to the server with your <userid> and Kerberos password using ssh:
ssh <userid>@archive.mpcdf.mpg.de
The ssh key fingerprints are:
0M7PnQy8+R9baOQM3zrpykQJrby0eqIKGkfbm2XBXj8 (RSA)
m4SvenGFe4J45oOfCDPjfBXtxpgpO8GEkyVnuXGVs5Q (ED25519)
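Before accepting the host key on first login, you may want to compare it with the fingerprints above. The following is a minimal sketch. It assumes that the values listed are SHA256 fingerprints in the format printed by OpenSSH (ssh-keygen -l), which the page does not state explicitly; the helper name has_fingerprint is hypothetical.

```sh
# has_fingerprint EXPECTED
# Reads `ssh-keygen -l` style lines on stdin and succeeds if one of them
# carries the expected fingerprint (assumed format: "SHA256:<value> ").
has_fingerprint() {
    grep -qF "SHA256:$1 "
}

# Possible use on a machine with network access (not run here):
#   ssh-keyscan archive.mpcdf.mpg.de 2>/dev/null | ssh-keygen -lf - \
#       | has_fingerprint m4SvenGFe4J45oOfCDPjfBXtxpgpO8GEkyVnuXGVs5Q \
#       && echo "ED25519 host key matches"
```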
Your HOME directory is located under /ghi/r/<initial>/<userid>
There is also a symbolic link /r pointing to /ghi/r, so in practice a user with ID smith would work with /r/s/smith (or /ghi/r/s/smith)
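As a small illustration of the directory layout, the path can be derived from the user ID like this. The function name archive_home is hypothetical, and the sketch uses the short /r form of the path.

```sh
# archive_home USERID
# Prints the HOME path on the archive server, following the
# /r/<initial>/<userid> layout described above.
archive_home() {
    initial=$(printf '%s' "$1" | cut -c1)
    printf '/r/%s/%s\n' "$initial" "$1"
}

archive_home smith    # prints /r/s/smith
```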
All data within users’ HOME directories will automatically be archived to tape. For further information see the section “Information about the operation and usage of this filesystem” below.
Archiving project directories
In addition to the users’ HOME directories described above, there are also project-specific directories which automatically archive files on tape. The project main directories are available under /r2ghi/proj and there is also a symbolic link /p pointing to it.
If you have a new project that should use the archive server, please contact Manuel Panea.
Information about the operation and usage of this filesystem
Additional basic information for any user
Automatic archival to tape: The system regularly (usually every hour) copies all new files to tape. The copy on disk remains as long as there is enough space. When the filesystem fills beyond a certain threshold, some files which have already been copied to tape will be removed from disk, beginning with the largest files that have been unused for the longest time.
Automatic retrieval from tape: If (by using some program or command) you access a file which has been migrated to tape, the file will automatically be transferred back from tape to disk. This of course implies a certain delay. The command will appear to hang, but it will just wait until the data is online and then continue.
Redundancy: Every file being migrated gets simultaneously written to two different tapes. In this way, in case of a tape failure while reading back the data from the first tape, the file can probably still be read from the second tape.
Optimal file size for efficiency: The system can only migrate files which are bigger than the disk block size, which for this filesystem is 1 MB (one megabyte). Files smaller than 1 MB stay resident on disk, permanently occupying disk space and, what’s worse, making the total number of files grow so large that operations like scanning the filesystem for backups become increasingly slow.
In addition: while files larger than 1 MB can be migrated, the system works efficiently only for file sizes larger than about 1 GB (one gigabyte). The reason is that reading data from tape or writing data to tape implies waiting for a tape drive to become available, then waiting for a tape to get mounted in the drive and then waiting for the tape to get rewound and positioned. This can typically take several minutes. Once a tape is available and in position, the system can read or write data very fast. A 1 GB file can be read in under 10 seconds. Contrast this with reading 1 GB of data spread across 1000 files, each 1 MB in size, which would need at the very least 1000 tape-positioning operations, and possibly the mounting of several tapes (maybe hundreds!).
For these reasons, all users are kindly asked to keep the size of files stored on ‘/r’ and ‘/p’ filesystems within a range of about 1 GB (one gigabyte) to about 1 TB (one terabyte).
Maximum file size: The above recommendation (1 GB to 1 TB per file) is not a strict limit. A small quantity of files not within that range is still ok. BUT: files larger than 20 terabytes will not be migrated to tape. They will stay on disk and, in the event of a disk crash, they will be lost. Do not store files larger than 20 terabytes.
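The size thresholds above can be summarised in a small helper. This is only a sketch: the function name classify_size and its labels are hypothetical, and it assumes binary units (1 MB = 1024 × 1024 bytes and so on), since the text does not say whether binary or decimal units are meant.

```sh
# classify_size BYTES
# Maps a file size to the categories described above.
# Assumption: binary units (1 MB = 1048576 bytes, 1 GB = 1024 MB, ...).
classify_size() {
    mb=$((1024 * 1024))
    gb=$((1024 * mb))
    tb=$((1024 * gb))
    if   [ "$1" -le "$mb" ];        then echo "resident"      # not bigger than one disk block
    elif [ "$1" -lt "$gb" ];        then echo "small"         # migrated, but inefficient
    elif [ "$1" -le "$tb" ];        then echo "ok"            # recommended range
    elif [ "$1" -le $((20 * tb)) ]; then echo "large"         # migrated, above recommended range
    else                                 echo "not-migrated"  # above 20 TB, stays on disk
    fi
}

# Possible use with GNU stat, which reports the size without opening the file:
#   classify_size "$(stat -c %s abc.tar)"
```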
Disk quotas limiting the number of files stored are enabled on ‘/r’ and ‘/p’ filesystems.
On the ‘/r’ filesystem, a user quota is enabled. Usually, it is 100,000 files. You can check your user quota with the /usr/lpp/mmfs/bin/mmlsquota command:
archive:~ $ /usr/lpp/mmfs/bin/mmlsquota -u <YOUR_USER_NAME> hpss_ghi_r1ghi
                         Block Limits                      |     File Limits
Filesystem Fileset type    KB  quota  limit in_doubt grace |  files  quota  limit in_doubt grace  Remarks
hpss_ghi   root    USR    384      0      0        0  none |     21 100000 120000        0  none  r1ghi.rzg.mpg.de
hpss_ghi   MPIN    USR        no limits                                                           r1ghi.rzg.mpg.de
where “hpss_ghi_r1ghi” is the name of the ‘/r’ filesystem. It can be obtained with the command ‘df -h’.
On the ‘/p’ filesystem, a quota per project is enabled. You can check the project quota with the /usr/lpp/mmfs/bin/mmlsquota command:
archive:~ $ /usr/lpp/mmfs/bin/mmlsquota -j <YOUR_PROJECT_NAME> hpss_ghi_r2ghi
                    Block Limits                      |     File Limits
Filesystem type       KB  quota  limit in_doubt grace |  files  quota  limit in_doubt grace  Remarks
hpss_r2ghi FILESET     0      0      0        0  none |      1 100000 101000        0  none  r2ghi.rzg.mpg.de
where “hpss_ghi_r2ghi” is the name of the ‘/p’ filesystem. It can be obtained with the command ‘df -h’. <YOUR_PROJECT_NAME> is the name of your project. For example, if your project directory is /p/NAME, then your project name is “NAME”. But if your project directory is /p/NAME/SUB, then your project name is “NAME_SUB”.
When your quota is exceeded, you get this message when trying to create a file or a directory:
mkdir: cannot create directory ‘mydir’: Disk quota exceeded
If you have a lot of small files, please pack them into a tar, zip or similar archive. The size of this archive should be bigger than 1 GB and smaller than 1 TB. If all your files are bigger than 1 GB and you still exceed the quota, please send an email to MPCDF Support so that we can increase your quota.
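To find out which directories contribute most to the file count, something like the following sketch may help. The function name count_entries is hypothetical. The sketch counts directories as well as files, on the assumption that both count towards the quota, which the mkdir error above suggests.

```sh
# count_entries DIR
# Prints the number of files and directories under DIR (DIR itself included).
# Counting NUL terminators keeps the result correct for unusual file names.
count_entries() {
    find "$1" -print0 | tr -dc '\000' | wc -c | tr -d ' '
}

# Possible use: list the subdirectories of the current directory by entry count.
#   for d in */; do printf '%s %s\n' "$(count_entries "$d")" "$d"; done | sort -rn
```

Directories with very many entries are the natural candidates for packing into a single archive.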
Additional information for users directly logging onto the server using ssh
You can manually force the recall of a migrated file by using any command which opens the file. You can recall in advance all files needed by some job with a command like
file myfiles/*
or you can use the ghi_stage command for that:
ghi_stage myfiles/*
You can see which files are resident on disk and which ones have been migrated to tape with the command ghi_ls (located in /usr/local/bin), optionally with the option -l. Here is a sample output:
archive% ghi_ls -l
G -rw-r--r-- 1 ifw rzs           22 Nov 21 15:12 a1
H -rw------- 1 ifw rzs 138958551040 Sep 18 22:22 abc.tar
H -rw-r--r-- 1 ifw rzs   1073741312 May 06  2009 core
G -rw-r--r-- 1 ifw rzs            0 Jun 20  2008 dsmerror.log
B -rw-r--r-- 1 ifw rzs   1079040000 Aug 03  2010 dummyz3
The first column states where the file resides: a ‘G’ means the file is resident on the GPFS disk; an ‘H’ means the file has been transferred to the underlying HPSS archiving system, probably on tape; a ‘B’ means ‘both’: the file has already been copied to HPSS but is still present on disk and can be removed immediately if the system needs to free disk space.
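For directories with many entries, the flags in the first column can be tallied with a short filter. This is a sketch only: the function name residency_summary is hypothetical, and it assumes the listing has the layout shown in the sample output above.

```sh
# residency_summary
# Reads `ghi_ls -l` output on stdin and prints how many entries carry each
# flag in the first column (G, H or B). All other lines are ignored.
residency_summary() {
    awk '$1 == "G" || $1 == "H" || $1 == "B" { n[$1]++ }
         END { printf "G=%d H=%d B=%d\n", n["G"], n["H"], n["B"] }'
}

# Possible use on the archive server (not run here):
#   ghi_ls -l | residency_summary
```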
If you have many small files, please first pack them together into one large file with a suitable tool such as tar, cpio, ar or zip. Please try to keep the size of files stored on the ‘/r’ and ‘/p’ filesystems within a range of about 1 GB (one gigabyte) to about 1 TB (one terabyte).
Here is a simple example of how to use tar to pack some small files small000, small001, etc. into a big file big.tar:
tar cvf big.tar small*
Additionally, if you want, you can write a “Table of Contents” file with a command like
tar tvf big.tar > big.tar.toc
Files with a ‘.toc’ extension will stay on-line, provided they are smaller than 1 MB, so you can read them any time without having to wait for a tape to be mounted. Likewise, files with a ‘.md5’ extension also stay on-line if they are smaller than 1 MB.
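The packing step, the table of contents and a checksum file can be combined as sketched below. The function name pack_with_sidecars is hypothetical, and md5sum from GNU coreutils is assumed to be available; the page itself does not prescribe how ‘.md5’ files are created.

```sh
# pack_with_sidecars ARCHIVE FILE...
# Creates ARCHIVE plus ARCHIVE.toc (content listing) and ARCHIVE.md5 (checksum).
pack_with_sidecars() {
    archive=$1
    shift
    tar cf "$archive" "$@" &&
    tar tvf "$archive" > "$archive.toc" &&
    md5sum "$archive" > "$archive.md5"
}

# Possible use:
#   pack_with_sidecars big.tar small*
#   md5sum -c big.tar.md5    # verify later, from the same directory
```

Note that verifying the checksum reads the whole archive, so it triggers a recall from tape if the archive has been migrated.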
Please take care when working with sparse files. Sparse files contain long runs of zeros that are not actually stored on disk. The disk usage of such files (as reported by the du -sh command) is therefore smaller than their apparent size (as reported by the ls -l command).
When packing these files into a tar archive, the disk usage of the resulting tar file will be bigger than the disk usage of the source files, because tar by default writes all zeros explicitly to disk. You can use the -S (--sparse) option of tar to handle sparse files efficiently: the disk usage of the resulting archive is then as small as that of the source files. The -S option is not needed for extraction; the extracted files will be sparse again. This is in contrast to zip and other archiving tools with compression: when sparse files are compressed with zip, the resulting archive is small, but the extracted files are no longer sparse.
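The effect can be tried out in a scratch directory away from the archive filesystem. The sketch below assumes GNU tar and GNU coreutils (truncate, mktemp) and a filesystem that supports sparse files; the file names are made up for illustration.

```sh
# Work in a scratch directory so that nothing else is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Create a 10 MB file that consists of a single hole.
truncate -s 10M sparse.dat

ls -l sparse.dat    # apparent size: 10485760 bytes
du -k sparse.dat    # disk usage: much smaller where sparse files are supported

tar cf  plain.tar  sparse.dat    # zeros are written out explicitly
tar cSf sparse.tar sparse.dat    # holes are recorded instead of zeros

# Compare the sizes of the two archives.
ls -l plain.tar sparse.tar
```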
A problem can arise when using tar -S to create a tar archive from source files stored on the ‘/r’ or ‘/p’ filesystems. If your source files have been migrated to tape and purged from disk (ghi_ls -l outputs ‘H’), then tar without the -S option will recall the files from tape before creating the archive, whereas tar with the -S option will not. In this case, an empty tar archive is created and no error message is shown! To avoid data loss, always check that the source files are on disk (ghi_ls -l outputs ‘G’ or ‘B’). If the source files are on tape only (ghi_ls -l outputs ‘H’), recall them first, for example with the ghi_stage command.
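The check described above can be scripted as a guard in front of tar -S. This is a minimal sketch: the function name has_tape_only is hypothetical, and it assumes the ghi_ls -l layout shown in the sample output earlier on this page.

```sh
# has_tape_only
# Reads `ghi_ls -l` output on stdin and succeeds if at least one entry is
# flagged 'H' (on tape only) in the first column.
has_tape_only() {
    awk '$1 == "H" { found = 1 } END { exit(found ? 0 : 1) }'
}

# Intended use on the archive server (not run here):
#   cd myfiles
#   if ghi_ls -l | has_tape_only; then
#       ghi_stage *        # recall first, as recommended above
#   fi
#   tar cSf ../big.tar *
```

Whether ghi_stage returns only once all files are back on disk is not stated here, so it is safest to run ghi_ls -l once more before starting tar.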