The MPCDF Metadata Tools: User Documentation
The MMD Tools (short for _MPCDF Metadata Tools_) can be used to create and manage metadata in several common metadata schemata.
The mmd tools consist of four callable scripts and one module that provides additional functionality. The tools should be available on MPCDF systems but can also be installed easily on most other Linux systems.
The mmd tools should run on every system with a reasonably modern version of Python 3. We recommend using the Anaconda Python Distribution which is also available in MPCDF’s module system on the HPC clusters. The following procedure installs the mmd tools into a new virtual environment utilizing Anaconda Python on one of the HPC clusters.
First, load the Anaconda Python Distribution via the module system:
module load anaconda/3/2021.11
You can check the availability of Anaconda via:
Now create a new Python virtual environment for the mmd tools (you can give it another name of course):
conda create --name mmd
Activate the newly created environment:
conda activate mmd
The prompt of your shell should now show the name of the newly created environment.
Next, clone the GitLab repository:
git clone email@example.com:mmd/mmd-tools.git
Change to the newly created directory and install the mmd tools together with all necessary Python libraries:
pip install .
The mmd tools are now ready to be used and should be accessible via shell completion. Try it by typing “mmd” and pressing the Tab key; the shell should list all available mmd tools:
(mmdtest) thomz@cobra02:~> mmd
mmd mmd2bagit mmdCreate mmdListBags mmdLoad mmdPublish mmdShow
Let’s take a look at the individual tools of the mmd suite.
The mmdCreate script can be used to create and edit metadata manually. It needs two parameters: an output file (parameter “-o”) and a metadata format specification (parameter “--format”). The format specifications can be found in the subfolder “formats” of the cloned repository. The path to the format definition can be given as an absolute or a relative path:
mmdCreate.py -o ~/metadata.mmd --format /data/mmd/formats/dublinCore.json
The command above creates a metadata file in the well-known Dublin Core format [1]. The script guides you step by step through the necessary fields:
Metadata format: /data/mmd/formats/dublinCore.json
Fill in each field. Type "?" for a description of the field.
After the script has guided you through the process of entering the metadata, you can find the result in the output file specified via the “-o” option. The file is in JSON format and can be displayed or processed further with common JSON tools or libraries.
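Because the output is plain JSON, it can be read with any JSON library. The following is a minimal Python sketch, assuming that the file holds a JSON object of key/value pairs at the top level. The helper names and the field names in the sample are illustrative and not taken from an actual mmdCreate run.

```python
import json
import tempfile
from pathlib import Path


def load_metadata(path):
    """Read an mmd metadata file and return its content as a dict."""
    with Path(path).expanduser().open(encoding="utf-8") as handle:
        metadata = json.load(handle)
    # Assumption: the top level of the file is a JSON object (key/value pairs).
    if not isinstance(metadata, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return metadata


def format_metadata(metadata):
    """Return one 'key: value' line per metadata field."""
    return [f"{key}: {value}" for key, value in metadata.items()]


# Demo with a stand-in file; replace the path with your own output file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "metadata.mmd"
    sample.write_text(
        json.dumps({"title": "Test dataset", "creator": "Jane Doe"}),
        encoding="utf-8",
    )
    for line in format_metadata(load_metadata(sample)):
        print(line)
```

For a real file, something like load_metadata("~/metadata.mmd") should return the entered fields as a dictionary, provided the assumption about the file layout holds.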
Once you have created a metadata file via the mmdCreate script, you can display its content via the mmdShow script. The parameter “-i” takes the input file:
python3 mmdShow.py -i /tmp/metadata.mmd
Optionally, the parameter “--outputformat” can be set to “html” so that the printed output is formatted as simple HTML code.
With the mmd2bagit script, you can combine a folder containing your data files and its metadata description in mmd format into a BagIt container [2]:
python3 mmd2bagit.py --folder ~/testdata/ --metadata /tmp/metadata.mmd
Please be aware that the script changes the structure of the input folder! All content of the folder is moved into the “data” subfolder, and the script creates some additional files at the top level.
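This layout corresponds to the BagIt specification (RFC 8493): payload files live under “data”, while tag files such as bagit.txt and manifest-<algorithm>.txt sit at the top level. As a rough sketch (not part of the mmd tools), the following Python function checks the payload of a bag against its manifest. It assumes that a SHA-256 manifest exists; which checksum algorithm mmd2bagit actually writes is not stated here, so the algorithm argument may need to be adjusted. The function name verify_bag is illustrative.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir, algorithm="sha256"):
    """Return a list of problems found in a BagIt bag (empty list = OK)."""
    bag = Path(bag_dir)
    problems = []

    # RFC 8493 requires a bagit.txt declaration and a data/ payload directory.
    if not (bag / "bagit.txt").is_file():
        problems.append("missing bagit.txt")
    if not (bag / "data").is_dir():
        problems.append("missing data/ directory")

    manifest = bag / f"manifest-{algorithm}.txt"
    if not manifest.is_file():
        problems.append(f"missing {manifest.name}")
        return problems

    # Each manifest line is "<checksum> <path relative to the bag root>".
    # Simplification: percent-encoded characters in paths are not decoded.
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            problems.append(f"malformed manifest line: {line!r}")
            continue
        expected, rel_path = parts
        target = bag / rel_path
        if not target.is_file():
            problems.append(f"listed but missing: {rel_path}")
            continue
        # Reads the whole file into memory; fine for a sketch, not for huge files.
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

An empty result means that every file listed in the manifest is present and unchanged. The check does not detect payload files that are missing from the manifest.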
The mmdPublish script can be used to publish a metadata file to a CKAN instance. Before you can use the script, you need to create an access token in CKAN and store it in an environment variable:
Without this access token, the script cannot write to the CKAN instance. Please make sure that the environment variable cannot be read by unauthorized people!
The script itself needs several parameters:
i: the input file in mmd metadata format
c: the URL of the CKAN instance, followed by the path to its API. For example: https://ckanexample.com/api/3/action/
t: the field in the metadata corresponding to the “Title” field in CKAN (not the title itself!)
o: the CKAN organisation under which the dataset should be stored
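To make the roles of these parameters more concrete, the Python sketch below builds (but does not send) a request to the package_create action of the CKAN API. It illustrates the CKAN API and is not the actual implementation of mmdPublish: the environment variable name CKAN_API_TOKEN, the function name, the way the dataset name is derived from the title, and the mapping of the remaining fields to CKAN “extras” are all assumptions.

```python
import json
import os
import re
import urllib.request


def build_publish_request(metadata, api_url, title_field, organisation, token):
    """Build a POST request for CKAN's package_create action."""
    # title_field names the metadata key that holds the title (parameter t).
    title = str(metadata[title_field])
    # CKAN dataset names may only contain lowercase ASCII letters, digits,
    # "-" and "_", so derive a simple slug from the title (assumption).
    name = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")[:100]
    payload = {
        "name": name,
        "title": title,
        "owner_org": organisation,
        # Hypothetical mapping: all other fields become CKAN "extras".
        "extras": [
            {"key": key, "value": str(value)}
            for key, value in metadata.items()
            if key != title_field
        ],
    }
    return urllib.request.Request(
        api_url.rstrip("/") + "/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


# Example values; CKAN_API_TOKEN is a hypothetical variable name.
request = build_publish_request(
    metadata={"dc_title": "Test dataset", "dc_creator": "Jane Doe"},
    api_url="https://ckanexample.com/api/3/action/",
    title_field="dc_title",
    organisation="my-organisation",
    token=os.environ.get("CKAN_API_TOKEN", ""),
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Reading the token from the environment at call time keeps it out of the script and out of the shell history, which matches the advice above about protecting the token.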
TODO: Screenshots of the whole workflow!
The Metadata Formats
The basic idea of the MMD Tools is to be as flexible as possible when it comes to the creation and management of metadata. Therefore, the MMD Tools work schemaless, with plain pairs of keys and values. Additionally, some common metadata formats and schemata used by our users are supported.
If you need support for further metadata schemata, please contact the developers via firstname.lastname@example.org.
The integrated metadata formats are stored in a JSON-based format and can be found in the subfolder “formats” of the GitLab repository. So far, the following schemata are included:
DataCite Metadata Format
MPCDF default metadata schema
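Since the format definitions are JSON files, the schemata available in a checkout can be listed with a few lines of Python. This sketch only assumes that each definition is a *.json file located directly inside the “formats” folder; it makes no assumption about the internal structure of a definition, and the function name list_formats is illustrative.

```python
import json
from pathlib import Path


def list_formats(formats_dir):
    """Map each format name (file stem) to the path of its JSON definition."""
    formats = {}
    for path in sorted(Path(formats_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)  # only checks that the file is valid JSON
        except json.JSONDecodeError:
            continue  # skip files that do not parse
        formats[path.stem] = path
    return formats
```

For the example path used earlier, list_formats("/data/mmd/formats") would be expected to contain an entry named dublinCore, whose path can then be passed to mmdCreate via the “--format” parameter.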