Sharing Large Files with DataShare

To enable large file transfers via DataShare, we advise using the rclone chunker overlay. This recipe focuses on sharing data via a public link; rclone can also be configured to use a standard DataShare user account.

Set up share folder in DataShare

  1. Create a new folder for the data in DataShare

  2. Via Sharing - Public Links, create a share with read/write permissions

[Figure: Create Public Link]

  3. Copy the link to the clipboard and paste it into a text editor of your choice

  4. Extract the cryptic share token at the end of the URL and save it for the rclone configuration

[Figure: Get Share Token]

  5. Optionally, repeat steps 2-4 to create another share with read-only permissions if the recipient should only be able to download files
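The token extraction in the steps above can also be done in the shell. This is a minimal sketch: the URL and the token `AbCdEfGh12345` are hypothetical placeholders, and the sketch assumes that the token is the last path segment of the public link.

```shell
# Hypothetical public link as copied from the DataShare web interface.
# Assumption: the share token is the last path segment of the URL.
share_url="https://datashare.mpcdf.mpg.de/s/AbCdEfGh12345"

# Drop a trailing slash, if any, then keep everything after the last "/".
share_url="${share_url%/}"
share_token="${share_url##*/}"

echo "Share token: ${share_token}"
```

The resulting value is what goes in place of <sharetoken> in the rclone configuration.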

Sender: Upload files using rclone

  1. Configure rclone remote and chunking overlay.

> rclone config create testproject webdav url https://datashare.mpcdf.mpg.de/public.php/webdav/ user <sharetoken> pass <sharepass>
> rclone config create testproject-overlay chunker remote testproject: chunk_size 2G hash_type none

The default chunk_size of 2 GB generally works fine. It can be increased up to 20 GB if fewer chunks are desired.

However, very large chunks might cause problems with slow clients or network connections (this is also relevant during download).

Checksums can be enabled if desired (e.g. hash_type md5), but they take additional time to calculate.
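To get a feeling for how the chunk size affects the number of objects on the server, the chunk count can be estimated with a ceiling division. The sketch below assumes a hypothetical 5 GiB file and the 2 GiB chunk size discussed above; the variable names are illustrative only.

```shell
# Assumption: sizes are given in whole GiB for a hypothetical 5 GiB file.
file_size_gib=5
chunk_size_gib=2

# Integer ceiling division: (a + b - 1) / b
chunk_count=$(( (file_size_gib + chunk_size_gib - 1) / chunk_size_gib ))

echo "Expected number of chunks: ${chunk_count}"
```

With these values the last chunk is only partially filled (1 GiB); a larger chunk_size reduces the count at the cost of longer individual transfers.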

  2. Upload individual files or a whole directory

> rclone copy 5g testproject-overlay: --progress --transfers 1
Transferred:            5G / 5 GBytes, 100%, 52.979 MBytes/s, ETA 0s
Checks:                 3 / 3, 100%
Renamed:                3
Transferred:            1 / 1, 100%
Elapsed time:      1m41.6s

The --transfers 1 option ensures that only a single transfer is running at a time.
Always use it for chunked uploads to DataShare: multiple concurrent transfers can slow things down due to synchronization overhead and generate unnecessary load on the server.

Files on the server

On the server, the folder will look like this (5g.rclone_chunk.001, 5g.rclone_chunk.002…):

[Figure: Chunked Files in DataShare Folder]

The file with the original name (5g in this example) contains only some metadata (number of chunks, checksums if enabled). The data itself is split into chunks named <name>.rclone_chunk.<number>.

If desired, the chunks can be downloaded via the web interface or curl and assembled manually, e.g. with cat <name>.rclone_chunk.??? > <name>.
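The manual assembly can be tried out safely with small stand-in files. This is a sketch only, assuming the chunk naming shown in the folder listing above (<name>.rclone_chunk.<number>); the file name demo and its contents are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Stand-ins for three downloaded chunks (real chunks are up to chunk_size each).
printf 'part1' > demo.rclone_chunk.001
printf 'part2' > demo.rclone_chunk.002
printf 'part3' > demo.rclone_chunk.003

# The shell expands the glob in sorted order, so zero-padded chunk numbers
# are concatenated in the right sequence.
cat demo.rclone_chunk.??? > demo.assembled

# Show the reassembled content.
cat demo.assembled
```

Writing to a separate output name (demo.assembled here) avoids overwriting the small metadata file if it was downloaded into the same directory under the original name.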

Recipient: Download files again using rclone

For larger data sets, setting up rclone on the recipient's side as well is recommended:

  1. Configure rclone remote and chunking overlay

> rclone config create testproject-readonly webdav url https://datashare.mpcdf.mpg.de/public.php/webdav/ user <sharetoken> pass <sharepass>
> rclone config create testproject-readonly-overlay chunker remote testproject-readonly:

  2. Download individual files or a whole directory

> rclone copy testproject-readonly-overlay:5g 5g-from-remote --progress
Transferred:            5G / 5 GBytes, 100%, 91.694 MBytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:       1m2.1s