Publishing data for public access via S3
Projects are free to set public access for download when needed. S3 command line clients such as s3cmd and minio-client support setting objects to be publicly available.
For example, you can set public access by using s3cmd:
s3cmd setacl --acl-public --recursive s3://public-bucket
Objects may be returned to private access by using:
s3cmd setacl --acl-private --recursive s3://public-bucket
Note: The ACL is not inherited from the bucket, so it must be set for each new object as it is added.
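One way to avoid a separate ACL update is to set the ACL when the object is uploaded. The sketch below wraps `s3cmd put --acl-public` in a small shell function. The function name `put_public`, the `S3CMD` override variable and the file name in the usage comment are illustrative assumptions, not part of any provided tooling.

```shell
# Command used to talk to S3; override it (e.g. S3CMD=echo) for a dry run.
S3CMD="${S3CMD:-s3cmd}"

# Upload one file and mark it publicly readable in the same step.
# Usage: put_public <local-file> <s3-uri>
put_public() {
    "$S3CMD" put --acl-public "$1" "$2"
}

# Example (hypothetical file name, bucket name as in the examples above):
#   put_public results.csv s3://public-bucket/results.csv
```

Objects uploaded without `--acl-public` stay private until `s3cmd setacl --acl-public` is run on them.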
We advise against public access for upload since this would open the S3 storage to abuse.
A few things to note:
Be aware that some clients set READ/WRITE access when you use the “public” option. For example, with the minio-client:
mc anonymous set public storage/public-bucket
will actually allow anonymous writes as well as reads.
Any content you make publicly readable is likely to be crawled by search engines such as Google.
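If only anonymous reads are wanted with the minio-client, the `download` policy is the narrower choice. The following is a minimal sketch, assuming the `storage` alias from the example above and that the storage backend honours anonymous policies set through `mc`. The function names and the `MC` override variable are illustrative.

```shell
# Command used for the minio-client; override it (e.g. MC=echo) for a dry run.
MC="${MC:-mc}"

# Allow anonymous downloads only (no anonymous uploads) on a bucket.
# Usage: set_download_only <alias/bucket>
set_download_only() {
    "$MC" anonymous set download "$1"
}

# Show the anonymous policy currently in effect for a bucket.
# Usage: show_anonymous_policy <alias/bucket>
show_anonymous_policy() {
    "$MC" anonymous get "$1"
}

# Example:
#   set_download_only storage/public-bucket
#   show_anonymous_policy storage/public-bucket
```

Checking the policy after setting it is a cheap way to confirm that anonymous writes are not enabled.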
Publishing data
When publishing a dataset it is advisable to provide a landing page with basic information about the data. This may be achieved by using the MPCDF MetaStore or by creating a landing page within the public bucket.
When using MetaStore, DataCite-compatible metadata can be associated with the dataset, which may be made available as links to the S3 objects. MetaStore makes the data Findable as defined in FAIR data.
When creating a standalone landing page within the S3 bucket it is advisable to:
Create an index.html page within the bucket
Describe the dataset within the index.html page (origin, owners, size, etc.)
Add a link to each object (including a checksum or a separate checksum file)
Provide basic information about how the objects can be downloaded (e.g. via curl, wget)
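The steps above can be sketched as a small shell function that writes an index.html for the files in a local directory, with one link and one SHA-256 checksum per file. This is an illustration under stated assumptions: the function name `make_index` and the public base URL are hypothetical, `sha256sum` (GNU coreutils) is assumed to be available, and file names are assumed to need no HTML or URL escaping.

```shell
# Write a minimal landing page for the regular files in a directory.
# Usage: make_index <directory> <public-base-url> > index.html
make_index() {
    dir="$1"
    base_url="${2%/}"   # drop one trailing slash, if present

    printf '<!DOCTYPE html>\n<html><body>\n'
    printf '<h1>Dataset</h1>\n'
    printf '<p>Describe origin, owners and size here.</p>\n<ul>\n'

    for path in "$dir"/*; do
        [ -f "$path" ] || continue            # skip directories and empty globs
        name=$(basename "$path")
        [ "$name" = "index.html" ] && continue # do not list the page itself
        sum=$(sha256sum "$path" | cut -d' ' -f1)
        printf '<li><a href="%s/%s">%s</a> (SHA-256: %s)</li>\n' \
            "$base_url" "$name" "$name" "$sum"
    done

    printf '</ul>\n'
    printf '<p>Download an object with: curl -O URL or wget URL</p>\n'
    printf '</body></html>\n'
}

# Example (hypothetical directory and base URL):
#   make_index ./dataset https://s3.example.org/public-bucket > index.html
#   s3cmd put --acl-public index.html s3://public-bucket/index.html
```

Users can verify a download by running `sha256sum` on the retrieved file and comparing the result with the value on the landing page.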
Digital Object Identifiers (DOIs) for published data
A Digital Object Identifier (DOI) is a persistent identifier for a dataset. It makes the data addressable and allows the underlying dataset to be moved in a transparent manner: end users are simply redirected to the new location.
A DOI may be obtained via MetaStore or directly from the MPDL's MPDL-DOI service.
Temporary sharing
You can give temporary access to data via presigned URLs. These allow you to generate a short-lived URL that has an obscure form and a configurable lifetime. These URLs may safely be passed to data users to retrieve individual objects.
More information about temporary file sharing can be found here
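As a minimal sketch of generating a presigned URL, `s3cmd signurl` takes an object URI and either an expiry timestamp or a relative offset in seconds prefixed with `+`. The function name `presign`, the `S3CMD` override variable and the object name in the usage comment are illustrative assumptions.

```shell
# Command used to talk to S3; override it (e.g. S3CMD=echo) for a dry run.
S3CMD="${S3CMD:-s3cmd}"

# Print a presigned URL for one object, valid for the given number of seconds.
# Usage: presign <s3-uri> <lifetime-seconds>
presign() {
    "$S3CMD" signurl "$1" "+$2"
}

# Example (hypothetical object name): a link that expires after one hour
#   presign s3://public-bucket/results.csv 3600
```

The object itself can stay private, since the signature in the URL grants access until it expires. The minio-client offers a comparable `mc share download --expire` command.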