LNGS decided to offer its users an object storage service for scientific data, with the idea of easing data archival and distribution for experiments and projects. Object storage is a computer data storage architecture that manages data as objects, as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier.
Object storage enables capabilities not addressed by other storage architectures:

- interfaces that are directly programmable by the application (APIs)
- a namespace that can span multiple instances of physical hardware
- data-management functions like data (geo)replication and data distribution at object-level granularity
- authenticated, remote access

Object storage was designed to store very large amounts of unstructured data, i.e. data accessible with a unique ID rather than through a complex hierarchy of folders. This makes it easy and fast to access static content such as photos, videos, text files, HTML and CSS web pages, etc. Access to your objects is performed using a standard HTTP API, meaning you can safely access your files from anywhere, as authentication is usually required.
Object storage data is accessed per project. Enrollment in an existing project, or creation of a new one (personal or linked to a working group), must be requested from the LNGS “Servizio Calcolo e Reti”.
The Amazon S3 Application Programming Interface (S3 API) is the most common way in which data is stored, managed, and retrieved by object stores. At LNGS the S3 API is a frontend on top of the OpenStack Swift object storage engine.
After you are granted access to the Object Storage Service, you need a personal access_key and secret_key to manage your data. Each user can own multiple access/secret key pairs, one for each project she/he is a member of. Once you have these credentials you will be able to communicate with the LNGS Object Storage Service using the “language” of S3 and use our Object Storage solutions. The S3 API will verify your credentials and translate your calls into Swift API calls in order to execute your requests.
To use the LNGS Object Storage Service S3 APIs, you need to get S3 credentials: an Access Key ID and a Secret Key that you can use to access data.
In order to obtain S3 credentials you must know the name of the project for which you need them. If you are entitled to access data belonging to different projects, you will have to get a different set of credentials for each project. Credentials can be obtained by logging in to linux.lngs.infn.it
ssh myusername@linux.lngs.infn.it
And running the get_ec2_credentials command. This command will provide you with new credentials or, if they are already present, show the existing credentials.
stalio@linux106:~> get_ec2_credentials
Please enter your project name: calcolo
Please enter your LNGS password as user stalio:
Access_key: <your_access_key>
Secret_key: <your_secret_key>
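Most S3 clients and libraries can also read these keys from the standard AWS environment variables, so it can be convenient to export them once in your shell session; the placeholder values below are the ones printed by get_ec2_credentials:

```shell
# Standard variables understood by most S3 clients (aws, duplicity, ...);
# replace the placeholders with the keys printed by get_ec2_credentials.
export AWS_ACCESS_KEY_ID=<your_access_key>
export AWS_SECRET_ACCESS_KEY=<your_secret_key>
```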
You can revoke a set of credentials, for example because you suspect they are no longer secure, by logging in to linux.lngs.infn.it and running:
revoke_s3_credentials
For example:
stalio@linux105:~> revoke_s3_credentials
Please enter your project name: calcolo
Please enter your LNGS password as user stalio:
Revoking credentials for project calcolo, you can still obtain new ones with "get_s3_credentials".
In order to verify your project quota on the LNGS object storage service and the actual usage, you can log in to linux.lngs.infn.it and use the following command:
get_s3_usage
The --detail option will give detailed information on usage per single bucket.
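Both forms are run from a shell on linux.lngs.infn.it:

```shell
# overall quota and usage for the project
get_s3_usage
# usage broken down per single bucket
get_s3_usage --detail
```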
Interacting with the LNGS Cloud Storage can be done in multiple ways: S3 APIs and libraries exist for most programming languages, and several clients make data access easier and quicker. Below are instructions for using some of the more popular S3 clients.
Rclone (https://rclone.org/) is a command line program to manage files on cloud storage. It is a feature rich alternative to cloud vendors' web storage interfaces. Over 40 cloud storage products (https://rclone.org/#providers) support rclone, including S3 object stores, business & consumer file storage services, as well as standard transfer protocols.
Rclone can be configured via environment variables. Suppose you call your remote storage “mydata”. Then you'll need the following variables:
export RCLONE_CONFIG_MYDATA_TYPE=s3
export RCLONE_CONFIG_MYDATA_ACCESS_KEY_ID=<your_access_key>
export RCLONE_CONFIG_MYDATA_SECRET_ACCESS_KEY=<your_secret_key>
export RCLONE_CONFIG_MYDATA_ENDPOINT=https://s3.lngs.infn.it
Alternatively, rclone can be configured on first use with its own interactive configuration tool (rclone config). Test that the configuration is correct by running
rclone ls mydata:
Full documentation on rclone command line usage can be found in the man pages or on the https://rclone.org/ site.
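A few common rclone operations, again assuming the remote is named “mydata” as configured above; the bucket and path names below are placeholders:

```shell
# create a bucket
rclone mkdir mydata:mybucket
# copy a local directory into the bucket
rclone copy ./results mydata:mybucket/results
# list the buckets in the project
rclone lsd mydata:
# show the total size and number of objects in the bucket
rclone size mydata:mybucket
```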
You can also access your S3 storage area as a shared folder on your personal computer.
mkdir lngs_data
rclone mount mydata: lngs_data
This is very much like a Windows shared folder or a Linux NFS mount, except it can be accessed from anywhere in the world. Keep in mind that you can't treat this folder exactly like a local folder on your laptop: it is much slower and will fail for certain operations, e.g. when cloning large repositories with thousands of files from git. The correct usage for a folder like this is to keep mostly static data for remote access.
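When you are done with the mounted folder, it can be released like any other FUSE mount (on Linux; on macOS use umount instead):

```shell
# release the rclone FUSE mount created above
fusermount -u lngs_data
```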
Installation instructions for the AWS client can be found here:
Edit the .aws/config file so that it looks like this:
[default]
region = us-east-1
Edit the .aws/credentials file, creating a default profile:
[default]
aws_access_key_id = <your_access_key>
aws_secret_access_key = <your_secret_key>
Or adding a new one:
[default]
aws_access_key_id = <another_access_key>
aws_secret_access_key = <another_secret_key>

[lngs]
aws_access_key_id = <your_access_key>
aws_secret_access_key = <your_secret_key>
There is no way of defining the endpoint URL in a configuration file, so it must be passed as a command line argument.
aws s3 --profile lngs --endpoint-url=https://s3.lngs.infn.it ls
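Other common operations follow the same pattern; “mybucket” below is a placeholder for one of your bucket names:

```shell
# create a bucket
aws s3 --profile lngs --endpoint-url=https://s3.lngs.infn.it mb s3://mybucket
# upload a file
aws s3 --profile lngs --endpoint-url=https://s3.lngs.infn.it cp myfile.dat s3://mybucket/
# download the whole bucket to a local directory
aws s3 --profile lngs --endpoint-url=https://s3.lngs.infn.it sync s3://mybucket ./mybucket
```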
Full documentation on aws s3 command line usage can be obtained by running
aws s3 help
CyberDuck is a graphical client for different storage backends, including S3. It is available on Windows and macOS. Setting it up to access your LNGS S3 storage is straightforward:
Duplicity is software for encrypted remote backups with an S3 backend. It can be used interactively or within a script that runs periodically via cron to back up your data. A basic example script follows. Duplicity is available on Linux systems via the package management system and has an informative man page.
export AWS_ACCESS_KEY_ID=<your_access_key>
export AWS_SECRET_ACCESS_KEY=<your_secret_key>
# Do not forget this passphrase!!! It is used to encrypt your data and is needed for restoring data!
export PASSPHRASE="trentatre gatti neri"
duplicity --full-if-older-than 3M Documents s3://s3.lngs.infn.it/backup
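Restoring works in the opposite direction: the source is the S3 URL and the target is a local directory. A sketch, assuming the same bucket as in the backup script and that PASSPHRASE and the key variables are still exported; the target directory names are placeholders:

```shell
# restore the latest backup into a new local directory
duplicity restore s3://s3.lngs.infn.it/backup Documents_restored
# or restore the state as it was 30 days ago
duplicity restore --time 30D s3://s3.lngs.infn.it/backup Documents_restored_30d
```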
The same data that you can access via the API or a command line client can also be browsed via a sync&share service, like LNGS https://gsbox.lngs.infn.it. You can mount your S3 buckets on ownCloud and access them via its web interface. Synchronizing such data via the ownCloud sync&share client should not be done, as it will cause high load and data replication errors.
Here is how you can mount an S3 bucket on owncloud:
Make sure you enable both the “Enable SSL” and the “Enable Path Style” buttons and indicate “us-east-1” as region.