AWS S3
This chapter ...
Introduction
Amazon Web Services (AWS) provides two storage services:
- AWS Simple Storage Service (AWS S3) provides cloud storage. You can use AWS S3 to store and retrieve any amount of data at any time, from anywhere on the web.
- Amazon CloudFront is a content delivery network (CDN). It can be used to deliver files using a global network of edge locations.
Only AWS S3 is discussed in this chapter.
There exist modules to integrate Drupal and AWS S3 (see, for example, the projects named S3 File System and AmazonS3). Neither is well maintained. For backup purposes, I've created my own scripts to run from the GNU/Linux CLI. These should be documented below [todo].
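A minimal sketch of the kind of backup script I mean, built on aws s3 sync (described later in this chapter). The bucket name and default directory are placeholders, and the script defaults to a dry run that only prints the command:

```shell
#!/usr/bin/env bash
# Sketch of a backup wrapper around "aws s3 sync".
# Bucket name and default directory are hypothetical, not real values.
set -u

BUCKET="s3://example-backups"        # assumed bucket name
SRC_DIR="${1:-/var/backups}"         # directory to back up
DRY_RUN="${DRY_RUN:-1}"              # safe default: only print the command

cmd=(aws s3 sync --delete "$SRC_DIR" "$BUCKET$SRC_DIR/")
if [ "$DRY_RUN" = "1" ]; then
    echo "would run: ${cmd[*]}"
else
    "${cmd[@]}"
fi
```

Once the printed command looks right, it can be run for real from cron with DRY_RUN=0.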
Permissions
By default, all Amazon S3 resources – buckets, objects, and related subresources (for example, lifecycle configuration and website configuration) – are private: only the resource owner, the AWS account that created the resource, can access it. The resource owner can optionally grant access permissions to others by writing an access policy.
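For example, a bucket policy granting anonymous read access to every object in a bucket looks roughly like this (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example.org/*"
    }
  ]
}
```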
Tools
I am currently aware of the following tools to interact with AWS S3:
- AWS management console: Web console.
- S3 browser: GUI for MS Windows.
- AWS-shell: CLI for GNU/Linux.
- S3cmd: CLI for GNU/Linux.
The CLI s3cmd sometimes produces this warning: "WARNING: Empty object name on S3 found, ignoring.", while the CLI aws command does not. My scripts therefore use the latter.
AWS management console
The AWS Management Console is a simple and intuitive web interface to all AWS services. It can be used to manage S3.
To log in, go to https://aws.amazon.com/. It uses two-factor authentication based upon Authy.
Tutorial: Getting Started with Amazon Simple Storage Service.
Create a bucket
New buckets should be created in the AWS management console. I have trouble setting location constraints using the CLI tools described below.
Tutorial: Creating a bucket.
Add an object to a bucket
View an object
Move an object
Delete an object and bucket
S3 browser
The S3 Browser is a free MS Windows client from NetSDK Software LLC that provides a GUI to interact with AWS S3 and Amazon CloudFront.
See the chapter about setting up a new Windows PC for installation instructions.
AWS shell
The AWS Command Line Interface is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts.
See the user guide for instructions about downloading the latest version. However, the version from the Ubuntu repo is OK.
$ sudo apt update
…
All packages are up to date.
$ sudo apt install awscli
…
$ aws --version
aws-cli/1.18.69 Python/3.8.10 Linux/5.4.0-91-generic botocore/1.16.19
The configuration (config) and credentials (credentials) live below the directory ~/.aws.
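For reference, the two files look roughly like this (the region and keys are placeholders):

```ini
# ~/.aws/config
[default]
region = eu-west-1
output = json

# ~/.aws/credentials
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = ...
```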
List all buckets:
$ aws s3 ls
2017-02-10 06:44 first-bucket
2018-12-09 10:12 another-bucket
…
Administrate buckets
To create a new bucket:
$ aws s3 mb s3://example.org
This does not set a region constraint. Use the web (AWS management console) or the GUI (S3 browser) instead.
To inspect the physical location of a particular bucket:
$ aws s3api get-bucket-location --bucket example.org
{
    "LocationConstraint": "eu-west-1"
}
I use these locations:
- eu-central-1: Frankfurt
- eu-west-1: Dublin
Sync
The s3 sync command synchronizes the contents of a directory on your server and a bucket. Typically, it copies missing or outdated files or objects between the source and target. That is, it updates files that exist with the same name at the destination but have a different size or modification timestamp. It recursively synchronizes subdirectories.
The output displays specific operations performed during the sync.
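The copy decision can be sketched in plain shell. This is a simplification of the size-and-timestamp rule described above (a hypothetical helper using GNU stat, not what the CLI literally does):

```shell
# Simplified sketch of the "sync" copy rule for a single file:
# copy when missing at the destination, or when size or mtime differ.
# Placeholder function; not the actual aws-cli implementation.
needs_copy() {
    local src="$1" dst="$2"
    [ ! -e "$dst" ] && return 0                                        # missing at destination
    [ "$(stat -c %s "$src")" != "$(stat -c %s "$dst")" ] && return 0   # size differs
    [ "$(stat -c %Y "$src")" != "$(stat -c %Y "$dst")" ] && return 0   # mtime differs
    return 1
}
```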
To upload a specific directory:
$ aws s3 sync directory s3://example.org/directory/
upload: myfile1.txt to s3://example.org/directory/myfile1.txt
upload: subdir/myfile2.txt to s3://example.org/directory/subdir/myfile2.txt
You can also supply the --delete option to remove files or objects from the target that are not present in the source.
To download an entire bucket to the current directory, do:
$ aws s3 sync --quiet s3://example.org .
The --quiet option suppresses output about each downloaded file.
To download a single file to the current directory, do:
$ aws s3 cp s3://example.org/directory/file.txt .
S3cmd
S3cmd is free software for interacting with AWS S3 storage. It was created by TGRMN software and others. It can be used for batch scripts and automated backup to S3, triggered from cron, etc. It is written in Python.
Some links: Home page; GitHub; README.md; usage; sync; tutorial.
For most platforms, it is available from standard distribution repos. These versions tend to lag; for instance, for Ubuntu 20.04 LTS the packaged version is 2.0.2, while the most recent release (2021-12-10) is 2.2.0.
$ sudo apt-cache policy s3cmd
…
  Candidate: 2.0.2-1ubuntu1
However, the distribution version works OK. To install:
$ sudo apt install s3cmd
To install the most recent version, use pip (a CLI tool for installing and managing Python packages). You may need to install pip first:
$ sudo apt install python3-pip
Install s3cmd:
$ sudo pip install s3cmd
$ s3cmd --version
2.2.0
To upgrade:
$ sudo pip install --upgrade s3cmd
$ s3cmd --version
2.2.0
To configure s3cmd to work with your AWS account:
$ s3cmd --configure
This initial configuration is necessary, as every bucket must have an associated user. Rerunning the command after it has been configured shows every setting and lets you change it; at the end, you can decline to save the changes.
The configuration lives in text file ~/.s3cfg.
A basic configuration file is created automatically when you first issue the s3cmd --configure command after installation. You will be asked a few questions about your Amazon access key and secret key and other settings you wish to use, and then s3cmd will save that information in a new config file. Other advanced settings can be changed (if needed) by editing the config file manually.
If you have a working configuration, and need to set up s3cmd on a new server, you can just copy the configuration file.
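The relevant lines of a minimal ~/.s3cfg look roughly like this (the keys are placeholders):

```ini
# ~/.s3cfg (excerpt; keys are placeholders)
[default]
access_key = AKIAXXXXXXXXXXXXXXXX
secret_key = ...
use_https = True
```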
List all buckets:
$ s3cmd ls
2017-02-10 06:44 s3://first-bucket
2018-12-09 10:12 s3://another-bucket
…
The datestamp is not the time of the last update to the bucket; it is probably the bucket's creation date.
List all directories inside a specific bucket:
$ s3cmd ls s3://first-bucket
DIR s3://first-bucket/Firstdir/
DIR s3://first-bucket/Anotherdir/
List all files inside a bucket, using --recursive to scan sub-directories recursively:
$ s3cmd ls --recursive s3://first-bucket/Firstdir/
2017-02-19 11:02 0 s3://first-bucket/Firstdir/Subdir/
2017-02-19 11:02 838671 s3://first-bucket/Firstdir/Subdir/myfile.pdf
…
Download a specific file from an Amazon S3 bucket:
$ s3cmd get s3://first-bucket/Firstdir/Subdir/myfile.pdf
…
Make a new bucket:
$ s3cmd --region="eu-central-1" mb s3://example.org
Remove a bucket:
$ s3cmd rb s3://example.org
The put and get commands do unconditional transfers: all matching files are copied. This is similar to the standard GNU/Linux cp command, which also copies whatever it is told to copy. Using --recursive will transfer directories recursively. Example:
$ s3cmd put --recursive dir1 dir2 s3://example.org/directory/
Using sync does conditional transfers: only files that do not exist at the destination in the same version are copied. An MD5 checksum and the file size are compared to determine the version. This is similar to the standard GNU/Linux rsync command. Using --skip-existing will only compare names, not the MD5 checksum and file size.
$ s3cmd sync --skip-existing directory s3://example.org/directory/
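The comparison s3cmd performs can be sketched in plain shell. This is a hypothetical helper illustrating the size-plus-MD5 rule, not s3cmd's actual code (which compares against the object's ETag):

```shell
# Sketch of the s3cmd sync test: two files count as the same version
# when both size and MD5 checksum match. Placeholder function.
same_version() {
    local a="$1" b="$2"
    [ "$(stat -c %s "$a")" = "$(stat -c %s "$b")" ] &&
        [ "$(md5sum < "$a" | cut -d' ' -f1)" = "$(md5sum < "$b" | cut -d' ' -f1)" ]
}
```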
The --dry-run option can be used to check what a command will do.
Encrypted storage
The s3cmd client supports client-side encryption using GPG. To enable it, have gpg available on your system and add the following to ~/.s3cfg:
encrypt = True
gpg_passphrase = somethingverydifficulttoguess
Final word
[TBA]
Last update: 2021-07-05 [gh].