AWS S3

by Gisle Hannemyr

This chapter ...

Table of contents

Introduction

Amazon Web Services (AWS) provides two storage services:

Only AWS S3 is discussed in this chapter.

There exist modules to integrate Drupal with AWS S3 (see, for example, the projects named S3 File System and AmazonS3). Neither is well maintained. For backup purposes, I've created my own scripts to run from the GNU/Linux CLI. These should be documented below [todo].

Permissions

By default, all Amazon S3 resources (buckets, objects, and related subresources, such as lifecycle configuration and website configuration) are private: only the resource owner, the AWS account that created the resource, can access it. The resource owner can optionally grant access permissions to others by writing an access policy.
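For illustration, a bucket policy that grants public read access to every object in a bucket looks like this (the bucket name example.org matches the examples later in this chapter; adapt it to your own bucket):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example.org/*"
    }
  ]
}
```

Policies are attached to a bucket in the AWS management console under the bucket's "Permissions" tab.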

Tools

I am currently aware of the following tools to interact with AWS S3:

Note: The CLI s3cmd sometimes produces this warning: "WARNING: Empty object name on S3 found, ignoring.", while the CLI aws command does not. My scripts therefore use the latter.

AWS management console

The AWS Management Console is a simple and intuitive web interface to all AWS services. It can be used to manage S3.

To log in, go to https://aws.amazon.com/. Login uses two-factor authentication based upon Authy.

Tutorial: Getting Started with Amazon Simple Storage Service.

Create a bucket

New buckets should be created in the AWS management console. I have trouble setting location constraints using the CLI tools described below.

Tutorial: Creating a bucket.

Add an object to a bucket

View an object

Move an object

Delete an object and bucket

S3 browser

The S3 Browser is a free MS Windows client from NetSDK Software LLC that provides a GUI to interact with AWS S3 and Amazon CloudFront.

See the chapter about setting up a new Windows PC for installation instructions.

AWS shell

The AWS Command Line Interface is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts.

User guide: html, pdf.

See the user guide for instructions on downloading the latest version. However, the version from the Ubuntu repo is OK.

$ sudo apt update
…
All packages are up to date.
$ sudo apt install awscli
…
$ aws --version
aws-cli/1.18.69 Python/3.8.10 Linux/5.4.0-91-generic botocore/1.16.19

The configuration (config) and credentials (credentials) files live in the directory ~/.aws.
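For reference, the two files use a simple INI format. A minimal setup looks something like this (key values elided; the region matches the examples later in this chapter):

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = …
aws_secret_access_key = …

# ~/.aws/config
[default]
region = eu-west-1
output = json
```

The files can also be created by running aws configure, which prompts for these four values.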

List all buckets:

$ aws s3 ls
2017-02-10 06:44 first-bucket
2018-12-09 10:12 another-bucket
…

Administrate buckets

To create a new bucket:

$ aws s3 mb s3://example.org

Note: This does not set a region constraint. Use the web interface (AWS management console) or the GUI (S3 browser) instead.

To inspect the physical location of a particular bucket:

$ aws s3api get-bucket-location --bucket example.org
{
    "LocationConstraint": "eu-west-1"
}

I use these locations:

Sync

The s3 sync command synchronizes the contents of a directory on your server with a bucket. It copies missing or outdated files or objects between the source and target: a file that exists with the same name at the destination is updated if it has a different size or modification timestamp. Subdirectories are synchronized recursively.

The output displays specific operations performed during the sync.
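The update rule described above can be sketched in Python. This is a simplified local model of the decision, not the actual aws implementation:

```python
import os

def needs_upload(local_path, remote_size, remote_mtime):
    """Decide whether a local file should be copied to the bucket.

    Mimics the aws s3 sync rule: copy when the size differs, or when
    the local file is newer than the remote object.
    """
    st = os.stat(local_path)
    return st.st_size != remote_size or st.st_mtime > remote_mtime
```

A file whose size and timestamp both match the remote object is skipped, which is what makes repeated syncs cheap.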

To upload a specific directory:

$ aws s3 sync directory s3://example.org/directory/
upload: myfile1.txt to s3://example.org/directory/myfile1.txt
upload: subdir/myfile2.txt to s3://example.org/directory/subdir/myfile2.txt

You can also supply the --delete option to remove files or objects from the target that are not present in the source.

To download an entire bucket to the current directory, do:

$ aws s3 sync --quiet s3://example.org .

The --quiet option suppresses output about each downloaded file.

To download a single file to the current directory, do:

$ aws s3 cp s3://example.org/directory/file.txt .

S3cmd

S3cmd is free software for interacting with AWS S3 storage. It was created by TGRMN software and others. It can be used for batch scripts and automated backup to S3, triggered from cron, etc. It is written in Python.

Some links: Home page; GitHub; README.md; usage; sync; tutorial.

For most platforms, it is available from the standard distribution repos, but these versions tend to lag. For instance, Ubuntu 20.04 LTS ships 2.0.2, while the most recent release (2021-12-10) is 2.2.0.

$ sudo apt-cache policy s3cmd
…
  Candidate: 2.0.2-1ubuntu1

However, the distribution version works OK. To install:

$ sudo apt install s3cmd

To install the most recent version, use pip (a CLI tool for installing and managing Python packages). You may need to install pip first:

$ sudo apt install python-pip

Install s3cmd:

$ sudo pip install s3cmd
$ s3cmd --version
2.2.0

To upgrade:

$ sudo pip install --upgrade s3cmd
$ s3cmd --version
2.2.0

To configure s3cmd to work with your AWS account:

$ s3cmd --configure

This initial configuration is necessary because every bucket has an associated user. Rerunning the command after it has been configured shows every setting and lets you change it; at the end, you can decline to save the changes.

The configuration lives in text file ~/.s3cfg.

A basic configuration file is created automatically when you first issue the s3cmd --configure command after installation. You will be asked a few questions about your Amazon access key and secret key and other settings you wish to use, and then s3cmd will save that information in a new config file. Other advanced settings can be changed (if needed) by editing the config file manually.

If you have a working configuration, and need to set up s3cmd on a new server, you can just copy the configuration file.
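For reference, the essential part of ~/.s3cfg looks something like this (key values elided; most of the many other settings written by --configure can be left at their defaults):

```ini
[default]
access_key = …
secret_key = …
```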

List all buckets:

$ s3cmd ls
2017-02-10 06:44 s3://first-bucket
2018-12-09 10:12 s3://another-bucket
…

The datestamp is not the time of the bucket's last update (it is probably the bucket's creation date).

List all directories inside a specific bucket:

$ s3cmd ls s3://first-bucket
            DIR   s3://first-bucket/Firstdir/
            DIR   s3://first-bucket/Anotherdir/

List all files inside a bucket, using --recursive to scan sub-directories recursively:

$ s3cmd ls --recursive s3://first-bucket/Firstdir/
2017-02-19 11:02  0        s3://first-bucket/Firstdir/Subdir/
2017-02-19 11:02  838671   s3://first-bucket/Firstdir/Subdir/myfile.pdf
…

Download a specific file from an Amazon S3 bucket:

$ s3cmd get s3://first-bucket/Firstdir/Subdir/myfile.pdf

Make a new bucket:

$ s3cmd --region="eu-central-1" mb s3://example.org

Remove a bucket:

$ s3cmd rb s3://example.org

The put and get commands do unconditional transfers: all matching files are copied. This is similar to the standard GNU/Linux cp command, which also copies whatever it is told to copy. Using --recursive will transfer directories recursively. Example:

$ s3cmd put --recursive dir1 dir2 s3://example.org/directory/

Using sync does conditional transfers: only files that do not exist at the destination in the same version are copied. An md5 checksum and the file size are compared to determine the version. This is similar to the standard GNU/Linux rsync command. Using --skip-existing compares only names, not the md5 checksum and file size.

$ s3cmd sync --skip-existing directory s3://example.org/directory/

The --dry-run option can be used to check what a command will do.
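The version check used by sync (md5 checksum plus file size) can be sketched in Python. This is a simplified model of the comparison, not s3cmd's actual code:

```python
import hashlib
import os

def same_version(local_path, remote_md5, remote_size):
    """s3cmd-style version check: identical size and MD5 means skip."""
    if os.path.getsize(local_path) != remote_size:
        return False          # cheap check first: sizes differ
    h = hashlib.md5()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)   # hash the file in chunks
    return h.hexdigest() == remote_md5
```

Comparing the size first avoids hashing files that obviously differ, which is why sync is much faster than an unconditional put on a mostly unchanged tree.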

Encrypted storage

The s3cmd client supports client-side encryption using gpg. To enable it, have gpg available on your system and put the following in ~/.s3cfg:

encrypt = True
gpg_passphrase = somethingverydifficulttoguess

Final word

[TBA]


Last update: 2021-07-05 [gh].