HNM backup & migration routines

This document spells out the routines for backing up and migrating the websites managed by HNM AS.

Drupal project mentioned in this document: Backup and Migrate.

Introduction

This operational note describes the backup routines and migration procedures (e.g. between staging and production) for all websites managed by HNM AS.

The spreadsheet websites.xlsx is used to keep track of backups and configuration details. All references to "the spreadsheet" below refer to this file. It is stored in Documents » Work » Hnm » Websites @HNM-PC.

These routines are based on the “3-2-1” backup practice, with Backup and Migrate used for scheduled and manual local database backups, and s3cmd used to copy local database backups as well as public and private files to the offsite backup location. Most sites also have a staging site, which may have its own backups, but staging is not always kept in sync with production.

All offsite backups are kept at AWS S3 “EU (Frankfurt)”. To browse this archive, use S3 Browser (an MS Windows GUI program).

To log in to external web services, use the links and credentials found in my Web services link section.

Automatic backup to offsite destinations is done by cron. The command to view the active crontab is crontab -l.

Internal documentation

In the spreadsheet, the “HNM” tab lists all sites and their databases, except the databases belonging to SNP, Karde and staging sites. Those belonging to SNP are found under the “SNP” tab, and those belonging to Karde under the “Karde” tab. The “Staging” tab lists staging sites.

Columns:

To enable offsite backups, do the following:

  1. Create an AWS S3 bucket for the backup. Use the site identifier in the spreadsheet. For instance, for the site “norren.no”, name the bucket “norren.no”.
  2. Create a new folder, “database”, to hold the database dumps. The copied files will inherit the names from the original subdirectories.
  3. Backup of public and private files needs to be tuned for each site, see examples below.

Basic scripted instructions for copying the database dumps and files to AWS S3 may look like the example below. In addition to the database dumps, each subdirectory below the public and private file system that contains original files should be copied. If the files live in the root, exclude subdirectories.

# Copy the database dumps.
/usr/local/bin/s3cmd sync --skip-existing \
  /var/private/example/backup_migrate/ s3://example.org/database/
# Copy the "public://" (root public file) directory, but no subdirectories.
/usr/local/bin/s3cmd sync --skip-existing --exclude="/*" \
  /www/example.org/html/sites/default/files/ s3://example.org/files/
# Copy the "public://attachments" directory if it exists.
/usr/local/bin/s3cmd sync --skip-existing \
  /www/example.org/html/sites/default/files/attachments s3://example.org/
# Copy the "public://pictures" directory if it exists.
/usr/local/bin/s3cmd sync --skip-existing \
  /www/example.org/html/sites/default/files/pictures s3://example.org/
# Copy the "public://thumbnails/image" directory if it exists.
/usr/local/bin/s3cmd sync --skip-existing \
  /www/example.org/html/sites/default/files/thumbnails/image s3://example.org/thumbnails/
On excluding folders with s3cmd, see: https://stackoverflow.com/q/21891045/1837734

Migration between production and staging sites

To copy the database, copy the SQL dump file created by Backup and Migrate, and restore it on the destination.

Single files can be copied over the Internet with scp. Examples:

$ scp gisle@copymarks.no:/home/gisle/z_tegn.txt .
$ scp myproject.tar.gz gisle@copymarks.no:/home/gisle/myproject.tar.gz

Directories can be copied recursively over the Internet with rsync -r.

For example, provided the siteroot directories for vhosts are located in the /var/www directory, executing the following two commands on the destination will first change directory, and then recursively copy all the files that make up the site “example.com” to the destination.

$ cd /var/www
$ rsync -r user@example.com:/var/www/example.com .

Below are two more rsync examples to run from the destination. The first does the same as the two commands above. The second recursively copies the directory foo in the user's home directory on the source to the current directory, using archive mode (-a), which among other things copies symlinks as symlinks.

$ rsync -r user@example.com:/var/www/example.com /var/www
$ rsync -ra user@example.com:/home/username/foo .

Make sure that no firewall blocks access.

When copying public files, note that private files that live outside the siteroot must be copied separately.

This command, when run from a staging server, will list directories and files on the production server that are missing from the staging server or differ from it (-n dry run, -v verbose, -c compare by checksum). It will not list materials that exist only on the staging server.

$ rsync -rvnc user@production.com:/var/www/vhost/html /var/www/vhost

Two shell scripts have been prepared for Titan. They need to be edited before use.

First, at the destination (staging site), copy the code and the public files from the production server:

  1. Edit rsync_cpfromprod.sh to define PROD, WRP, WRS and SITE.
  2. Run rsync_cpfromprod.sh.
  3. If a multisite, move the site's settings directory to default. This also moves the public files subdirectory.

Next, migrate the database and the private files:

  1. Create the private files directory in the same location as on the production site.
  2. Back up the database and transfer the gzipped database SQL dump to the staging site.
  3. Edit mysql_createdb.sh to create the database for the staging site.
  4. Run mysql_createdb.sh.
  5. Use gunzip to unpack a copy of the SQL dump.
  6. Use mysql -u gisle -p mydatabase < mydatabase.mysql to import the database.
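
For reference, steps 2, 5 and 6 might look like the following when run on the staging server. The host name, paths and file names below are assumptions for illustration, not actual values:

$ scp user@example.com:/var/private/example/backup_migrate/manual/example.mysql.gz .
$ gunzip -k example.mysql.gz
$ mysql -u gisle -p example < example.mysql

The -k option makes gunzip keep the compressed original, so the dump is unpacked as a copy.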

Finish:

  1. Edit settings.php to contain the correct database name and database credentials.
  2. Run fixperms to fix permissions and ownerships.

You should now be able to visit the URL of the staging site and inspect the copy.

Note: If there is a WSOD and running any drush command produces “Fatal error: Call to undefined function cache_get()”, there may be a syntax error in settings.php.

If the site is a multisite (e.g. one of the SNP sites), some image file references may be wrong, as their paths have the site-specific path hardwired. You can edit the source to fix the path. The first two lines below show two different ways to refer to an image in the public file directory on a multisite; they may be replaced by the last line on the migrated staging site (not a multisite).

/vvnf/sites/se.vvnf/files/image.jpg  
/sites/org.vvnf/files/image.jpg
/sites/default/files/image.jpg
… or you can fix it with symlinks. Example of how to do this for vvnf:

First, in the webroot, link the root to the subsite name:

$ ln -s . vvnf

Incidentally, this makes the site a subsite of itself.

Then, in the sites directory, symlink all aliases that are in use to default:

$ ln -s default no.vvnf
$ ln -s default se.vvnf
$ ln -s default org.NNF.vvnf

Migration of a production site to a new web server

The instructions below describe what I believe is best practice for moving a production website to a new web server.

  1. Make sure that the destination is not blocked by a firewall.
  2. Make a manual backup of the source database and copy it to the destination sink directory.
  3. On the source web server, replace Drupal with a static web page telling people that the site is currently being migrated. There is a template in /var/www/0_parked/migrated.html on do20.
  4. You may now change the zone file to point to the new destination. It typically takes at least one hour from when you do this until the new configuration replaces the old.
  5. Set up DNS for a staging Drupal website pointing to the new destination (the DNS name may be “staging.” followed by the production site domain name). We are going to use “example.com” as the production site domain name in the examples below. The database name will be “example”.
  6. Rsync the code base and public files. Remember to also migrate private files if they exist. Do this on the destination. (See above for the meaning of the options.)
    $ rsync -r user@example.com:/var/www/example.com /var/www
    $ rsync -ra user@example.com:/var/private/example /var/private
  7. Copy the “settings.php” from the migrated website to the ~/configfiles directory and prefix the filename with the database name.
    $ cp /var/www/example.com/html/sites/default/settings.php \
      ~/configfiles/example.settings.php
  8. Create a static HTML staging site:
    $ cp -r 0_parked staging.example.com
  9. Configure apache2 for the staging site and enable it:
    $ sudo a2ensite staging.example.com.conf
  10. On the destination, test and reload apache2 for the domain and check that the static HTML works.
    $ sudo apache2ctl configtest
    $ sudo systemctl reload apache2
  11. Run drupal7_cleaninst.sh to create a clean Drupal installation on the staging site. You may use the database name of the migrated site. You now have a clean install of Drupal 7 on the staging site.
    Note: If the site loads without styling, check the PHP variable $base_url in settings.php.

  12. Log in as the super admin. Set up the private file system on the staging site.
  13. Make a backup to the manual backups directory to create the backup directory.
  14. On the staging site, replace the newly installed Drupal site with the migrated codebase.
    $ mv example.com staging.example.com
  15. Roll back the content from the backup into the staging site.
  16. Set up apache2 for the production site on the destination server.
  17. Monitor the front page of the migrated site to see when the DNS change takes effect (a sketch for doing this follows the list). When it does, restore the code base:
    $ mv staging.example.com example.com
  18. After the new DNS becomes valid, if you use TLS, the site is going to look broken until you set up TLS on the destination anyway.
    Note: If the site was set up with TLS, visiting the URL will typically produce a warning saying “Warning: Potential Security Risk Ahead”. It will remain until TLS is enabled on the destination.

  19. Delete any TLS certificates for the source site. If the site is enabled, disable it first (before deleting the certificate); otherwise, apache2 will become confused by the missing certificate.
    $ sudo a2dissite example.com.conf
    $ sudo a2dissite example.com-le-ssl.conf
    $ sudo apache2ctl configtest
    $ sudo systemctl reload apache2
  20. Then delete the certificates on the source site:
    $ sudo certbot delete
  21. If the destination website should use TLS, set it up.

    Certbot will create a numbered list of all the domains enabled for your Apache web server. You may pick more than one domain, but all the domains you pick will share a single certificate with the same common name (CN). You will typically pick just the domain name and “www.” followed by the domain name. You will also be asked whether to set up redirects.

    Below is an example, where we pick two websites (5 and 7) from the numbered list (not shown), and want redirects (option 2):

    sudo certbot --apache
    Saving debug log to /var/log/letsencrypt/letsencrypt.log
    Plugins selected: Authenticator apache, Installer apache

    Which names would you like to activate HTTPS for?
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    …
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    Select the appropriate numbers separated by commas and/or spaces, or leave input
    blank to select all options shown (Enter 'c' to cancel): 5,7
    …
    Please choose whether or not to redirect HTTP traffic to HTTPS, removing HTTP access.
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    1: No redirect - Make no further changes to the web server configuration.
    2: Redirect - Make all requests redirect to secure HTTPS access. Choose this for
    new sites, or if you're confident your site works on HTTPS. You can undo this
    change by editing your web server's configuration.
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 2
  22. Finally, fix file permissions:
    $ cd /var/www/example.com
    $ fixperms

Migration of a subsite

After migrating the default site of a multisite, a slightly different procedure is used to migrate all the subsites sharing the same codebase.

  1. Use Backup and Migrate to dump the database.
  2. Copy the database dump to the destination.
  3. Create an empty database with the right name using the HNM script.
  4. Gunzip the dump.
  5. Use mysql to populate the database.
    $ sudo mysql database < dump.mysql
  6. Navigate to the site settings directory.
  7. Copy the settings file from either ~/configfiles or a previous subsite sharing the same database.
  8. Edit "settings.php": database, prefix, $drupal_hash_salt.
  9. Check the site's status report and fix all problems (see also the sketch after this list). To fix character sets, position yourself in the directory holding the "settings.php" of the site and use the following command:
    $ drush8 utf8mb4-convert-databases --collation=utf8mb4_danish_ci
  10. Repeat steps 6 to 9 for all subsites sharing the same database.
  11. When all sites sharing the same database have been migrated, make a backup.
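
The status report can also be inspected from the command line. With Drush 8, run from the site's settings directory, the following should list only the errors (the --severity threshold is an assumption about what you want to see):

$ drush8 core-requirements --severity=2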

AWS S3 overview

Current archives:

Legacy archives:

On using s3cmd sync with the --delete-removed option, see: https://serverfault.com/questions/517474/s3cmd-with-delete-removed

Destinations for my backups

BM means Backup and Migrate (Drupal module). “-a” is automatic and “-m” is manual.

An overview of my backup locations is also in the spreadsheet. The following locations exist and the following backup methods are used:

  1. s3-a: Files: Cron job adds new files to the bucket daily. DB: Cron job adds dated SQL dumps created by BM (with the date as part of the file name) daily. So far, no smart delete (see the sketch after this list).
  2. bm-a: DB backup is to Scheduled Backups Directory on pvn.no, schedule is set up in BM on site (daily). Sometimes manually copied to Ifi (below ~/www_docs/staging2/Backups/Db).
  3. bm-m: DB backup is to Manual Backups Directory on pvn.no (manually). Sometimes manually copied to Ifi (below ~/www_docs/staging2/Backups/Db).
  4. ifi-a: File tree overview, and file assets (most below public://files) from various VIP sites on pvn.no are copied to Ifi daily using cron on pvn.no (below ~/www_docs/staging2/Backups/Files).
  5. ifi-m: File tree overview, and file assets (most below public://files) from various fairly stable sites on pvn.no are copied to Ifi manually (below ~/www_docs/staging2/Backups/Files).
  6. s3-low: File assets belonging to low priority sites have been stored in tarballs on the Amazon cloud S3, in the bucket hnm-backup. A total DB dump is in bar-20170210.sql. No scheduled backup; the last backup was 2017-02-10.
  7. tree: File tree overview only.
  8. cdrom: everything is on a CD-ROM that is kept in my office.
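
About the missing “smart delete” for s3-a: s3cmd sync can mirror local deletions to the bucket with the --delete-removed option (see also the Server Fault link in the AWS S3 overview above). A sketch, using the same paths as the earlier database example:

$ /usr/local/bin/s3cmd sync --delete-removed \
  /var/private/example/backup_migrate/ s3://example.org/database/

Use with care: files removed locally (e.g. by mistake) will also disappear from the offsite backup.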

Types of sites

The following lists the sites on pvn.no and classifies them into types that use different backup locations and routines.

There is no backup of PHP and JS unless explicitly mentioned. For Drupal sites on pvn.no, the code base can be recreated by consulting tree.txt (stored on Ifi as part of the daily backup) to see what files make up the code base, and then reconstructed by downloading fresh copies of contributed modules from Drupal.org, extracting custom modules from my own repo, and libraries from elsewhere.
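
For reference, a file tree overview like tree.txt can be produced with the standard tree utility; the path below is an assumption:

$ tree /www/example.org/html > tree.txt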

Contract sites

  1. personvernnemnda.no
  2. intranet.pvn.no

DB: BM is set up to create an on-site timestamped SQL dump once a day:

  1. intranet.pvn.no →
    ~/pvn2/backup_migrate/scheduled/PVNintranet-date.mysql.gz
    The destination is within the private file system for the intranet.
  2. personvernnemnda.no →
    ~/db_backups/backup_migrate/scheduled/Personvernnemnda-date.mysql.gz
    The destination is the default SQL backup directory (private file system for everything else).

These timestamped mysql dumps are synced to AWS S3 at 20:02 every day (cron) by the script backup_pvnfiles.sh. The destination bucket is personvernnemnda.no.
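
The corresponding crontab entry might look like the following (the path to the script is an assumption):

2 20 * * * /home/gisle/bin/backup_pvnfiles.sh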

The following things are synced to this bucket:

  1. extranet_files: New managed image files in the public file system. Only the original image is backed up.
  2. intranet_files: New intranet files in the private file system, including the SQL dump kept there. The path is intranet_files/pvn2/. The top folder holds attachment files from the private file system, and there are two subfolders: backup_migrate with the daily SQL dumps, and pictures with pictures from the public file system.

The public website (personvernnemnda.no) XXXX

  1. All new files in the default SQL backup directory (this includes the extranet DB backup).

Check that this works!

VIP Drupal sites

  1. casamargarita.no
  2. hannemyr.no (check if backed up).
  3. larsvik.no
  4. minner.vedel.no
  5. norren.no
  6. terjerasmussen.no

Check that this works!

Daily scheduled on-site backup of the DB by means of BM. These are synced to AWS S3 at 20:02 every day (cron) by the script backup_pvnfiles.sh, because all new files in the default SQL backup directory are synced.

File content (media assets) is automatically copied to Ifi as tarballs daily.

TODO: Backup files to AWS S3 using conditional backup. Autodelete old DB backups on AWS S3.
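
One way to implement the autodelete TODO is an S3 lifecycle rule, which s3cmd can set with its expire command. A sketch; bucket name, prefix and retention period below are assumptions:

$ s3cmd expire s3://example.org --expiry-days=90 --expiry-prefix=database/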

Legacy sites:

  1. www.dpanswers.com
  2. RoomAXS
  3. tolfa.no

These are temporarily running on DO vhosts because they require PHP 5 (the others run PHP 7).

Manual file tree backup (htdocs and below) dated 2017-07-10 to ifi-m (all 3).
Manual DB backup dated 2017-07-10 to ifi-m (dpanswers).
Manual DB backup dated 2017-07-10 to ifi-m (roomaxs & tolfa).

TODO: Auto backup of DB. Auto backup of images/upload (tolfa). Longer term: Convert to Drupal.

Low priority (LP) sites with own DB

On pvn.no:

  1. mjovik.com (stale)

On pvn.no:

  1. bolig-sameie.no (static demo)
  2. cc-arkiv.ngoweb.no (frozen archive).
  3. elvegaarden.net (stale)
  4. hannemyr.com (stale)
  5. i18n.no (placeholder, no real content)
  6. pet.roztr.org (fairly stale)
  7. predictive-policing (fairly stale)
  8. nemo-project.org (stale)
  9. roztr.com (notebook for me only)

I only do manual backups of these.

Sites on pvn.no with only s3-low backup:

  1. drupalprimer.com (low priority project)

A DB backup from 2017-02-10 exists on Amazon S3. These do not have file content.

HTML sites on pvn.no:

  1. kristennygaard.org (cdrom)
  2. kristianvedel.dk (ifi-m: 2017-07-10)
  3. vedel.no (ifi-m: 2017-07-10)

Sites on pvn.no with no backup:

  1. copymarks.no (D8 test site, no real content, broken)
  2. copymarks.org (D7 test site, no real content)

Parked sites on pvn.no:

  1. www.digitalmediajournal.org
  2. mjovikbernern.no
  3. ngoweb.no
  4. pocketsim.no

Drupal code base backup

Drupal file assets backup

Drupal database backup

Tools

Tools to interact with Amazon AWS S3

Main: AWS management console.

MS Windows: S3 Browser. Credentials are in Clipperz.

CLI: aws and s3cmd.

See Amazon AWS S3 for instructions about usage.
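
For quick orientation from the CLI, s3cmd can list all buckets and the contents of a given bucket:

$ s3cmd ls
$ s3cmd ls s3://personvernnemnda.no/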

Buckets kept at AWS S3:

Drupal Backup and Migrate

The Drupal Backup and Migrate (BM) module should be enabled for all contract and VIP Drupal sites at pvn.no, do18, do19 and do20, with a minimal profile and the path to the private file system set to /var/private/identifier (most sites) or bar:/home/gisle/PVN2 (PVN).

The configuration for BM, including the name of the backup file, is kept in the database. This means that the configuration is overwritten when a staging site is updated with the latest production snapshot, or when a staging database is transferred to production. For now, the only remedy is to correct it manually.

Fix: Provide for a suffix (prod/staging) to be set in settings.php.

Final word

[TBA]


Last update: 2021-01-31 [gh].