
This article continues from the previous one, which introduced AWS Backup.

Offsite Backups

Let’s talk theory for a bit.

Data protection in a given solution depends on how thoughtful the backup strategy is. While a simple Backup Plan in AWS Backup that periodically takes a backup of the database into the Account-local Vault could technically be considered as a backup strategy, it is not a good one. For an effective and resilient data protection policy, backups should exist in locations other than the solution deployment, with access to them separated by network and access levels.

Offsite backups make sure that when an event makes the primary site unavailable, a secondary site is ready to provide snapshots of the data to kick off another deployment. This is essential for meeting the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) of a solution, since properly scheduled and available backups are the key ingredient in both.

Additionally, offsite backups provide a barrier in a security event that might potentially take out on-site backups. An actor that has compromised the primary site will easily be able to control the primary backups as well. An offsite vault that has different access requirements and is not directly connected to the primary site through networking will be significantly harder to compromise.

Furthermore, based on the critical nature of the business data, the potential for disruption, and the impact of disruption, government compliance directives could force solutions to include offsite backups in their Disaster Recovery and Business Continuity Plans. For example, the New Zealand Information Security Manual (NZISM) contains processes for offsite backups as part of the 6.4.6 Backup Strategy section, which is further elaborated on by the AWS Implementation Guide for NZISM. Systems should, where possible, always implement offsite backups for an effective DR strategy.

Traditional wisdom on backing up data has been along the lines of 3-2-1, where three copies of the production data need to be maintained on two different mediums, with one copy being offsite. This is a minimal standard that can be built upon according to each solution’s requirements and as platform technologies progress on backup and storage features.

Offsite Backups with AWS Cross-Account Vaults

In AWS, implementing offsite backups is made easy with Accounts, IAM permissions, and AWS Backup working together. A different Account, removed from the primary deployment at both the network level and the IAM access level, can be used as the DR site. It can (and where possible should) use a different AWS region, selected with the data sovereignty requirements of the solution in mind.

AWS Backup Plans provide a simple method of copying each recovery point across to different Vaults. By using a Vault in a separate Account as the target for the copy operation, offsite backups can be easily implemented.
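As a rough illustration of the copy operation described above, the following Python sketch builds a Backup Plan document with a single rule whose copy action targets a Vault in another Account. The key names follow the shape accepted by the AWS Backup `CreateBackupPlan` API (for example through boto3's `create_backup_plan(BackupPlan=...)`). The plan name, rule name, vault names, schedule, region, and retention period are all hypothetical values chosen for the example.

```python
import json


def build_cross_account_plan(plan_name, local_vault, offsite_vault_arn,
                             schedule="cron(0 12 * * ? *)", retain_days=35):
    """Return a Backup Plan document with one rule that also copies offsite.

    All names and defaults here are illustrative assumptions, not values
    taken from the article.
    """
    lifecycle = {"DeleteAfterDays": retain_days}
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "daily-with-offsite-copy",
                # Account-local Vault that receives the first recovery point.
                "TargetBackupVaultName": local_vault,
                "ScheduleExpression": schedule,
                "Lifecycle": lifecycle,
                # Each recovery point is then copied to the offsite Vault.
                "CopyActions": [
                    {
                        "DestinationBackupVaultArn": offsite_vault_arn,
                        "Lifecycle": dict(lifecycle),
                    }
                ],
            }
        ],
    }


plan = build_cross_account_plan(
    "offsite-plan",
    "local-vault",
    # xxxxxxxxxxxx stands for the destination account number.
    "arn:aws:backup:ap-southeast-2:xxxxxxxxxxxx:backup-vault:offsite-vault",
)
print(json.dumps(plan, indent=2))
```

The sketch only builds the document. Creating the plan, and granting the destination Vault permission to receive copies, would still have to happen through the AWS API or an Infrastructure-As-Code tool.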

Using a different Account brings a few features and advantages to the table.

Separation of Responsibilities

While a Backup Vault can be created in a different region in the same Account, managing this will not be easy for several reasons. Data persistence will be spread out across different Accounts in a given deployment. Managing multiple Vaults in different regions in each Account, and making sure they don’t drift from the intended backup strategy, will not be an easy task. Infrastructure-As-Code is only as good as the person who codes the said infrastructure. Even with code reviews in place, it does not take more than a single wrong commit to break the backup process without anyone easily noticing. Managing offsite backups in a separate Account brings all of this together into a single-pane-of-glass view.

Additionally, giving auditors, operations teams, and management users access to view, verify, and manage the backup implementation can be messy when the offsite backup Vault lives in the same Account. Each Account’s Roles for these user types have to be meticulously defined and maintained. Managing access to a single Account which is guaranteed to be removed from the active compute and data stores is far easier and brings down the complexity by several degrees.

Security Barrier

Vault Policies can be used to define which Accounts can use a given Vault as the offsite Vault, and which Accounts can be used to restore the snapshots. This is an effective way to separate users who need to perform backups and restores but do not necessarily need access to the actual data while doing so. Additionally, IAM permissions and Roles can separate security auditors from the operations users, and provide granular access to reporting without having to give access to data.
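A minimal sketch of such a Vault Policy, generated in Python, could look like the following. It allows a list of source Accounts to copy recovery points into the offsite Vault using the `backup:CopyIntoBackupVault` action. The function name and the `Sid` are hypothetical, and a real policy would likely be narrowed further (for example with conditions) to match the solution's requirements.

```python
import json


def offsite_vault_policy(source_account_ids):
    """Build a Vault access policy letting the given Accounts copy into the Vault.

    The Sid and the choice of account root principals are assumptions
    made for illustration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCopyFromSourceAccounts",
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{account_id}:root"
                            for account_id in source_account_ids]
                },
                "Action": "backup:CopyIntoBackupVault",
                "Resource": "*",
            }
        ],
    }


# yyyyyyyyyyyy stands for a source account number.
policy = offsite_vault_policy(["yyyyyyyyyyyy"])
print(json.dumps(policy, indent=2))
```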

Furthermore, Service Control Policies (SCPs) can be leveraged to make sure the Backup Vaults are not deleted, whether by accident or on purpose.
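One possible shape for such an SCP is sketched below in Python. It denies deletion of Vaults, recovery points, and Vault access policies, with an optional exemption for a break-glass role. The exact list of denied actions and the exemption pattern are assumptions for this example and should be adapted to the organisation's own guardrails.

```python
import json


def deny_vault_deletion_scp(exempt_role_arn_pattern=None):
    """Build an SCP that denies destructive AWS Backup actions.

    exempt_role_arn_pattern is a hypothetical break-glass role pattern;
    when omitted, the deny applies to every principal the SCP covers.
    """
    statement = {
        "Sid": "DenyBackupVaultDeletion",
        "Effect": "Deny",
        "Action": [
            "backup:DeleteBackupVault",
            "backup:DeleteRecoveryPoint",
            "backup:DeleteBackupVaultAccessPolicy",
        ],
        "Resource": "*",
    }
    if exempt_role_arn_pattern:
        # Principals matching the pattern are left out of the deny.
        statement["Condition"] = {
            "ArnNotLike": {"aws:PrincipalArn": exempt_role_arn_pattern}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


scp = deny_vault_deletion_scp("arn:aws:iam::*:role/backup-break-glass")
print(json.dumps(scp, indent=2))
```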

Cloud-native 3-2-1

Going back to the 3-2-1 approach in traditional data protection strategies, AWS Backup helps easily implement a modified [2..*]-1-[1..*] approach for AWS services. That is, two or more copies of the production data can be available on a single medium, with one or more of them being offsite. Implementing this strategy comes at minimal cost, with usage-based pricing applied in the same way as for other AWS services. At the moment, AWS Backup does not offer a way to maintain available recovery points on multiple mediums. Recovery Point retention and lifecycle policies allow transitioning older recovery points to low-cost cold storage; however, that would not count as two different mediums with available backups.
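As a small illustration of the lifecycle policies mentioned above, the sketch below builds a lifecycle setting and checks one documented constraint: recovery points moved to cold storage must stay there for at least 90 days, so retention has to be at least 90 days longer than the transition setting. The function name is hypothetical, and the returned keys mirror the `Lifecycle` structure used in Backup Plan rules.

```python
MIN_COLD_STORAGE_DAYS = 90  # minimum time a recovery point must spend in cold storage


def build_lifecycle(move_to_cold_after_days, delete_after_days):
    """Return a Lifecycle setting, rejecting combinations AWS Backup would refuse."""
    earliest_delete = move_to_cold_after_days + MIN_COLD_STORAGE_DAYS
    if delete_after_days < earliest_delete:
        raise ValueError(
            f"DeleteAfterDays must be at least {earliest_delete} "
            f"when moving to cold storage after {move_to_cold_after_days} days"
        )
    return {
        "MoveToColdStorageAfterDays": move_to_cold_after_days,
        "DeleteAfterDays": delete_after_days,
    }


# Keep recovery points warm for 30 days, then cold until day 365.
lifecycle = build_lifecycle(30, 365)
print(lifecycle)
```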

Cross-account backups can be part of the Disaster Recovery strategy for an AWS deployment. However, unless Backup/Restore is chosen as the DR strategy, with the hours or days of recovery time and recovery points that come with it, cross-account backups need to be supplemented with a fallback deployment to be considered a Disaster Recovery strategy. This can be a pilot light deployment in the secondary region, refreshed with each recovery point, or a passive (yet ready to be hydrated) database restored from the latest recovery point.

Encryption in Backups

As mentioned above, AWS Backup Vaults are encrypted with AWS KMS keys. The AWS managed aws/backup key is sufficient for simple use cases. However, for some AWS services to be properly backed up, the target Vaults must be encrypted with customer managed keys (CMKs), and the Key Policies of those keys must be modified to allow encrypt and decrypt operations by the Service-linked Role for Backup that performs the snapshot and copy operations. Furthermore, Cross-Account backup operations need the destination Account Vault to be encrypted with a CMK.

The reason for this becomes clear once it is understood how different services support AWS Backup. AWS services can be categorised into two groups based on their level of support for AWS Backup encryption: those that support independent encryption by AWS Backup, and those that do not.

AWS Backup Independent Encryption

This is when AWS Backup as a service itself handles encryption of the recovery points and the cross-account copies of the recovery points. Backup and the resource association roles take care of encryption so that manual work to allow key access is minimal.

DynamoDB is a service that supports independent encryption (when DynamoDB Advanced Backup is enabled, which is the case for newer Accounts). When a service supports independent AWS Backup encryption, the Backup operation for the specific resource itself uses the Backup Vault key to encrypt the recovery point. This would be the case even for resources that do not encrypt their content (highly discouraged, but a possible scenario). Therefore, for these services, the source Vault (the Vault that performs the first Backup operation), the destination Vault (the Vault that the recovery point is copied across to), and the resource itself can be encrypted with AWS managed KMS keys. AWS Backup handles the backup and copy operations internally, and the Key Policies do not have to be modified.

Resource Based Backup Encryption

For services like RDS that do not support independent encryption, AWS Backup uses the resource-specific KMS key to encrypt the recovery point in the first place. This happens when the backup is being taken, and the key is used by the Backup job role (the role we used when creating Resource Associations above) that performs the operation. The Key Policy of the CMK associated with the resource should be modified so that the Backup job role is able to perform Decrypt (to decrypt the resource data) and Encrypt (to encrypt the recovery point) operations. Additionally, for the cross-account copy operation, the destination Account has to be given access to the key as well, since it is the destination Account’s Backup Service-linked Role that performs the copy operation.

The following shows the Key Policy statements for a CMK used to encrypt a DynamoDB table in the source account.

yyyyyyyyyyyy - source account number, xxxxxxxxxxxx - destination account number

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Allow source account Backup to decrypt data to be encrypted with Vault key",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::yyyyyyyyyyyy:role/dynamodb-backup-role"
      },
      "Action": [
        "kms:GenerateDataKey",
        "kms:DescribeKey",
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:CreateGrant"
      ],
      "Resource": "<resource_arns>"
    },
    {
      "Sid": "Allow destination Backup to decrypt for Backup operations",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::xxxxxxxxxxxx:role/aws-service-role/backup.amazonaws.com/AWSServiceRoleForBackup"
      },
      "Action": [
        "kms:GenerateDataKey",
        "kms:DescribeKey",
        "kms:Decrypt",
        "kms:CreateGrant"
      ],
      "Resource": "<resource_arns>"
    }
  ]
}

KMS Multi-region keys could also be useful here; however, their Key Policies still have to be set up according to the above model.
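When several source and destination Accounts are involved, the Key Policy statements shown earlier could be generated rather than hand-written. The following Python sketch reproduces the same two statements with the account numbers, the Backup job role name, and the resource value as parameters. The function name and the shortened `Sid` values are hypothetical, and the `<resource_arns>` placeholder is carried over from the policy above rather than filled in.

```python
import json

BACKUP_SERVICE_ROLE_PATH = (
    "role/aws-service-role/backup.amazonaws.com/AWSServiceRoleForBackup"
)


def backup_key_policy(source_account, destination_account, job_role_name, resource):
    """Build the two Key Policy statements for cross-account backup access."""
    shared_actions = [
        "kms:GenerateDataKey",
        "kms:DescribeKey",
        "kms:Decrypt",
        "kms:CreateGrant",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Source Account Backup job role: decrypts the resource data
                # and encrypts the recovery point.
                "Sid": "Allow source account Backup job role",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{source_account}:role/{job_role_name}"
                },
                "Action": shared_actions + ["kms:Encrypt"],
                "Resource": resource,
            },
            {
                # Destination Account Service-linked Role: performs the copy.
                "Sid": "Allow destination Backup service role",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{destination_account}:{BACKUP_SERVICE_ROLE_PATH}"
                },
                "Action": list(shared_actions),
                "Resource": resource,
            },
        ],
    }


key_policy = backup_key_policy(
    "yyyyyyyyyyyy", "xxxxxxxxxxxx", "dynamodb-backup-role", "<resource_arns>"
)
print(json.dumps(key_policy, indent=2))
```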

Restore Testing

No backup strategy is complete without restore testing. Talk to any engineer worth their salt and they will tell you horror stories about how their meticulous, on-schedule backups failed miserably when they actually needed to do a restore, because the backup format was wrong, the backup medium was bad, or the vendor pushed recovery data format issues to the future. These are so common that ISO 27001 itself checks for evidence of periodic restore testing. With traditional vendors and self-hosted solutions, restore testing could be a limited recovery of a subset of files/objects with an email confirmation from the vendor, which in a real situation could sometimes fail fantastically. Adding to that is the manual work overhead and the room for error in performing the tests.

In AWS Backup, restoring from a recovery point creates a new resource, to avoid overwriting existing data. This can be easily adapted into an automated restore testing plan which provides confidence in the backup strategy and auditable evidence of properly done restore testing at the same time.

Performing a restore action and testing the validity and integrity of the restored data can be automated as a periodic (or triggered on each backup job completion) Lambda function. Resulting CloudWatch logs can be retained for the period of time reviewers and auditors might be interested in. CloudWatch Alarms that go off when restore or validation jobs fail can be configured to trigger p2/p3 incidents with high severity (since failure of a backup can be critical for business goals).
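The flow described above could be sketched roughly as follows. The function takes an AWS Backup client as a parameter (inside a Lambda function this would typically be `boto3.client("backup")`) and uses the `list_recovery_points_by_backup_vault`, `get_recovery_point_restore_metadata`, `start_restore_job`, and `describe_restore_job` operations. The `validate` callback, the polling limits, and the function names are assumptions for the example. Pagination is ignored, and for many resource types the restore metadata needs to be adjusted (for example to name the new resource) before the restore is started.

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED", "ABORTED"}


def latest_recovery_point(backup, vault_name):
    """Return the newest completed recovery point in the Vault (first page only)."""
    response = backup.list_recovery_points_by_backup_vault(BackupVaultName=vault_name)
    completed = [p for p in response["RecoveryPoints"] if p.get("Status") == "COMPLETED"]
    if not completed:
        raise RuntimeError(f"no completed recovery points in {vault_name}")
    return max(completed, key=lambda p: p["CreationDate"])


def run_restore_test(backup, vault_name, restore_role_arn, validate,
                     poll_seconds=30, max_polls=20):
    """Restore the latest recovery point to a new resource and validate it."""
    arn = latest_recovery_point(backup, vault_name)["RecoveryPointArn"]
    metadata = backup.get_recovery_point_restore_metadata(
        BackupVaultName=vault_name, RecoveryPointArn=arn
    )["RestoreMetadata"]
    job_id = backup.start_restore_job(
        RecoveryPointArn=arn, Metadata=metadata, IamRoleArn=restore_role_arn
    )["RestoreJobId"]

    # Poll until the restore job reaches a terminal state or we give up.
    for _ in range(max_polls):
        job = backup.describe_restore_job(RestoreJobId=job_id)
        if job["Status"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"restore job {job_id} did not finish in time")

    if job["Status"] != "COMPLETED":
        raise RuntimeError(f"restore job {job_id} ended as {job['Status']}")
    # validate is a caller-supplied (hypothetical) integrity check.
    if not validate(job["CreatedResourceArn"]):
        raise RuntimeError(f"validation failed for {job['CreatedResourceArn']}")
    print(f"restore test passed: job {job_id}")  # lands in CloudWatch Logs under Lambda
    return job_id
```

Raising on failure lets the Lambda invocation itself fail, which is what a CloudWatch Alarm on the function's error metric would pick up. Long-running restores may exceed the Lambda timeout, in which case the polling step would need to move elsewhere.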

Conclusion

As a centralised service for solution-wide backup strategy implementation, AWS Backup provides various constructs to build an automated backup and restore plan with minimal human intervention needed in the process. Encryption of the backups is a first-class concept, with least-privilege permissions being leveraged to make sure data is protected at all layers. Disaster Recovery can be built on top of the backup strategy by using automated cross-account backups that can use different regions to align Business Continuity goals with the technical implementation. APIs provided by Backup allow low-intervention restore testing to be added into the mix for a complete backup/restore strategy.