Rotating AWS Secrets Manager Secrets with Lambda

Again, the only photo with me for now that's closest to the theme, a spider weaving a trap near Manigala, Sri Lanka

Secrets Manager secret types

AWS Secrets Manager supports several types of secrets. The first-class types exist to integrate secret management more closely with the AWS services they support. These are mainly the various database services offered by AWS, since managing master credentials for them is one of the key uses of Secrets Manager.

AWS Secrets Manager Secret Types

One of the key advantages of using a first-class secret type for the scenarios it supports is built-in automatic secret rotation. With secret rotation, you can implement security best practices in credentials management with minimal effort.

However, for secrets of the custom type, rotation takes a bit more effort to implement. This is purely because such a secret could be associated with practically any type of service or process, so the rotation process needs steps specific to it. AWS makes it easy to implement custom rotation logic, with AWS Lambda as the compute layer. In the following sections, let’s dive into an example of how to do this.

Secret Versions and Stages

A secret in the real world is an evolving item. Even if the value doesn’t change over time, the value at a given moment is only known after retrieval. With security standards and controls requiring frequent credential rotation, the party that uses the secret has to depend on the secret management tool to provide the proper value at that moment.

AWS Secrets Manager handles this aspect of an evolving secret by using staging labels. For a given secret, there can be three staging labels corresponding to three versions of the secret value at a given moment. These are:

AWSCURRENT - This is the version that is used whenever a secret is read. This is also the version that gets replaced during a secret rotation.
AWSPREVIOUS - This is the last value of the secret before the value marked as AWSCURRENT. AWS does not keep multiple older versions of the same secret; instead, it maintains the last known value. To access this version, you can use the CLI or the API and specify AWSPREVIOUS as the version stage (not the version ID).
AWSPENDING - This label is used to mark an incoming new value for the secret. Usually, this version label only exists for the duration of the secret rotation process.
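To make the labels a bit more concrete, here is a minimal sketch of reading a secret by staging label. It assumes `client` is a boto3 Secrets Manager client (for example `boto3.client("secretsmanager")`); the `read_secret_by_stage()` helper name is made up for this illustration.

```python
def read_secret_by_stage(client, secret_id: str, stage: str = "AWSCURRENT") -> str:
    """Return the secret string of the version that carries the given label.

    `client` is assumed to be a boto3 Secrets Manager client; it is passed in
    so that the function itself needs no AWS setup.
    """
    response = client.get_secret_value(SecretId=secret_id, VersionStage=stage)
    return response["SecretString"]


# Example usage (needs AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("secretsmanager")
# previous_value = read_secret_by_stage(client, "db/credentials", "AWSPREVIOUS")
```

Leaving out the stage reads the AWSCURRENT version, which matches what a plain read of the secret returns.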

At the start of a rotation process the new value is made AWSPENDING, and after finalising the rotation, the new value is made AWSCURRENT while the previous value is transitioned to AWSPREVIOUS.
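The label movement described above can be illustrated with a small in-memory model. This is only a toy sketch that follows the description in this section: the `labels` dictionary and the `start_rotation()` / `finish_rotation()` helpers are invented for illustration and are not part of any AWS API.

```python
# Toy model of staging labels: maps each label to the version ID it points at.
# Nothing here talks to AWS; all names are hypothetical.

def start_rotation(labels: dict, new_version_id: str) -> dict:
    """Attach AWSPENDING to the incoming version."""
    updated = dict(labels)
    updated["AWSPENDING"] = new_version_id
    return updated


def finish_rotation(labels: dict) -> dict:
    """Promote the AWSPENDING version to AWSCURRENT and demote the old one."""
    updated = dict(labels)
    pending = updated.pop("AWSPENDING")
    updated["AWSPREVIOUS"] = updated["AWSCURRENT"]
    updated["AWSCURRENT"] = pending
    return updated


labels = {"AWSCURRENT": "v1"}
labels = start_rotation(labels, "v2")
print(labels)  # {'AWSCURRENT': 'v1', 'AWSPENDING': 'v2'}
labels = finish_rotation(labels)
print(labels)  # {'AWSCURRENT': 'v2', 'AWSPREVIOUS': 'v1'}
```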

Let’s consider a custom secret whose value is derived from custom logic and has to be rotated in a third-party system. The steps to implement proper rotation for it are to:

register the secret in Secrets Manager
write a Lambda function for rotation steps
enable rotation with a schedule for the secret and point it towards the Lambda function we wrote

Registering the Secret

Adding a secret in Secrets Manager is pretty straightforward. While doing this, take care not to enable Automatic Rotation, since we don’t have the Lambda function ready yet. If you use an AWS KMS Customer Managed Key as the encryption key for the secret, make sure the Key Policy allows access from the Lambda function as well.

In this case we have set the initial secret value to a random string, kn6aTD5*$$*n7v. However, it could be (for example) the credentials you generated for a custom database.

Sample secret db/credentials

Sample secret db/credentials initial value

Writing a Lambda Function for Secret Rotation

For this post, let’s write the rotation Lambda implementation in Python. The concepts are the same for any language targeting the runtimes supported by Lambda.

All code is available on the GitHub repository.

Execution Role Permissions

The Lambda function needs permissions to read and modify the secret when invoked. This is provided to the Lambda function as an Execution Role.

When creating the function, define a new Execution Role with the following permissions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowManagingTheSecretFromLambda",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:PutSecretValue",
        "secretsmanager:UpdateSecretVersionStage",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": "<secret-arn>"
    }
  ]
}

Make sure to attach the AWSLambdaBasicExecutionRole policy to the Execution Role as well. This allows the Lambda function to write to CloudWatch Logs. Without this, you will not be able to properly troubleshoot an execution.

Invoking the Function from Secrets Manager

Similar to giving permissions for the Lambda function to modify the secret, a Resource Based Policy should be added to the Lambda function so that Secrets Manager is able to invoke the Lambda function.

To do this, once the Lambda function is created, go to Configuration -> Permissions -> Resource-based Policy Statements and add a new policy. Specify AWS Secrets Manager as the service to be granted the lambda:InvokeFunction permission.

Adding a new resource-based policy
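If you prefer to script this instead of using the console, the same grant can be sketched with the Lambda `add_permission` API. This assumes `lambda_client` is a boto3 Lambda client (for example `boto3.client("lambda")`); the helper name and the statement ID are arbitrary choices made for this example.

```python
def allow_secrets_manager_invoke(lambda_client, function_name: str) -> dict:
    """Add a resource-based policy statement that lets Secrets Manager
    invoke the rotation function.

    `lambda_client` is assumed to be a boto3 Lambda client.
    """
    return lambda_client.add_permission(
        FunctionName=function_name,
        StatementId="AllowSecretsManagerInvoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="secretsmanager.amazonaws.com",
    )


# Example usage (needs AWS credentials, so it is left commented out):
# import boto3
# allow_secrets_manager_invoke(boto3.client("lambda"), "my-rotation-function")
```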

Function Implementation

A secret rotation follows a series of “steps” in order, each covering a different part of the work that rotation requires. When automatic rotation with a Lambda function is enabled, AWS Secrets Manager calls the function at least four times for the same secret. During each call, the Lambda function is expected to execute a different “step”. These steps are:

createSecret - generate the new secret value and store it as AWSPENDING
setSecret - set the newly generated AWSPENDING value in the service
testSecret - verify that the service credentials were properly changed by using the AWSPENDING value to access the service
finishSecret - mark the new value as AWSCURRENT and move the old value to AWSPREVIOUS

As mentioned above, each invocation of the Lambda function during the rotation process receives a payload similar to the following.

{
  "Step": "createSecret",
  "SecretId": "<secretARN>",
  "ClientRequestToken": "<rotation unique token to be used as version ID>"
}

When writing the Lambda function, we can check the Step value in each invocation and implement the different steps as needed. When persisting the new value, the ClientRequestToken should be used as the version ID.

The handler function for the Lambda function is handle_event(). We’ll be checking the step that invoked the function execution in this function.

if event["Step"] == "createSecret":
    # first step of the process, we are generating a new value for the next
    # version of the secret here.

    # ...

    return None

elif event["Step"] == "setSecret":
    # in the next step, we apply the new version of the secret to the
    # remote server. There can be situations where a secret is not
    # necessarily about a service credential. In those cases, this step can
    # be skipped.

    # ...

    return None

elif event["Step"] == "testSecret":
    # in this step, the remote server change is tested to be successful.
    # Like the previous step, if the secret is not a service credential or
    # has nothing to do with an external service, this step can be skipped.

    # ...

    return None

elif event["Step"] == "finishSecret":
    # final step of the rotation process. We are transitioning the new
    # secret version to be the actual "current" version. The previous
    # version is preserved, however default reads point to the new version
    # only.

    # ...

    return None

Each execution returning None counts as a successful execution on the Python 3.9 runtime for Lambda. Let’s dive into the sample implementation steps next.
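The same step check can also be written as a dispatch table, which makes an unexpected step fail loudly instead of falling through. This is just one possible arrangement: `make_handler()` and the placeholder step functions are hypothetical and are not taken from the repository code.

```python
from typing import Callable, Dict

StepFn = Callable[[dict], None]


def make_handler(steps: Dict[str, StepFn]):
    """Build a Lambda handler that routes each rotation step to its function."""

    def handle_event(event: dict, context=None) -> None:
        step = event["Step"]
        if step not in steps:
            # raising marks the invocation as failed instead of silently passing
            raise ValueError(f"unknown rotation step: {step}")
        steps[step](event)
        return None

    return handle_event


# placeholder step functions that only record which step ran
executed = []
handle_event = make_handler({
    "createSecret": lambda event: executed.append("create"),
    "setSecret": lambda event: executed.append("set"),
    "testSecret": lambda event: executed.append("test"),
    "finishSecret": lambda event: executed.append("finish"),
})

for step in ("createSecret", "setSecret", "testSecret", "finishSecret"):
    handle_event({"Step": step, "SecretId": "<secretARN>", "ClientRequestToken": "token"})

print(executed)  # ['create', 'set', 'test', 'finish']
```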

`createSecret`

This is the first invocation of the Lambda function in the rotation process. We should generate what should be the next version of the secret in this step. For our sample scenario, we generate a random password.

# generate a new password
new_pwd = generate_password(16)

# persist as a new secret version setting version stage to AWSPENDING,
# the new password isn't usable yet
secret_client.put_secret_value(
    ClientRequestToken=event["ClientRequestToken"],
    SecretId=event["SecretId"],
    SecretString=new_pwd,
    VersionStages=["AWSPENDING"],
)
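The generate_password() helper is not shown in this post. A minimal sketch of what such a helper could look like, using only the Python standard library, is below; the alphabet is an arbitrary choice for this example, so adjust it to whatever the target service accepts.

```python
import secrets
import string

# characters the generated password may contain (an assumption for this sketch)
ALPHABET = string.ascii_letters + string.digits + "!#$%*"


def generate_password(length: int = 16) -> str:
    """Return a random password drawn from ALPHABET using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Secrets Manager also exposes a GetRandomPassword API (get_random_password() in boto3) that could be used instead. That would need the secretsmanager:GetRandomPassword permission on the Execution Role in addition to the ones listed earlier.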

When persisting the new version, we label it as AWSPENDING. Ideally, a given secret would have a version labelled AWSPENDING only during the brief rotation period. If all goes well, this version should transition into AWSCURRENT. The VersionId of the new version is the ClientRequestToken that was used to invoke this step. If you’re seeing errors about a different ClientRequestToken being used to persist or transition secret versions, it probably means a version with the AWSPENDING staging label is left over from a previous rotation attempt. In those cases, investigate the secret with the AWS CLI and remove the unwanted version as needed.

You might need to run cancel-rotate-secret on the secret if the rotation is being retried. Typically, a failed rotation is retried and given up after 5 attempts. After fixing the versions, go back and enable Secret Rotation again.

# list secret versions
aws secretsmanager list-secret-version-ids \
  --secret-id <secret-arn> \
  --no-cli-pager

AWSPENDING staging label

# remove unwanted AWSPENDING version (more accurately, remove the AWSPENDING
# label from that version)
aws secretsmanager update-secret-version-stage \
  --secret-id <secret-arn> \
  --no-cli-pager \
  --remove-from-version-id <version-id-with-awspending-label> \
  --version-stage AWSPENDING

AWSPENDING staging label removed

You can alternatively write this into the rotation logic instead. For any failures after writing the AWSPENDING version, you can execute an update_secret_version_stage() call.

secret_client.update_secret_version_stage(
    SecretId=event["SecretId"],
    VersionStage="AWSPENDING",
    RemoveFromVersionId=event["ClientRequestToken"],
)
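One way to wire that call into the rotation logic is a small wrapper around each step that removes the label when the step raises. This is a sketch under assumptions: `run_with_pending_cleanup()` and `step_fn` are hypothetical names, and `secret_client` is assumed to be a boto3 Secrets Manager client.

```python
def run_with_pending_cleanup(secret_client, event: dict, step_fn) -> None:
    """Run one rotation step; on failure, detach AWSPENDING from the version
    created in this rotation and re-raise the original error.

    `step_fn` is a hypothetical callable taking the event and doing the
    actual work of the step.
    """
    try:
        step_fn(event)
    except Exception:
        secret_client.update_secret_version_stage(
            SecretId=event["SecretId"],
            VersionStage="AWSPENDING",
            RemoveFromVersionId=event["ClientRequestToken"],
        )
        raise  # keep the invocation marked as failed
```

Be careful about which steps you wrap this way. If the remote service has already been switched to the new value, dropping the AWSPENDING version without reverting that change could lock you out of the service.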

`setSecret`

The second step of the process is intended for applying credentials and other types of secrets that relate to external services. For example, for a database credential, this could be applying the new value (with the AWSPENDING label) as the user password.

secret_version = secret_client.get_secret_value(
    SecretId=event["SecretId"],
    VersionId=event["ClientRequestToken"],
)

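new_value = secret_version["SecretString"]
# apply new_value to the remote server here; avoid printing the secret itself,
# since anything printed ends up in CloudWatch Logs
print("changing password in the remote server")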

For cases where the secret has nothing to do with an external service directly, or where the external service does not offer an API to rotate credentials or tokens with, this step and the next testSecret step can be skipped.

`testSecret`

For secrets that are directly tied to an external service and implement the previous setSecret step, the next step is to verify that the new version of the secret is actually usable from this point onwards. This can be as simple as executing a read operation on the remote server and making sure it completes without an issue.

Same as before, this step can be skipped if not relevant to the secret context.
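As a rough sketch, the verification could look like the following. The `try_login` callable is hypothetical: it stands in for whatever cheap read operation your service offers and is expected to return True when the credential works. `secret_client` is assumed to be a boto3 Secrets Manager client.

```python
def verify_pending_secret(secret_client, event: dict, try_login) -> None:
    """Fetch the AWSPENDING value and check it against the remote service.

    `try_login` is a hypothetical callable that takes the secret string and
    returns True if the service accepts it.
    """
    pending = secret_client.get_secret_value(
        SecretId=event["SecretId"],
        VersionId=event["ClientRequestToken"],
        VersionStage="AWSPENDING",
    )
    if not try_login(pending["SecretString"]):
        # raising fails the step, so the rotation does not reach finishSecret
        raise RuntimeError("the pending secret value was rejected by the service")
```

Passing both VersionId and VersionStage makes the read fail if the token does not point at the AWSPENDING version, which is a cheap sanity check before touching the remote service.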

`finishSecret`

After making sure the new version of the secret is the actual one usable (where relevant), we can transition it to the AWSCURRENT label. When we perform this step AWS does two things.

the current version with the AWSPENDING staging label is made AWSCURRENT
the previous AWSCURRENT version is made AWSPREVIOUS. If there was another AWSPREVIOUS version available, that is scheduled to be deleted in the background.

To make the transition we need two version IDs: that of the version with the AWSPENDING label and that of the version with the AWSCURRENT label.

# find the version ID to which AWSCURRENT is attached to now
versions = secret_client.list_secret_version_ids(
    SecretId=event["SecretId"])

prev_version_id: str = ""
for version in versions["Versions"]:
    for stage in version["VersionStages"]:
        if stage == "AWSCURRENT":
            prev_version_id = version["VersionId"]

if prev_version_id == "":
    raise RuntimeError("could not find the previous version ID")

# set the new value to AWSCURRENT
secret_client.update_secret_version_stage(
    SecretId=event["SecretId"],
    VersionStage="AWSCURRENT",
    MoveToVersionId=event["ClientRequestToken"],
    RemoveFromVersionId=prev_version_id,
)
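Since a failed rotation can be retried, it may be worth making this step safe to run twice. The variant below is a sketch, not the repository code: it looks up the labels through describe_secret() (which returns a VersionIdsToStages mapping) and returns early when the new version is already AWSCURRENT. `secret_client` is assumed to be a boto3 Secrets Manager client.

```python
def finish_secret(secret_client, event: dict) -> None:
    """Move AWSCURRENT to the version created in this rotation, if needed."""
    token = event["ClientRequestToken"]
    metadata = secret_client.describe_secret(SecretId=event["SecretId"])

    # find the version that currently carries AWSCURRENT
    current_version_id = None
    for version_id, stages in metadata["VersionIdsToStages"].items():
        if "AWSCURRENT" in stages:
            current_version_id = version_id
            break

    if current_version_id is None:
        raise RuntimeError("could not find the version labelled AWSCURRENT")

    if current_version_id == token:
        # a retried invocation: the new version is already current
        return

    secret_client.update_secret_version_stage(
        SecretId=event["SecretId"],
        VersionStage="AWSCURRENT",
        MoveToVersionId=token,
        RemoveFromVersionId=current_version_id,
    )
```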

After executing this step, we are done! The secret has been successfully rotated, and the new value will be returned for all the read operations from this point onwards.

Rotated secret value

Registering Rotation

Once the Lambda function is at a testable level, you can enable Automatic Secret Rotation. A rotation schedule and the Lambda function implementing the rotation are specified when enabling this.

Enabling Automatic Secret Rotation

The secret can be rotated immediately as well, which is a great way to iteratively develop the logic without having to wait until the schedule triggers rotation.
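For a scripted setup, rotation can also be enabled through the rotate_secret API. The sketch below assumes `secret_client` is a boto3 Secrets Manager client; the helper name and the 30 day default are arbitrary choices for this example. As far as I know, this call also starts a rotation right away unless RotateImmediately is set to False, so treat that part as something to verify against the current API documentation.

```python
def enable_rotation(secret_client, secret_id: str, lambda_arn: str, days: int = 30) -> dict:
    """Attach the rotation Lambda function and a schedule to the secret.

    `secret_client` is assumed to be a boto3 Secrets Manager client.
    """
    return secret_client.rotate_secret(
        SecretId=secret_id,
        RotationLambdaARN=lambda_arn,
        RotationRules={"AutomaticallyAfterDays": days},
    )


# Example usage (needs AWS credentials, so it is left commented out):
# import boto3
# enable_rotation(boto3.client("secretsmanager"), "<secret-arn>", "<lambda-arn>")
```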

Conclusion

As always, a few things can go wrong during development of the Lambda function for your specific use case.

In most cases, the secret will end up in a state where a version ID different from the current ClientRequestToken has the AWSPENDING staging label. Following the instructions detailed above in the createSecret section, the older version ID can be removed. However, take into account the step at which the rotation process failed. If the secret is managing access to a remote service and the rotation process failed after a successful setSecret step execution, you may need to manually revert the credential changes done in the remote server before deleting the older AWSPENDING version, to avoid losing access to the service.

All code mentioned in this post is available on the GitHub repository.
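To spot that situation quickly, the output of list_secret_version_ids() can be filtered for leftover AWSPENDING versions. The helper below is a hypothetical sketch that works on the response dictionary alone, so it can be tried without touching AWS.

```python
def stale_pending_versions(versions_response: dict, current_token: str) -> list:
    """Return version IDs that still carry AWSPENDING but do not belong to
    the rotation identified by `current_token`.

    `versions_response` is the dictionary returned by
    list_secret_version_ids().
    """
    return [
        version["VersionId"]
        for version in versions_response["Versions"]
        if "AWSPENDING" in version.get("VersionStages", [])
        and version["VersionId"] != current_token
    ]


sample = {
    "Versions": [
        {"VersionId": "v1", "VersionStages": ["AWSPREVIOUS"]},
        {"VersionId": "v2", "VersionStages": ["AWSCURRENT"]},
        {"VersionId": "v3", "VersionStages": ["AWSPENDING"]},
    ]
}
print(stale_pending_versions(sample, current_token="v4"))  # ['v3']
```

Each ID returned this way is a candidate for the update-secret-version-stage cleanup shown earlier, after checking whether the remote service was already changed.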