Deriving AWS Account ID from Access Key

AWS Account IDs are interesting. They are not directly considered as sensitive, but on the other hand everyone seems to consider them as sensitive.

It’s probably for good reason. You shouldn’t give out anything that isn’t needed to get work done, even if those data are not “sensitive” in particular. Your name isn’t sensitive in particular, but you don’t usually walk around with your name on your tshirt either.

AWS has been clear on this in my opinion, in both actions and documentation. You see them blocking the account ID in their demos, and the documentation says to use and share them “carefully”.

AWS documentation on account ID

And the thing is, the account ID will get semi-public at one point anyways. You have to provide it to third party services that assume roles inside your AWS organisation. Encoded data like pre-signed URLs contain it.

And with these somewhat latest findings, we see the account ID is encoded in AWS access key IDs, in both permanent or temporary credentials. So if you have been blocking out the account ID in the screenshots but kept the temporary credentials because they were temporary like I did, then there you go. All of that for nothing.

Before I go in to details, this work was done by several researchers.

I’m merely explaining their work.

Details

So let’s first take a sample access key Id, AKIA6RVFFB77RE7OLMKI.

A key ID is a 20 character string of characters. We can break this down to two major parts. The first 4 characters are a well known mapping to the type of identifiers

first 4 char mapping

The next 16 characters are where the researchers were able to find the inteteresting details. The string is base32 encoded, 6RVFFB77RE7OLMKI.

Primer on Base32 Encoding

Base32 is an ecoding method that uses 32 characters to represent binary data.

The character set is case insensitive A-Z and the numbers from 2-7. 0 and 1 are skipped because they look similar to O and I.

A = 0000
B = 0001
…
Z = 11001
2 = 11010
…
7 = 11111

Base32 encoded strings are meant to be easy to transcribe by hand. They are also in a single case, so there’s no ambiguity between upper and lower case.

In brief what happens during base32 encoding is this. The input data in binary is split into blocks of 5 bytes.

111101000110101001010010100001111111111110001001

11110100 01101010 01010010 10000111 11111111 10001001

Then each block is divided in to 8 groups of 5 bits. If the last group is not complete, then it is padded with zero bytes.

11110 10001 10101 00101 00101 00001 11111 11111 10001 00100 00000 00000 00000 00000 00000 00000

Then each 5 bit group is mapped to to the character set. Padding bytes are replaced with = signs (those equal signs you sometimes see).

6 R V F F B 7 7 R E = = = = = =

Discovery

Aidan Steele figured out that out of the last 16 characters (6RVFFB77RE7OLMKI), the first 8 characters (6RVFFB77) should contain the account ID. He noticed that the first 8 characters after the AKIA/ASIA part of the key ID, were almost the same value for keys generated in the same account.

That makes sense in hindsight actually. AWS account IDs are 12 digit integers. So in base32 that needs 8 characters (999999999999 in binary is 11110100011010100101001010000111111111111, 40 bits).

And after playing around with the get-access-key-info action of the AWS STS API, he noticed that the last 8 characters didn’t really get validated by the API at all. He was able to pass key IDs constructed with different values for the last 8 characters and keeping the characters between index 4 and 12 the same, and get the same account ID as a response. This indicated that the API isn’t doing a database lookup to validate the entire access key ID, and instead computing the account ID from the passed in value. Based on findings from a few more testing rounds, he came up with the piece of code (available in his blog that I’ve linked above) that could translate the access key Id to an account ID.

However, his code actually had to consider the the 13th character, at index 12. What he did was to mask out the first bit of the base32 decoded value and multiply it by 2. If the character at index 12 was Q or more than Q in the base32 mapping, the resulting value had to be incremented by 1.

	offsetAccountIdStr, _ := baseconv.Convert(accessKeyId[4:12], base32AwsFlavour, base10)
	offsetAccountId, _ := strconv.Atoi(offsetAccountIdStr)

	accountId := 2 * (offsetAccountId - offset)

	if strings.Index(base32AwsFlavour, accessKeyId[12:13]) >= strings.Index(base32AwsFlavour, "Q") {
		accountId++
	}

Improvements

Tal took a look at this and figured out the account ID had actually been bit shifted before being base32 encoded. It might just be me, but I find it really cool that Tal was able to figure this out by looking at Aidan’s code, because I spent a few hours “reverse engineering” this finding.

So basically, what’s happening here is this. The account Id decimal is converted to binary. This gives you the following bit representation.

999999999999 -> 11110100011010100101001010000111111111111

Then it is bit shifted with the Most Significant Bit of the resulting bytes to be 1.

 1110100011010100101001010000111111111111
10000000000000000000000000000000000000000 OR
------------------------------------------------
11110100011010100101001010000111111111111
------------------------------------------------

This explains Aidan’s observation of the impact of the 13th character. Because in the base32 mapping, Q is when the Most Significant Bit becomes 1.

A = 0000
B = 0001
…
P = 01111
Q = 10000
…
Z = 11001
2 = 11010
…
7 = 11111

Because the account ID is bit shifted, the 13th character’s (9th character of the last 16 characters) Most Significant Bit is the Least Significant Bit in the account ID. So if the MSB of the 13th character (index 12) is 1, the base32 13th character is going to be either Q or more than Q. In that case, the value derived by decoding the first 12 characters need to be incremented by 1.

[prefix - 4 characters - 20 bits][bitshifted account ID - 40 bits ][suffix]

[prefix - 4 characters]11110100011010100101001010000111111111111               (ignoring the suffix for now)
[prefix - 4 characters]11110 10001 10101 00101 00101 00001 11111 11111 1xxxx
                        5     6      7     8     9    10    11    12    13     <- index
                        6     R      V     F     F     B     7     7  (Q or more)

That was the representation of what Tal wrote down a few months ago.

Security Implications

This does not translate to a huge security breach. In fact, you’re not supposed to treat your account ID as security by obscurety.

Do not publicise it unless needed, but don’t depend on the fact that it’s not public.

Hiding your account ID is not your first line of defense. In fact, it is not even a line of defense. It should just be something harder to figure out without permission. If someone actually figures it out, that shouldn’t be the reason for security of your architecture to breakdown.

We don’t know what is encoded in the last 8 characters in the key ID, it could be some more information, or it could be a random salt that AWS doesn’t really use yet. Also this method of encoding only seems to start for access keys created after early 2019. And going back to keys created before 2010 don’t even follow this format. AWS STS still seems to match an account ID with those keys, so they seem to do a database lookup for certain keys.

This was a video I did to explore a cool niche corner of the AWS landscape that people outside of security researchers rarely venture to find out. I’ve been using access keys for 9 years before actually taking some time to dig deeper. IMO this also puts focus into “security theatre” practices some people, including yours truly, do most of the time. It shows how important it is to understand the details under to hood for effective security.