In the last post/video, I tried to replicate the stupidity of (almost) intentionally exposing your private user data through a public bucket/database. While it felt great to reinforce my belief that the Cloud is relatively safe by itself, and to reassure my ego that at least I’m not that stupid, I did feel a little incomplete. I broke down what could’ve possibly gone wrong, but I didn’t show how to fix it, or how to prevent it from happening in the first place.
And apparently a similar new app, TeaOnHer, has leaked user data in much the same way. There’s a joke about copying homework somewhere around here.
So I thought of contributing to the solution: showing how not to build insecure web applications, and how to put safeguards in place so that even if some guy fresh out of a sales stint in big-tech wanted to make a bucket public, he wouldn’t be able to do so without leaving a significant, irrefutable evidence trail admissible in court (that one is for the people who actually built the Tea app). Why? Because being insecure is sometimes not just a simple mistake but a more intentional decision, as I demonstrated in the last post. Telling people not to make mistakes is a start, but what you really need are policies… and evidence trails.
I’m going to use a typical web application development scenario, from idea to production operation (100k users, or whatever small-to-medium-scale metric you want to consider). I’m going to base this on AWS, because AWS is what I’m most familiar with. However, don’t get me wrong, being on AWS isn’t what makes you secure. Being secure is cloud agnostic, and even code agnostic in most cases (yeah, I know, I know, Rust makes it possible to avoid a whole class of security issues. Just stick with me here on the point).
This is not an exhaustive list of things you can do to secure a small to medium web application, but it’s a pretty good place to start. I would even put it at 80% of what you need. Besides, I assure you, these Tea apps didn’t leak their user data because they had missed some obscure vulnerability; they leaked their data because they didn’t care about even the most basic security principles.
Principles and Guidelines
Overall, we are going to focus on practicing the following.
- Role Based Access Control (RBAC)
- Least Privilege Principle
- Defense in Depth
- Segmented Path to Production - dev, test, and prod separated
- Data Classification and Separation - no prod data in test
and enforce the following guidelines on development and operations.
- Config as Code (almost everything possible as code)
- Secure Secrets Management
- Some level of managed observability
These principles and practices aren’t going to make your application secure by themselves. They should be used together to build an overall secure posture for your application and deployment architecture. Just having RBAC doesn’t make sense if you don’t also implement the Least Privilege Principle.
RBAC and Least Privilege Principle
I picked these two together because they work together more closely than the others. They reinforce each other, and neither really stands on its own.
Role Based Access Control, or RBAC, means giving users access to your stack based on the roles that have been assigned to them. Now, I’m not going to go into the whole RBAC vs ABAC (Attribute Based Access Control) comparison that seems to come up naturally in these discussions. As a rule, for most small to medium Cloud Applications, RBAC is a good enough place to start. You almost always don’t need to start with ABAC unless you have a specific use case that only ABAC can address.
Least Privilege Principle means providing only the minimum necessary level of permissions to an actor in your environment. That actor could be an end user of your web application; a developer, DevOps engineer, or anyone else building and maintaining the application; or an external user who might need access to your stack from time to time.
For your end users, you define different roles based on the types of actions they need to perform in your application. Then, for each of these roles, you assign permissions that determine what level of access is granted to which resources. For example, in an application like Tea, users should be granted permissions to read and modify only the data that they produced, not the entire set of data produced by all users. There are various libraries and frameworks for different programming languages that help you implement RBAC and least privilege properly.
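To make that concrete at the permissions level, here’s a minimal Terraform sketch of the “only the data they produced” idea, assuming a hypothetical DynamoDB table called reviews whose partition key is the user’s Cognito identity, and an IAM role assumed by authenticated end users (both defined elsewhere):

```hcl
# Hypothetical sketch: the role assumed by authenticated end users can only
# read and write DynamoDB items whose partition key matches their own
# Cognito identity. Table and role names are made up for illustration.
resource "aws_iam_role_policy" "own_items_only" {
  name = "own-items-only"
  role = aws_iam_role.app_user.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"]
      Resource = aws_dynamodb_table.reviews.arn
      Condition = {
        "ForAllValues:StringEquals" = {
          # $$ escapes Terraform interpolation, so the literal IAM policy
          # variable ends up in the policy document.
          "dynamodb:LeadingKeys" = ["$${cognito-identity.amazonaws.com:sub}"]
        }
      }
    }]
  })
}
```

The same idea applies with any framework-level RBAC library; the point is that the data access layer itself refuses requests for other users’ data, instead of trusting the application code to always filter correctly.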
In AWS, for example, you can define different permission policies in IAM Identity Center based on the types of users who interact with your Cloud deployment, and attach those policies as Permission Sets for different Accounts. Going a step further, a given Role might need a higher level of permissions in one Account and just ReadOnly permissions in another (dev vs prod Accounts are a very good example here). You can easily implement this differentiation in IAM Identity Center because a Group or a User gets access to an Account with a specific Permission Set, so different Accounts can be granted access with different Permission Sets.
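As a rough sketch of how that looks in Terraform (the group and account IDs are placeholders, and I’m only showing the ReadOnly side), it could be something like this:

```hcl
# Look up the IAM Identity Center instance in the management account.
data "aws_ssoadmin_instances" "this" {}

# A ReadOnly permission set, backed by the AWS managed ReadOnlyAccess policy.
resource "aws_ssoadmin_permission_set" "read_only" {
  name         = "ReadOnlyAccess"
  instance_arn = tolist(data.aws_ssoadmin_instances.this.arns)[0]
}

resource "aws_ssoadmin_managed_policy_attachment" "read_only" {
  instance_arn       = tolist(data.aws_ssoadmin_instances.this.arns)[0]
  permission_set_arn = aws_ssoadmin_permission_set.read_only.arn
  managed_policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}

# Developers get ReadOnly in the prod account. The group ID and account ID
# are placeholder variables for this sketch.
resource "aws_ssoadmin_account_assignment" "devs_prod_readonly" {
  instance_arn       = tolist(data.aws_ssoadmin_instances.this.arns)[0]
  permission_set_arn = aws_ssoadmin_permission_set.read_only.arn
  principal_id       = var.developers_group_id
  principal_type     = "GROUP"
  target_id          = var.prod_account_id
  target_type        = "AWS_ACCOUNT"
}
```

A second, more permissive Permission Set assigned to the same group on the dev Account is all it takes to get the dev vs prod differentiation.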
Defense in Depth
I like the name of this principle.
If I’m not mistaken, the term comes from the military strategy of building multiple lines of defense when defending a certain part of the battle line. The deeper the defense, the more confident you can be of your strategy of holding back the enemy.
The same goes for “defending” a Cloud Application.
It’s not enough that you’ve enabled username and password authentication to protect your application if you haven’t blocked network level traffic to your database and your data buckets. It doesn’t matter that you have blocked network level traffic to your data buckets if you have provided global read and write permissions to all the data for each actor. It doesn’t matter that you’ve provided least privilege permissions if your user authentication process doesn’t use MFA.
If it takes the failure of only a single line of defense for the whole application to be compromised, then you don’t really have a defensive strategy. Your strategy depends on pure luck. The security of a software product depends on multiple factors being built together. If you don’t believe this, you only need to ask the French about the Maginot Line in 1940.
I’ve heard too many justifications on why you don’t really need IP level restrictions because there’s an authentication step anyway, or why you don’t need MFA because there’s an IP level restriction, when in truth, you really need both of these security measures for a truly secure application.
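To make the layering concrete on the storage side, here’s a hedged Terraform sketch of two independent controls on a data bucket (the bucket resource itself, named user_data here, is assumed to be defined elsewhere). Neither of these replaces application-level authentication or least-privilege IAM policies; they sit underneath them:

```hcl
# Layer one: block all forms of public access on the data bucket,
# regardless of what any bucket policy or ACL says.
resource "aws_s3_bucket_public_access_block" "user_data" {
  bucket                  = aws_s3_bucket.user_data.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Layer two: refuse any request that isn't encrypted in transit,
# even from principals that otherwise have access.
resource "aws_s3_bucket_policy" "user_data_tls_only" {
  bucket = aws_s3_bucket.user_data.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyInsecureTransport"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource = [
        aws_s3_bucket.user_data.arn,
        "${aws_s3_bucket.user_data.arn}/*",
      ]
      Condition = { Bool = { "aws:SecureTransport" = "false" } }
    }]
  })
}
```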
Segmented Path to Production
This is a problem I’ve been seeing less and less in the wild, but I’ve made a point of including it as a principle here anyway. In 2025, almost everyone knows that you should have a clear separation between development and production environments.
For a given software change, there should be a clear path to production, and that path should include enough testing before the change actually reaches the production environment.
This not only makes change management easier; it also makes sure that you’re not accidentally rolling out a major security bug without adequate testing and oversight from the people responsible.
Although I should say, even though most projects and deployments do tend to have clearly separated environments, for example in different AWS Accounts and VPCs, they also tend to give AdministratorAccess to all of these environments to all the users in the project. Sure, this can make iterating easier during early development phases, but it can also quickly become a nightmare if you keep doing it well after the go-live date. People do what they can do, so if your “line of defense” against manual changes in prod is a polite reminder during standup to the people with AdministratorAccess, that’s going to be about as effective as asking a cat not to hunt birds. You can only rely on policies to actually make sure people do the right thing.
Don’t depend on good intentions, depend on controls! - someone, I can’t remember who
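In AWS Organizations terms, one such control could be a Service Control Policy on the production OU. This is only a sketch, with a made-up pipeline role name and a placeholder OU variable, but it shows the shape of a policy that turns “please don’t touch prod manually” into an actual rule rather than a standup reminder:

```hcl
# Hypothetical control: deny IAM and S3 public-access changes in prod to
# everyone except the CI/CD deployment role. Names are placeholders.
resource "aws_organizations_policy" "prod_changes_via_pipeline_only" {
  name = "prod-changes-via-pipeline-only"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyRiskyChangesOutsidePipeline"
      Effect = "Deny"
      Action = [
        "iam:*",
        "s3:PutBucketPolicy",
        "s3:PutBucketAcl",
        "s3:DeletePublicAccessBlock",
      ]
      Resource = "*"
      Condition = {
        StringNotLike = {
          "aws:PrincipalArn" = "arn:aws:iam::*:role/deployment-pipeline"
        }
      }
    }]
  })
}

# Attach the SCP to the production organizational unit.
resource "aws_organizations_policy_attachment" "prod_ou" {
  policy_id = aws_organizations_policy.prod_changes_via_pipeline_only.id
  target_id = var.prod_ou_id
}
```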
Data Classification and Separation
There will be different types of data produced by your application during its lifetime of production operation. You need to define how sensitive each type of data is, so that you can put the necessary security controls in place. If you don’t do this classification, then you don’t really have a basis to build security on; you have no justification for why a user’s personally identifiable data should be more protected than a mass email generated by your application.
For example, if you’re asking users for scans of their government IDs, those are clearly PII, so they should be protected with the highest level of controls (encryption at rest, encryption in transit, least privilege permissions, etc.). The load balancer access logs that your web frontend generates have a lower sensitivity than PII, although they should still be protected from public access. And that email of shame you send out after getting compromised and having all your user data leaked? That’s going to be public knowledge, and doesn’t really need to be protected at all.
In AWS, for example, you can start by assigning different KMS keys to different classes of data for encryption at rest. You can define granular key policies, and then make sure that if a bucket tagged as sensitive or PII isn’t encrypted with the corresponding keys, alarms go off in your dashboards and an incident response can be kicked off.
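A minimal Terraform sketch of that starting point, assuming a hypothetical bucket for the ID scans defined elsewhere, could look like this:

```hcl
# A dedicated key for PII-classified data, with rotation enabled.
resource "aws_kms_key" "pii" {
  description         = "Encryption key for PII-classified data"
  enable_key_rotation = true
}

# Force the bucket holding ID scans to use that key for encryption at rest.
resource "aws_s3_bucket_server_side_encryption_configuration" "id_scans" {
  bucket = aws_s3_bucket.id_scans.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.pii.arn
    }
  }
}
```

The tag-based alerting on top of this is a separate piece (AWS Config rules are a common choice), but per-class keys are the foundation it builds on.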
Practices
So, these principles are good enough for 80% of the use cases. But the challenge is not really applying them, it’s continuing to apply them throughout the product lifecycle. You can easily start with clearly defined roles and least privilege permissions assigned to those roles, but it’s going to be useless if anyone can log in to the AWS console and assign AdministratorAccess to any role without any oversight.
So I thought of discussing some key software development practices and guidelines that can be easily implemented in a cloud environment. Just a note: these are not really cloud-specific practices, but operating in the public cloud makes practicing them so easy that it’s criminal not to do so.
Like the principles above, these will work best together, and they should usually be incorporated into the software and cloud architecture from the beginning by the architect or whoever’s in charge of the design.
Config as Code
If you have not heard of Infrastructure as Code in 2025, congratulations, you’re probably not staring down possible layoffs. But also, if that’s the case and you’re in the software industry, why?
Almost everything can be *-as-Code in the modern age. You’ve got code as code (easy), Infrastructure as Code, Policy as Code, and even Stupidity as Code if you consider my GitHub profile.
Having resources like infrastructure and policies as code provides two key benefits to your software development process.
- Having reviewable and traceable artefacts that reflect what’s deployed in your production environment
- Being able to control how changes are deployed in a very reproducible manner
Taking my example above, if you’re managing your IAM permission policies with IaC (Terraform, CloudFormation, OpenTofu, Pulumi, CDK, whatever you’re comfortable with), then you can:
- make sure that any changes to the permission policies are reviewed, and trace back when and why certain changes were introduced
- control when and how those changes are rolled out to the production environment, with human feedback and approval in the process
The same goes for Policy as Code. You are ensuring that changes to these key controls are themselves controlled, and no one needs to do click-ops to make changes in the environment. This beats having to write Word documents for change requests, and having to trace through months-old or even years-old event logs to understand why Sean decided to make a bucket with sensitive data publicly accessible.
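As a small illustration of what that looks like in practice, here’s a hedged Terraform sketch of a permission policy that lives in the repository (the bucket and role names are placeholders). Any change to the Action or Resource lists now has to survive a pull request review before it ever reaches the account:

```hcl
# The policy document itself is code, so changes show up in a diff.
resource "aws_iam_policy" "frontend_read_only" {
  name = "frontend-read-only"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = "${aws_s3_bucket.public_assets.arn}/*"
    }]
  })
}

# Attach it to the role the frontend actually runs as.
resource "aws_iam_role_policy_attachment" "frontend" {
  role       = aws_iam_role.frontend.name
  policy_arn = aws_iam_policy.frontend_read_only.arn
}
```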
There are plenty of templates and modules out there to get started with for the most common Cloud Architectures, and if you’re a software engineer, picking up IaC is just a matter of learning a new DSL.
Secure Secrets Management
It’s sad to see that even in 2025, people keep committing credentials to git in plaintext, when secure secrets management is just a hop away.
Various software components will need sensitive data to interact with other components. These could be database passwords or connection strings, API keys, secret tokens, or just a high-entropy string needed to generate temporary keys. If you keep committing these to git in plaintext, then honestly, someone should have a word with you about your career.
In the public cloud, it’s just a matter of using whatever secrets management service is on offer to store these sensitive strings and data, and then retrieving them at runtime. For example, you can easily store the API keys you need to talk to a remote API as a secret in AWS Secrets Manager, and read them at runtime. You can put a policy on the secret defining which actors can read it, and you can start doing this during development, so it doesn’t have to be a “hardening” step.
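A minimal Terraform sketch of that setup, with made-up names for the secret and the application role, could look like this:

```hcl
# The secret lives in Secrets Manager from day one. The value is passed in
# as a sensitive variable (or set out of band), never committed to git.
resource "aws_secretsmanager_secret" "payments_api_key" {
  name = "payments/api-key"
}

resource "aws_secretsmanager_secret_version" "payments_api_key" {
  secret_id     = aws_secretsmanager_secret.payments_api_key.id
  secret_string = var.payments_api_key
}

# Only the application's runtime role is allowed to read it back.
resource "aws_iam_role_policy" "read_payments_api_key" {
  name = "read-payments-api-key"
  role = aws_iam_role.app_runtime.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = aws_secretsmanager_secret.payments_api_key.arn
    }]
  })
}
```

One caveat worth knowing: the secret value still ends up in the Terraform state, so the state itself has to be treated as sensitive, or you can create the secret version out of band and only manage the container and the access policy as code.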
Always start from the secrets management service as a habit, and you’ll never commit a secret into git ever again.
(Managed) Observability
I’ve written about Observability as a quality of a software product since 2018, and stuff has not changed a lot since then.
You can’t improve what you can’t measure. You need to be able to see what your product does, both in terms of runtime data and the administration of the deployment. Observability commonly focuses on the logs, metrics, and traces produced by the software product itself, but the events and logs produced by activity on the infrastructure itself should also be in scope for what you’re planning to observe.
For example, you need to be able to see if there’s an authenticated AWS principal changing permission policies in your deployment in the odd hours of your time zone. You need to be notified when that happens, and you need to be able to zero in on the traces of that activity.
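One hedged sketch of that kind of notification in Terraform is an EventBridge rule that matches IAM policy changes recorded by CloudTrail and forwards them to an SNS topic (assumed to exist elsewhere). Since IAM is a global service, its CloudTrail events land in us-east-1, so that’s where a rule like this needs to live:

```hcl
# Match a few IAM policy-change API calls recorded by CloudTrail.
resource "aws_cloudwatch_event_rule" "iam_policy_changes" {
  name = "iam-policy-changes"

  event_pattern = jsonencode({
    source      = ["aws.iam"]
    detail-type = ["AWS API Call via CloudTrail"]
    detail = {
      eventSource = ["iam.amazonaws.com"]
      eventName   = ["PutRolePolicy", "AttachRolePolicy", "CreatePolicyVersion"]
    }
  })
}

# Forward matching events to a security alerts topic defined elsewhere.
resource "aws_cloudwatch_event_target" "notify" {
  rule = aws_cloudwatch_event_rule.iam_policy_changes.name
  arn  = aws_sns_topic.security_alerts.arn
}
```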
The reason I’ve included the term “Managed” in this practice is that observability stacks tend to be complex software products in their own right. Operating in the cloud helps you reduce that complexity by using services like CloudTrail, CloudWatch, and X-Ray, so you don’t have to worry about the uptime of the observability stack itself.
Also, services like CloudTrail make sure there’s an immutable trail of audit logs for control plane activity (and some data plane activity as well in some cases), so a threat actor can’t clean up the traces of their activity if they actually breach your security controls.
So the managed aspect of this observability stack (it could be any combination of services in any public cloud provider) makes the barrier to implementing one really low in any type of software development practice. You can easily start writing your application logs to CloudWatch, push custom Metrics from your application and build Alarms on those Metrics, and push traces from your application to X-Ray, which will make it easier to improve your software’s performance. Enabling CloudTrail as an audit trail is a matter of a couple of lines of code in Terraform or CloudFormation.
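For example, the CloudTrail part really is just a few lines of Terraform, assuming a dedicated log bucket with an appropriate bucket policy already exists:

```hcl
# A multi-region trail with log file validation, writing to a dedicated
# audit-log bucket defined elsewhere.
resource "aws_cloudtrail" "audit" {
  name                          = "account-audit-trail"
  s3_bucket_name                = aws_s3_bucket.audit_logs.id
  is_multi_region_trail         = true
  enable_log_file_validation    = true
  include_global_service_events = true
}
```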
Conclusion
Obviously, these are not going to cover 100% of the use cases. Software requirements can be really complex. However, most cloud based web applications tend to be really similar in their cloud architecture, and these principles and practices don’t really change based on the actual business requirements of the product.
I’ve noticed that people usually make the wrong decisions when it comes to cloud architecture because they are unaware of the cloud’s capabilities, or because they bring in incorrect mental models of how the cloud works from previous experience. With these fundamental principles, I hope I was able to change how you think about security in cloud software development. If I did, then it’s just a matter of finding more information on how to implement concrete practices and architecture components for your specific design.