Startup Engineering: From 1 AWS Account to 100

This post originally appeared on the Segment blog. Segment, Hacker Noon’s weekly sponsor, is currently offering a 90 day free trial — email friends@segment and mention Hacker Noon to redeem.

Segment receives billions of events from our customers daily and has grown into dozens of AWS accounts. Expanding into many more accounts was necessary to best align with our GDPR and security initiatives, but it comes at a large complexity cost. To continue scaling gracefully, we are investing in tooling for employees to use with many accounts, and in centrally managing employee access to AWS with Terraform and our identity provider.

Segment began in a single AWS account, and last year finished moving to separate dev, stage, prod, and “ops” accounts. For the past few months we’ve been adding about one new AWS account every week or two, and we plan to continue this expansion into per-team and per-system accounts. Having many “micro-accounts” provides stronger security isolation between systems, and improves reliability by limiting the blast radius of AWS rate limits.

When Segment had only a few accounts, employees would log in to the AWS “ops” account using their email, password, and 2FA token. Employees would then connect to the ops-admin role in the dev, stage, and prod accounts using the AssumeRole API.

Segment now has a few dozen AWS accounts and plans to keep adding more! To organize this expansion, we needed a mechanism to control our accounts, which accounts each employee can access, and each employee’s permissions in each account.

We also hate using AWS API keys when we don’t absolutely have to, so we moved to a system where no employee has any AWS keys. Instead, employees access AWS only through our identity provider. Today zero employees have AWS keys, and there is no foreseeable need for an employee to have a personal AWS key. This is a massive security win!

Designing a scalable IAM architecture

Segment uses Okta as an identity provider. We consulted Okta’s integration guide for managing multiple AWS accounts, but improved on it with a minor change for a better employee experience. The guide recommends connecting the identity provider to each AWS account, but this breaks AWS’ built-in support for account switching and makes it harder to audit which teams have access to which roles.

Instead, employees use our identity provider to sign in to our “ops” account, which serves as our hub account, and then use the AWS Security Token Service (STS) AssumeRole API to connect to each account they have access to. Through our identity provider, each team is assigned a different role in the hub account, and each team role can assume a different set of roles in each spoke account. This is the classic “hub-and-spoke” architecture.

To keep our hub-and-spoke architecture simple to maintain, we built one Terraform module for creating a role in our spoke accounts and a separate Terraform module for creating a role in our hub account. Both modules simply create a role and attach a policy to it, whose ARN is one of the module’s inputs.

The only difference between the two modules is their trust relationship. The hub role module allows access from our identity provider, while the spoke module only allows access from the hub account. Below is the module we use to allow access to a hub role from our identity provider.

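variable "name" {}        # name of the hub role to create
variable "idp_arn" {}     # ARN of the Okta SAML identity provider
variable "policy_arn" {}  # ARN of the policy to attach to the role
resource "aws_iam_role" "okta-role" {
  name = "${var.name}"
  # Trust only SAML assertions from our identity provider that are
  # addressed to the AWS sign-in endpoint.
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "${var.idp_arn}"
      },
      "Action": "sts:AssumeRoleWithSAML",
      "Condition": {
        "StringEquals": {
          "SAML:aud": "https://signin.aws.amazon.com/saml"
        }
      }
    }
  ]
}
EOF
}
resource "aws_iam_role_policy_attachment" "okta-attach" {
  policy_arn = "${var.policy_arn}"
  role       = "${aws_iam_role.okta-role.name}"
}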
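The post does not show the spoke module. A minimal sketch of what it might look like follows, assuming Terraform 0.12 or later and a hypothetical hub_account_id input (the resource names here are also illustrative). Its only substantive difference from the hub module is the trust policy, which names the hub account instead of the identity provider.

```hcl
# Hypothetical spoke-role module (a sketch, not Segment's actual module).
variable "name" {}           # role name, e.g. "ReadOnly"
variable "policy_arn" {}     # policy to attach to the role
variable "hub_account_id" {} # assumed input: 12-digit ID of the hub account

resource "aws_iam_role" "spoke-role" {
  name = var.name

  # Trust the hub account as a whole; IAM policies attached to principals
  # in the hub account (e.g. the per-team roles) decide who may assume it.
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${var.hub_account_id}:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "spoke-attach" {
  policy_arn = var.policy_arn
  role       = aws_iam_role.spoke-role.name
}
```

Trusting the hub account's root principal delegates the decision to the hub: only hub principals whose own IAM policies allow sts:AssumeRole on this role can use it. A spoke account could then instantiate the module once per role, for example with name = "ReadOnly" and the AWS managed policy arn:aws:iam::aws:policy/ReadOnlyAccess.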

To give each team granular access to only the resources it needs, we create a role for each team in the hub account using our hub role Terraform module. These roles mostly contain IAM policies that allow sts:AssumeRole into other accounts, but a hub role can also grant granular access within the hub account itself.

One simple, concrete example of a granular policy is the role for our Financial Planning and Analysis (FP&A) team, which keeps a close watch on our AWS spend. The FP&A team only has access to billing information and information about our reserved capacity.

module "fpa" {
  source     = "git@github.com:segmentio/access//modules/okta-role"
  name       = "fpa"
  idp_arn    = "${module.idp.idp_arn}"
  policy_arn = "arn:aws:iam::aws:policy/job-function/Billing"
}
resource "aws_iam_policy" "fpa_reserved_policy" {
  name        = "fpa_reserved_policy"
  description = "FP&A team needs ability to describe our reserved instances."
  policy      = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:GetHostReservationPurchasePreview",
        "ec2:DescribeReservedInstancesModifications",
        "ec2:DescribeReservedInstances",
        "ec2:DescribeHostReservations",
        "ec2:DescribeReservedInstancesListings",
        "ec2:GetReservedInstancesExchangeQuote",
        "ec2:DescribeReservedInstancesOfferings",
        "ec2:DescribeHostReservationOfferings",
        "ec2:CreateReservedInstancesListing"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
EOF
}
resource "aws_iam_role_policy_attachment" "fpa_reserved_attach" {
  # Must match the role name passed to module "fpa" above.
  role       = "fpa"
  policy_arn = "${aws_iam_policy.fpa_reserved_policy.arn}"
}
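The module.idp referenced above is not shown in the post. As a rough sketch, assuming Terraform 0.12 or later and that the SAML metadata XML for the Okta AWS app has been saved locally (the metadata_path variable and the provider name "okta" are hypothetical), such a module could register Okta as an IAM SAML provider and expose its ARN:

```hcl
# Hypothetical contents of the "idp" module (a sketch, not Segment's code).
variable "metadata_path" {} # assumed: local path to Okta's SAML metadata XML

# Registers Okta as a SAML identity provider in the hub account.
resource "aws_iam_saml_provider" "okta" {
  name                   = "okta"
  saml_metadata_document = file(var.metadata_path)
}

# Passed to the hub role module as idp_arn.
output "idp_arn" {
  value = aws_iam_saml_provider.okta.arn
}
```

The hub role module's trust policy then uses this ARN as its Federated principal.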

The FP&A team does not have access to our spoke accounts, though. By contrast, our Foundation and Reliability team, whose members participate in our on-call rotation, needs broad access to much of our infrastructure and to all of our accounts. We give the Foundation team both a ReadOnly role and an AdministratorAccess role in every one of our accounts.

module "foundation" {
  source     = "git@github.com:segmentio/access//modules/okta-role"
  name       = "Foundation"
  idp_arn    = "${module.idp.idp_arn}"
  policy_arn = "${aws_iam_policy.foundation-policy.arn}"
}
resource "aws_iam_policy" "foundation-policy" {
  name        = "foundation-policy"
  description = "A policy for foundation to access all AWS accounts, as on-call"
  policy      = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "sts:AssumeRole"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:iam::${var.ops_account}:role/AdministratorAccess",
        "arn:aws:iam::${var.ci_account}:role/AdministratorAccess",
        "arn:aws:iam::${var.dns_account}:role/AdministratorAccess",
        "arn:aws:iam::${var.customerdata_account}:role/AdministratorAccess",
        "arn:aws:iam::${var.customerdata_account}:role/ReadOnly",
        "arn:aws:iam::${var.dns_account}:role/ReadOnly",
        "arn:aws:iam::${var.ci_account}:role/ReadOnly",
        "arn:aws:iam::${var.ops_account}:role/ReadOnly",
        ...
      ]
    }
  ]
}
EOF
}
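Listing every ARN by hand means editing the policy for each new account. One way to avoid that, sketched here assuming Terraform 0.12 or later and a hypothetical spoke_accounts map of account names to account IDs, is to generate the Resource list with setproduct and the aws_iam_policy_document data source:

```hcl
# Hypothetical input: spoke account name => 12-digit account ID.
variable "spoke_accounts" {
  type = map(string)
}

locals {
  # One role ARN for every (account, role name) combination.
  foundation_role_arns = [
    for pair in setproduct(values(var.spoke_accounts), ["AdministratorAccess", "ReadOnly"]) :
    "arn:aws:iam::${pair[0]}:role/${pair[1]}"
  ]
}

# Renders the same kind of sts:AssumeRole policy as the heredoc above.
data "aws_iam_policy_document" "foundation" {
  statement {
    effect    = "Allow"
    actions   = ["sts:AssumeRole"]
    resources = local.foundation_role_arns
  }
}
```

The aws_iam_policy resource could then set policy = data.aws_iam_policy_document.foundation.json, so adding an account becomes a one-line map entry. This assumes every spoke account has roles named exactly AdministratorAccess and ReadOnly.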

Once the per-team roles exist in the hub account, employees are assigned to Okta groups that represent their teams, and each group can then be assigned to its team’s role in the hub account.

Okta allows each group to be assigned different IAM roles in the hub account. Using Okta’s UI, we can assign the FP&A team to our “Amazon Web Services” app and restrict its access to the fpa role we created for it in the hub account.

After building this, we needed the tooling to provide our employees with an amazing engineering experience. Even though this system is far more secure, we wanted it to be just as usable and efficient as our setup with only a handful of AWS accounts.

Maintaining usability with aws-okta

One great thing about our old IAM setup was that every employee with AWS access could call AWS APIs from their local computer using aws-vault, with their IAM user credentials stored securely in their laptop’s keychain. Accessing AWS entirely through Okta was therefore a massive breaking change for our existing workflows.

Our tooling team took up the challenge and created aws-okta, a (near) drop-in replacement for aws-vault, which our engineering team used extensively. aws-okta is now open source and available on GitHub.

The quality of aws-okta is the principal reason Segment engineers were able to have their AWS credentials revoked so smoothly. Employees can execute commands using the permissions and roles they are granted, exactly as they did with aws-vault.

$ aws-okta exec hub -- aws s3 ls s3://
2018/02/08 15:40:22 Opening keychain /Users/ejcx/Library/Keychains/aws-okta.keychain
INFO[0004] Sending push notification...

aws-okta handles a lot of new complexity that aws-vault does not. While aws-vault uses IAM user credentials to run commands, aws-okta uses your Okta password (stored in your keychain) to authenticate with Okta, waits for you to respond to a push notification for 2FA, and finally provides AWS with a SAML assertion to retrieve temporary credentials.

To authenticate with Okta, aws-okta needs to know your Okta “application id”. We took the liberty of extending the ~/.aws/config ini file to include the necessary id.

[okta]
aws_saml_url = home/amazon_aws/uE2R4ro0Gat9VHg8xM5Y/111

When Segment had only a few AWS accounts and the ops-admin role, all Segment engineers shared the same ~/.aws/config. Once each team had access to different accounts and systems, we needed a better way to manage each team’s ~/.aws/config. The system also needed to update employees’ access quickly whenever new accounts and roles are created.

We decided to integrate this solution closely with prior art Segment had already built. Each team’s config is stored in a git repo that holds our company dotfiles, and each team can initialize its AWS config using robo, our internal tool for sharing helpful commands between employees.

$ SEGMENT_TEAM=foundation robo aws.config
✔️ : Your old aws config has been backed up in /tmp/awsconfig-318c16acc2b25bed2eb699e611462744
✔️ : Your aws config was successfully updated.
$ shasum ~/.aws/config
c2734b78e470c51a26d8c98e178d4ec2ed1b1b06 /Users/ejcx/.aws/config
$ SEGMENT_TEAM=platform robo aws.config
✔️ : Your old aws config has been backed up in /tmp/awsconfig-d5688401634de0e8b2f48b11377d0749
✔️ : Your aws config was successfully updated.
$ shasum ~/.aws/config
283053d6f5a23ca79f16c69856df340b631d3cdf /Users/ejcx/.aws/config

This was only possible because all Segment engineers already had an environment variable called SEGMENT_TEAM, which denotes the team the engineer belongs to. Running robo aws.config clones the dotfiles repo, backs up the old ~/.aws/config, and installs the most recent config for the engineer’s team.

AWS bookmarks were the primary way engineers navigated our environment when we used fewer accounts. When we got rid of the ops-admin role, engineers’ sign-in bookmarks stopped working. Additionally, AWS bookmarks only support up to five different AssumeRole targets, and we now have many more than five accounts.

To support many more accounts, we mostly abandoned bookmarks and instead made sure aws-okta supports engineers who need to switch AWS accounts often. Our previous use of aws-vault meant many of us were familiar with the aws-vault login command, so adding a login command to aws-okta helped engineers who switch accounts often.

After you respond to the Duo push notification, aws-okta opens a browser and logs in to the specified role in only a couple of seconds. This is built on AWS’ support for custom federated console sign-in, but it feels more like magic. It makes logging in a breeze.

Beyond 100 accounts

We expect to be near 50 AWS accounts by the end of this year. The security of having each account completely closed by default, and the reliability benefits of isolated per-account rate limits, are compelling.

The system we have built is robust and usable enough to scale our AWS usage to many hundreds of AWS accounts and many more engineering teams.

Deleting all employee AWS keys was extremely satisfying from a security perspective, and this alone is a compelling reason to integrate your identity provider with your AWS hub account.

Startup Engineering: From 1 AWS Account to 100 was originally published in Hacker Noon on Medium.