Governance Guardrails
At scale, governance is not bureaucracy; it is the engineering discipline that keeps 200 teams from accidentally destroying each other's workloads, spending $800k on forgotten GPU instances, or opening port 22 to the public internet because someone skipped the security review. At Amazon, Google, and Microsoft the mechanism is the same: policy as code enforced at the platform layer, not audited after the fact by a human reading a spreadsheet.
This lesson covers four interlocking guardrail layers: Service Control Policies (SCPs) for hard permission boundaries, AWS Config rules for continuous compliance, tagging standards for resource accountability, and budgets for cost control. Each operates at a different point in the control plane — and together they form a defence-in-depth governance model that does not require trusting every developer to do the right thing.
Service Control Policies (SCPs)
SCPs are maximum permission boundaries attached to the organization root, AWS Organizations OUs, or individual accounts. They do not grant permissions; they lower the ceiling of what IAM policies can grant. An SCP can prevent any principal in a member account (including the account root user, with a small number of documented exceptions) from calling certain APIs, regardless of what their IAM policy says. SCPs do not restrict the management account itself, and they do not apply to service-linked roles.
The mental model: IAM policies define what an identity is allowed to do. SCPs define what the account is allowed to allow. The effective permission is the intersection. This means SCPs are the right layer for hard organisational rules — things that must never happen regardless of who asks.
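The intersection rule can be sketched in a few lines. This is a deliberately simplified model, not the real IAM evaluation engine: it ignores resources, conditions, and the other policy types, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(action, patterns):
    """True if the action matches any pattern, using IAM-style '*' wildcards.

    IAM action names are case-insensitive, so both sides are lower-cased.
    """
    action = action.lower()
    return any(fnmatchcase(action, pattern.lower()) for pattern in patterns)


def is_allowed(action, iam_allow, scp_allow, scp_deny=()):
    """Effective permission = IAM allow AND SCP allow, minus any SCP deny."""
    if matches(action, scp_deny):
        return False  # an explicit deny in the SCP always wins
    return matches(action, iam_allow) and matches(action, scp_allow)


# Illustrative policies: a broad IAM grant under a deny-list style SCP.
iam_allow = ["ec2:*", "s3:GetObject"]
scp_allow = ["*"]
scp_deny = ["ec2:ModifyInstanceAttribute", "organizations:LeaveOrganization"]

print(is_allowed("ec2:RunInstances", iam_allow, scp_allow, scp_deny))
print(is_allowed("ec2:ModifyInstanceAttribute", iam_allow, scp_allow, scp_deny))
```

The second call is refused even though the IAM policy grants `ec2:*`, which is the property that makes SCPs suitable for rules that must hold regardless of who asks.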
Common production SCP patterns at big tech include: denying ec2:ModifyInstanceAttribute to prevent disabling termination protection on production boxes, denying organizations:LeaveOrganization to prevent a rogue admin ejecting an account from governance, denying all API calls outside approved regions (data residency), and denying iam:CreateUser in workload accounts to enforce federated access only.
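As an illustration of the region-restriction pattern, the sketch below builds an SCP document as a Python dictionary and prints it as JSON. The approved regions and the list of exempted global-service actions are assumptions chosen for the example and are not a complete exemption list; `aws:RequestedRegion` is the global condition key that carries the region of the call.

```python
import json

# Assumption: global services whose calls should not be region-restricted.
# A real policy needs a reviewed list that fits the organisation.
EXEMPT_GLOBAL_ACTIONS = ["iam:*", "organizations:*", "sts:*", "route53:*", "support:*"]


def region_deny_scp(approved_regions, exempt_actions=EXEMPT_GLOBAL_ACTIONS):
    """Return an SCP that denies every non-exempt call outside approved regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # NotAction: the deny applies to everything except these actions.
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(approved_regions)
                    }
                },
            }
        ],
    }


policy = region_deny_scp({"eu-west-1", "eu-central-1"})
print(json.dumps(policy, indent=2))
```

Using `NotAction` with a `Deny` keeps the statement short, but anything missing from the exemption list is blocked outside the approved regions, so the list deserves the same review as the regions themselves.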
An SCP that denies sts:AssumeRole with no carve-outs can lock every human and CI system out of the account instantly, and recovery requires action from the management account.
SCPs should be managed as Terraform resources (aws_organizations_policy + aws_organizations_policy_attachment), committed to your landing zone repository, and applied through a CI pipeline that has a mandatory terraform plan review step. Never apply SCP changes manually.
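One check a review pipeline could run before a human looks at the plan is a small lint that flags Deny statements covering `sts:AssumeRole` with no `Condition` block. The sketch below is a hypothetical helper, not part of Terraform or any AWS tool, and it treats any condition as a carve-out without checking what the condition says.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def _covers(statement, action):
    """True if the statement's Action/NotAction element applies to the action."""
    action = action.lower()
    if "Action" in statement:
        return any(fnmatchcase(action, p.lower()) for p in _as_list(statement["Action"]))
    if "NotAction" in statement:
        return not any(fnmatchcase(action, p.lower()) for p in _as_list(statement["NotAction"]))
    return False


def unconditional_denies(policy, action="sts:AssumeRole"):
    """Return the Sids of Deny statements that hit the action with no Condition."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    return [
        s.get("Sid", "<no Sid>")
        for s in statements
        if s.get("Effect") == "Deny" and not s.get("Condition") and _covers(s, action)
    ]


risky = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "BlockAllSts", "Effect": "Deny", "Action": "sts:*", "Resource": "*"},
        {
            "Sid": "BlockOutsideOrg",
            "Effect": "Deny",
            "Action": "sts:AssumeRole",
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:PrincipalOrgID": "o-example"}},
        },
    ],
}
print(unconditional_denies(risky))
```

A non-empty result is a reason to stop the pipeline and ask for a second look, not proof that the policy is wrong.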
AWS Config Rules
If SCPs are the firewall, blocking actions before they happen, AWS Config rules are the continuous audit, detecting when the current state of resources violates your standards. Config records configuration changes to supported resources and evaluates each resource against rules you define. Non-compliant resources trigger notifications (SNS), findings (Security Hub), or automated remediation (SSM Automation documents).
AWS provides several hundred managed rules. The ones every production org enables from day one include: restricted-ssh (no SG allows 0.0.0.0/0 on port 22), s3-bucket-public-read-prohibited, encrypted-volumes, rds-storage-encrypted, mfa-enabled-for-iam-console-access, and required-tags. Custom rules can be written as Lambda functions (or using Guard policy language) for business-specific checks.
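To make the idea of a rule evaluation concrete, here is a minimal sketch of the decision logic a custom rule in the spirit of restricted-ssh might contain. It covers only the pure evaluation step, not the Lambda handler or the call that reports results back to Config. The input is assumed to follow the shape of an EC2 security group description (`IpPermissions`, `IpRanges`, `Ipv6Ranges`); checking `::/0` as well as `0.0.0.0/0` is a choice made for this example.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}
SSH_PORT = 22


def allows_public_ssh(permission):
    """True if one ingress permission exposes port 22 to the whole internet."""
    protocol = permission.get("IpProtocol")
    if protocol == "-1":
        # "-1" means all protocols and all ports.
        port_match = True
    elif protocol == "tcp":
        port_match = permission.get("FromPort", 0) <= SSH_PORT <= permission.get("ToPort", 65535)
    else:
        port_match = False
    cidrs = {r.get("CidrIp") for r in permission.get("IpRanges", [])}
    cidrs |= {r.get("CidrIpv6") for r in permission.get("Ipv6Ranges", [])}
    return port_match and bool(cidrs & OPEN_CIDRS)


def evaluate_security_group(group):
    """Return a Config-style compliance string for one security group."""
    if any(allows_public_ssh(p) for p in group.get("IpPermissions", [])):
        return "NON_COMPLIANT"
    return "COMPLIANT"


open_ssh = {
    "GroupId": "sg-example",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
    ],
}
print(evaluate_security_group(open_ssh))
```

Keeping the evaluation as a pure function makes it easy to unit test before wiring it into a Lambda-backed rule.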
Deploy rules as a conformance pack (aws_config_conformance_pack) across all accounts in an OU simultaneously. The AWS Security Hub "Foundational Security Best Practices" standard aggregates Config findings from all accounts into a single pane; enable it in the Security tooling account and configure a cross-account aggregator.
Tagging Standards
Tags are the connective tissue of cloud governance. Without consistent tags you cannot answer: "Which team owns this EC2 instance?" "What is the monthly cost of the payments service?" "Which resources are subject to GDPR?" Every major cloud cost blowup I have seen traces back to untagged or inconsistently tagged resources.
A production tagging standard covers four categories:
- Identity: Owner (team email DL), Project (Jira project key), CostCenter (GL code)
- Lifecycle: Environment (prod / staging / dev), Terraform (true, to detect manual resources), CreatedBy (IAM principal from aws:PrincipalArn)
- Compliance: DataClassification (public / internal / confidential / restricted), Regulation (GDPR / PCI / HIPAA, pipe-delimited)
- Operations: BackupPolicy (daily / weekly / none), PatchGroup (SSM patch group name)
Enforcement happens at two points. First, the required-tags Config rule marks any resource missing mandatory tags as NON_COMPLIANT — but it does not block creation. For blocking, use an SCP or an IAM permission boundary that conditions ec2:RunInstances on the presence of required tag keys via aws:RequestTag conditions. Second, Terraform modules enforce tags at the module level using merge(local.mandatory_tags, var.tags) so developers cannot accidentally omit them when using the blessed module library.
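Both enforcement points can be sketched briefly. The first function mirrors what a required-tags check reports; the second generates Deny statements for `ec2:RunInstances` using the `Null` condition operator on `aws:RequestTag/<key>`. Which tag keys are mandatory is an assumption made here for illustration. One statement is emitted per key because several keys inside a single condition block are ANDed, which would deny only when every tag is missing at once.

```python
import json

# Assumption: the subset of the tagging standard treated as mandatory.
MANDATORY_TAG_KEYS = ["Owner", "Project", "CostCenter", "Environment"]


def missing_tags(tags, required=MANDATORY_TAG_KEYS):
    """Detective check: list required keys that are absent or empty."""
    return [key for key in required if not tags.get(key)]


def require_tag_statements(required=MANDATORY_TAG_KEYS):
    """Preventive check: one Deny statement per required tag key."""
    return [
        {
            "Sid": f"DenyRunInstancesWithout{key}",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            # Null == "true" matches when the tag key is absent from the request.
            "Condition": {"Null": {f"aws:RequestTag/{key}": "true"}},
        }
        for key in required
    ]


print(missing_tags({"Owner": "payments@example.com", "Environment": "prod"}))
print(json.dumps({"Version": "2012-10-17", "Statement": require_tag_statements()}, indent=2))
```

The detective check tells you what is already wrong; the generated statements stop new untagged instances from launching in the first place.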
Budgets & Cost Guardrails
A budget without an action is just a notification nobody reads. Production cost governance at scale uses AWS Budgets with automated actions to enforce spend limits without human intervention. There are three levels of response worth configuring on every account:
- Alert at 80%: notify the team Slack channel (via SNS → Lambda → Slack webhook). No action taken; early warning.
- Alert at 100%: notify engineering leadership and the FinOps team. Publish to an SNS topic that triggers a Lambda function, which opens a P2 ticket in your incident tracker.
- Action at 110%: AWS Budgets applies an IAM policy that prevents launching new EC2 instances, RDS instances, or NAT Gateways in that account until the budget resets. This is the hard stop that prevents a runaway autoscaling event from spending $50k overnight.
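The three tiers above amount to a small threshold table. The sketch below models only the decision of which responses have been triggered for a given spend; it does not call AWS Budgets, and the response names are hypothetical labels for whatever automation sits behind each tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    threshold_pct: int  # percentage of the budget limit
    response: str       # hypothetical label for the automation to run


TIERS = (
    Tier(80, "notify-team-slack"),
    Tier(100, "notify-leadership-and-open-p2"),
    Tier(110, "apply-deny-launch-policy"),
)


def triggered_responses(actual_spend, budget_limit, tiers=TIERS):
    """Return the responses whose threshold the current spend has reached."""
    if budget_limit <= 0:
        raise ValueError("budget_limit must be positive")
    # Compare without dividing so whole-number inputs are not affected by float rounding.
    return [
        tier.response
        for tier in tiers
        if actual_spend * 100 >= tier.threshold_pct * budget_limit
    ]


print(triggered_responses(10_500, 10_000))
```

Tiers are cumulative: once spend passes 110%, the two alerts have already fired and the hard stop is added on top.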
Supplement account-level budgets with cost anomaly detection. AWS Cost Anomaly Detection uses ML to identify spend patterns that deviate from historical baselines — it will catch a forgotten p3.16xlarge training job or an S3 bucket accidentally made public and being hammered by bots before end-of-month billing does.
Pulling It Together: The Governance Pipeline
These four mechanisms are most powerful when they are managed as code in the same landing zone repository. The recommended workflow: SCPs and Config conformance packs live in Terraform, applied by a dedicated governance pipeline that requires two senior engineer approvals. Budget thresholds are stored as per-account variables in a budgets.tfvars map. Tagging standards are enforced at the Terraform module layer so compliance is automatic, not aspirational. Drift from any of these layers triggers a Security Hub finding that pages the Platform team — not just an email that goes unread.
Build a governance dashboard that tracks tag coverage (via aws resourcegroupstaggingapi get-resources --tag-filters) and budget burn rates. Make it visible to engineering leadership; visibility creates accountability.