Network Architecture at Org Scale
When a startup grows into a multi-team, multi-account AWS organization, the naive approach — one VPC per workload, each peered ad-hoc — collapses under its own weight. Peering connections form an O(n²) mesh, route tables become unmanageable, and security teams lose visibility into inter-service traffic. The enterprise answer is a deliberate hub-and-spoke topology built around a centralized Transit Gateway, shared service VPCs, and a single choke point for internet egress.
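To make the mesh-versus-hub scaling concrete, here is a minimal arithmetic sketch. It assumes a full mesh in which every VPC must reach every other VPC; the function names are illustrative and are not part of any AWS SDK.

```python
def peering_connections(n_vpcs: int) -> int:
    """Full mesh: every unordered pair of VPCs needs its own peering connection."""
    return n_vpcs * (n_vpcs - 1) // 2


def tgw_attachments(n_vpcs: int) -> int:
    """Hub-and-spoke: one Transit Gateway attachment per VPC."""
    return n_vpcs


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(f"{n:>3} VPCs: {peering_connections(n):>5} peerings "
              f"vs {tgw_attachments(n):>3} TGW attachments")
```

At 5, 20, and 100 VPCs the mesh needs 10, 190, and 4,950 peering connections, while the hub needs 5, 20, and 100 attachments.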
This lesson covers the three pillars of org-scale networking: the hub-and-spoke model, shared VPCs via AWS Resource Access Manager (RAM), and centralized egress through an inspection VPC. You will leave with Terraform and AWS CLI patterns you can run in a real organization.
The Hub-and-Spoke Model
In a hub-and-spoke design, a central Transit Gateway (TGW) acts as the hub. Every spoke VPC — one per account, per environment, or per business unit — connects to the TGW via a TGW attachment. Spokes never peer with each other directly; all traffic transits the hub. This gives you:
- Linear attachment scaling: each spoke needs exactly one attachment, and the default quota is 5,000 attachments per Transit Gateway, so there is no peering mesh to maintain.
- Centralized route control: TGW route tables define which spokes can reach which. Isolated route tables prevent prod/dev cross-contamination without duplicating ACLs in every VPC.
- Transitive routing: traffic from spoke A can reach spoke B through the hub, but only if the TGW route tables permit it. VPC peering is non-transitive, so the same reachability would require a direct peering between every pair.
- Inspection insertion: you can steer all East–West or North–South traffic through a centralized firewall VPC without changing any spoke.
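The route-control bullet can be illustrated with a small in-memory model. This is a simplified sketch, not the AWS API: it assumes each attachment is associated with exactly one TGW route table and that a table can only reach attachments whose routes were propagated into it. The class and attachment names (`prod-a`, `dev-a`, `shared`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TgwRouteTable:
    name: str
    # Attachments whose CIDRs are propagated into this table, i.e. reachable destinations.
    propagated_from: set[str] = field(default_factory=set)


@dataclass
class Hub:
    # attachment name -> the single route table it is associated with
    association: dict[str, TgwRouteTable] = field(default_factory=dict)

    def can_send(self, src: str, dst: str) -> bool:
        """True if the table associated with src holds a route toward dst."""
        return dst in self.association[src].propagated_from

    def can_talk(self, a: str, b: str) -> bool:
        """Two-way traffic needs a route in each direction."""
        return self.can_send(a, b) and self.can_send(b, a)


prod_rt = TgwRouteTable("prod", {"prod-a", "prod-b", "shared"})
dev_rt = TgwRouteTable("dev", {"dev-a", "shared"})
shared_rt = TgwRouteTable("shared", {"prod-a", "prod-b", "dev-a"})

hub = Hub({"prod-a": prod_rt, "prod-b": prod_rt, "dev-a": dev_rt, "shared": shared_rt})

if __name__ == "__main__":
    print(hub.can_talk("prod-a", "prod-b"))  # both routes live in the prod table
    print(hub.can_talk("prod-a", "dev-a"))   # prod table has no route to dev-a
    print(hub.can_talk("dev-a", "shared"))   # shared services reachable from dev
```

In this model prod and dev never see each other's routes, yet both reach the shared-services attachment, which is the isolation property the route tables are meant to provide.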
Terraform: Transit Gateway and Spoke Attachment
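The Terraform for this pattern creates a TGW in the network account, shares it with the organization through RAM, and then attaches a spoke VPC owned by a workload account. The core resources are aws_ec2_transit_gateway, aws_ram_resource_share with its resource and principal associations, and aws_ec2_transit_gateway_vpc_attachment created from the workload account. Landing-zone tooling such as AWS Landing Zone Accelerator commonly follows a similar pattern.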
Shared VPCs via Resource Access Manager
A Shared VPC (also called VPC Sharing) lets you own one VPC in a central network account and share individual subnets with multiple AWS accounts via RAM. Workload accounts launch EC2 instances and ECS tasks directly into the shared subnets, so they never need their own VPC or NAT Gateway. At scale this cuts NAT spend, because the hourly NAT Gateway charge is paid once per Availability Zone in the shared VPC rather than once per account; per-GB data processing charges still apply.
- The central account retains control of routing, NACLs, and flow logs.
- Each participant account controls security groups within the shared subnet — they cannot modify the route table.
- Service Control Policies (SCPs) in AWS Organizations can deny actions such as ec2:CreateVpc in participant accounts, which prevents them from creating their own VPCs and enforces the shared model.
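As an illustration of the last bullet, the sketch below builds an SCP document that denies VPC creation. It is a minimal example under stated assumptions: the `Sid` and the function name are made up for this lesson, and the policy is assumed to be attached only to OUs that hold participant accounts, so the network account is not affected.

```python
import json


def deny_vpc_creation_scp() -> dict:
    """Return an SCP document that blocks VPC creation in the accounts it is attached to."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyVpcCreationInParticipantAccounts",  # hypothetical identifier
                "Effect": "Deny",
                "Action": ["ec2:CreateVpc", "ec2:CreateDefaultVpc"],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # The JSON printed here is what would be pasted into, or uploaded as, the SCP content.
    print(json.dumps(deny_vpc_creation_scp(), indent=2))
```

Scoping the policy by OU attachment, rather than by a condition inside the policy, keeps the document short; an organization that attaches SCPs at the root would need an exemption for the network account instead.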
Centralized Egress and Traffic Inspection
Allowing each spoke VPC to egress directly to the internet creates blind spots: no unified threat detection, no single FQDN allowlist, and firewall rules scattered across 50 accounts. The solution is to route all outbound internet traffic through a dedicated Egress VPC (sometimes called a Security VPC) that houses:
- AWS Network Firewall (or a third-party NGFW) — stateful packet inspection, FQDN filtering, IDS/IPS.
- NAT Gateways — a small pool of stable Elastic IPs that you can add to vendor allowlists.
- VPC Flow Logs → S3 / CloudWatch — single stream for SIEM ingestion.
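TGW route tables wire this up. Each spoke VPC has a default route (0.0.0.0/0) pointing at the TGW. The TGW looks up the route table associated with the spoke's attachment, and that table's default route points at the Egress VPC attachment. Inside the Egress VPC the packet passes the firewall, is source-NATed by a NAT Gateway, and leaves through the internet gateway. Return traffic follows the reverse path, so the Egress VPC route tables also need routes for the spoke CIDRs back toward the TGW.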
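The egress path can be traced hop by hop with a small longest-prefix-match model. This is a sketch rather than real AWS state: every CIDR, target label, and the three-subnet layout of the Egress VPC (TGW subnet, firewall subnet, public subnet) are assumptions chosen for illustration.

```python
import ipaddress

# Hypothetical route tables, one per hop on the way out.
SPOKE_VPC_RT = {"10.1.0.0/16": "local", "0.0.0.0/0": "tgw"}
TGW_SPOKE_RT = {"0.0.0.0/0": "egress-vpc-attachment"}
EGRESS_TGW_SUBNET_RT = {"10.100.0.0/16": "local", "0.0.0.0/0": "firewall-endpoint"}
EGRESS_FIREWALL_SUBNET_RT = {"10.0.0.0/8": "tgw", "0.0.0.0/0": "nat-gateway"}
EGRESS_PUBLIC_SUBNET_RT = {"10.0.0.0/8": "firewall-endpoint", "0.0.0.0/0": "internet-gateway"}


def lookup(route_table: dict, dst_ip: str):
    """Return the target of the most specific route matching dst_ip, or None."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dst in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def trace(dst_ip: str) -> list:
    """Follow a packet from a spoke instance through each route table in turn."""
    hops = [SPOKE_VPC_RT, TGW_SPOKE_RT, EGRESS_TGW_SUBNET_RT,
            EGRESS_FIREWALL_SUBNET_RT, EGRESS_PUBLIC_SUBNET_RT]
    path = []
    for route_table in hops:
        target = lookup(route_table, dst_ip)
        path.append(target)
        if target == "local":  # delivered inside the current VPC
            break
    return path


if __name__ == "__main__":
    print(trace("93.184.216.34"))  # an internet destination
    print(trace("10.1.2.3"))       # a destination inside the spoke itself
```

The 10.0.0.0/8 entries in the two Egress VPC tables are the return routes: a reply addressed to a spoke matches them ahead of the default route and is sent back through the firewall toward the TGW.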
Production Failure Modes
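- Asymmetric routing through the inspection VPC: by default the TGW keeps traffic in the Availability Zone where it entered, so the two directions of a cross-AZ flow can reach different firewall endpoints, and a stateful firewall drops the direction it has not seen. The fix is to enable appliance mode on the inspection VPC attachment (appliance_mode_support = "enable"). This single flag is one of the most commonly missed steps in new org-scale network builds.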
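- TGW bandwidth limits: each VPC attachment has a bandwidth ceiling per Availability Zone (cited here as 50 Gbps burst; AWS has revised this quota over time, so verify it against the current Transit Gateway quotas). High-throughput data pipelines (S3 bulk transfers, Spark on EMR) should use VPC endpoints, such as an S3 gateway endpoint, inside the spoke to bypass the TGW entirely.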
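- CIDR overlap: plan your org-wide IP space before attaching spokes. A TGW cannot route between VPCs whose CIDRs overlap in the same route table; the attachment itself is created, but a route that duplicates an existing CIDR is not propagated, so the second VPC is unreachable. Use AWS IPAM (IP Address Manager) to allocate non-overlapping /16s to each OU.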
- DNS resolution across accounts: Route 53 Resolver endpoints in the Shared Services VPC, with forwarding rules shared via RAM, are the standard solution. Without them, private hosted zones in one account are invisible to workloads in another, even when the network path exists.
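The asymmetric-routing failure is easier to see in a toy model. The sketch below is a deliberate simplification: without appliance mode it sends each direction to the firewall in the sender's AZ, and with appliance mode it pins both directions to one AZ using a direction-independent hash. The CRC32 hash is a stand-in chosen for determinism, not the algorithm the TGW actually uses, and the AZ names and IPs are hypothetical.

```python
import zlib

AZS = ["az-a", "az-b", "az-c"]


def firewall_az(src_az: str, src_ip: str, dst_ip: str, appliance_mode: bool) -> str:
    """Pick the AZ whose firewall endpoint sees a packet travelling src -> dst."""
    if not appliance_mode:
        # Default behaviour in this model: stay in the sender's AZ.
        return src_az
    # Appliance mode in this model: hash the endpoint pair in a fixed order,
    # so both directions of the flow map to the same AZ.
    key = "|".join(sorted([src_ip, dst_ip])).encode()
    return AZS[zlib.crc32(key) % len(AZS)]


def flow_is_symmetric(client: tuple, server: tuple, appliance_mode: bool) -> bool:
    """client and server are (az, ip) pairs; True if both directions hit one firewall."""
    forward = firewall_az(client[0], client[1], server[1], appliance_mode)
    reverse = firewall_az(server[0], server[1], client[1], appliance_mode)
    return forward == reverse


client = ("az-a", "10.1.0.10")
server = ("az-b", "10.2.0.20")

if __name__ == "__main__":
    print("appliance mode off:", flow_is_symmetric(client, server, appliance_mode=False))
    print("appliance mode on: ", flow_is_symmetric(client, server, appliance_mode=True))
```

With appliance mode off, the request is inspected in az-a and the reply in az-b, which is the case a stateful firewall drops; with it on, both directions land on the same endpoint.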
With an IPAM pool in place, a new VPC requests its CIDR from the pool, for example through aws ec2 allocate-ipam-pool-cidr, and IPAM assigns a non-overlapping block automatically. This eliminates one of the most painful sources of org-scale network re-architecture.
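The idea behind pool-based allocation can be sketched with Python's standard ipaddress module. This is a first-fit model for illustration only, not how AWS IPAM is implemented; the class name and the 10.0.0.0/8 pool are assumptions.

```python
import ipaddress


class IpamPoolModel:
    """Toy address pool that hands out blocks which never overlap earlier ones."""

    def __init__(self, cidr: str):
        self.network = ipaddress.ip_network(cidr)
        self.allocated = []

    def allocate(self, prefixlen: int):
        """Return the first free block of the requested size, or raise if none is left."""
        for candidate in self.network.subnets(new_prefix=prefixlen):
            if not any(candidate.overlaps(used) for used in self.allocated):
                self.allocated.append(candidate)
                return candidate
        raise RuntimeError(f"pool {self.network} has no free /{prefixlen}")


if __name__ == "__main__":
    pool = IpamPoolModel("10.0.0.0/8")
    print(pool.allocate(16))  # first OU
    print(pool.allocate(16))  # second OU
    print(pool.allocate(20))  # a smaller block, placed after the two /16s
```

Because every request goes through the same pool object, an overlap can only occur if someone bypasses the pool, which is the argument for adopting IPAM before the first spoke is attached.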
Key Takeaways
- Hub-and-spoke via Transit Gateway gives you O(n) scaling, centralized routing control, and traffic inspection insertion without touching any spoke.
- VPC Sharing (RAM) consolidates NAT Gateways and keeps network ownership in one account while letting dozens of teams deploy into shared subnets.
- Centralized egress with AWS Network Firewall provides a single FQDN allowlist, unified flow logs, and a stable set of Elastic IPs — critical for compliance and incident response.
- Enable appliance_mode_support on TGW attachments going to inspection appliances to prevent AZ-asymmetry firewall drops.
- Use AWS IPAM from day one to eliminate CIDR overlap as the organization grows.