AWS Networking & Identity

Security Groups & NACLs

AWS gives you two overlapping, complementary firewall primitives: Security Groups (SGs) and Network Access Control Lists (NACLs). The exact difference between them (stateful vs stateless, where each is enforced, and how they compose) is a gap that surfaces again and again in production AWS incidents. Get this model right and it becomes much easier to explain why a port is unexpectedly open or why an allow rule seems to do nothing.

Stateful vs Stateless Filtering

The single most important distinction is connection tracking.

Security Groups are stateful. When you allow inbound TCP/443, the SG automatically permits the corresponding return traffic (the response packets of that connection) even if there is no explicit outbound rule. AWS's own connection tracking handles this in the VPC infrastructure, not in the instance's kernel, so it does not depend on the guest OS. In practice: you write the fewest rules, and they express intent, not packet-level mechanics.

NACLs are stateless. They evaluate every packet independently with no memory of prior packets. If you allow inbound TCP/443, you must also explicitly allow outbound TCP in the ephemeral port range for the responses to leave the subnet. The exact range depends on the client's operating system; 1024–65535 covers the common ones. Forgetting this is one of the most common NACL misconfigurations in production.

The mental model: Security Groups protect resources (they attach to ENIs). NACLs protect subnet boundaries (every packet entering or leaving the subnet is checked, while traffic between two resources in the same subnet never touches the NACL). Traffic must pass both filters. Inbound, the NACL is evaluated first at the subnet boundary, then the SG at the ENI; outbound, the order is reversed.

Layered Security Architecture

The diagram below shows how the two layers stack for a typical three-tier application inside a VPC.

Traffic traverses the NACL at each subnet boundary, then the Security Group at each network interface — both layers must allow the packet.

Security Group Rules in Depth

SGs are allow-only: there is no explicit deny. A newly created SG denies all inbound traffic and allows all outbound traffic. Terraform removes that default allow-all egress rule from groups it creates, which is why the configuration below declares egress explicitly. Rules can reference other SG IDs instead of CIDR blocks, which is the idiomatic AWS pattern for east-west service-to-service communication.

# Terraform: three-tier SG chain — ALB → App → DB
resource "aws_security_group" "alb" {
  name   = "alb-sg" # AWS rejects group names that start with "sg-"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = var.vpc_id

  # Only accept traffic from the ALB — referenced by SG ID, not CIDR
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "db" {
  name   = "db-sg"
  vpc_id = var.vpc_id

  # Only accept MySQL from the app tier
  ingress {
    from_port       = 3306
    to_port         = 3306
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }
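  # No egress block: the database rarely needs to initiate outbound
  # connections, and replies to the app tier are allowed by connection
  # tracking. With no egress block declared, Terraform leaves this group
  # without any outbound rules.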
}

Prefer SG-ID references over CIDRs for internal traffic. When you reference sg-app in sg-db's inbound rules, any new instance that joins sg-app is automatically allowed — you never chase subnet ranges as you scale. This pattern is standard at production scale.

NACL Rules in Depth

NACLs process rules in ascending number order and stop at the first match (like ACLs on a traditional router). There is always an implicit * DENY ALL at the bottom, and it cannot be removed. Rule numbers go up to 32766; the convention is to space them in multiples of 100 so you can insert rules later. A NACL can be associated with several subnets, but each subnet is associated with exactly one NACL at a time.

# Terraform: NACL for the private subnet
# Allows inbound from the public subnet + ephemeral return traffic
resource "aws_network_acl" "private" {
  vpc_id     = var.vpc_id
  subnet_ids = [var.private_subnet_id]

  # Allow inbound HTTP from ALB (public subnet CIDR)
  ingress {
    rule_no    = 100
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "10.0.1.0/24"
    from_port  = 8080
    to_port    = 8080
  }

  # Allow inbound HTTPS from the public subnet (e.g. ALB health checks over TLS)
  ingress {
    rule_no    = 110
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "10.0.1.0/24"
    from_port  = 443
    to_port    = 443
  }

  # Allow RETURN traffic: ephemeral port range (responses leaving private subnet)
  egress {
    rule_no    = 100
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "10.0.1.0/24"
    from_port  = 1024
    to_port    = 65535
  }

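  # Allow outbound HTTPS (for SSM, ECR, S3 API calls)
  egress {
    rule_no    = 110
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 443
    to_port    = 443
  }

  # Allow RETURN traffic for the outbound HTTPS calls above: replies arrive
  # on the instance's ephemeral ports, and a stateless NACL will not let
  # them in on its own. Security groups still filter what reaches each ENI.
  ingress {
    rule_no    = 120
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024
    to_port    = 65535
  }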
}

Ephemeral ports are the silent killer. If you add a strict NACL to an existing subnet and forget the outbound ephemeral range (TCP 1024–65535) for responses, the NACL silently drops the reply packets, so connections hang and time out instead of getting a clean reset. This looks like a flaky service, not a firewall misconfiguration, and wastes hours in production. Always add egress rules for 1024–65535 toward any CIDR that initiates connections into your subnet, and ingress rules for the same range from any destination your subnet initiates connections to.

When to Use Each Layer

In a well-designed AWS account, both layers run simultaneously but serve different purposes:

Security Groups: your primary, fine-grained control. Use SG-ID references for all east-west traffic. Apply least-privilege egress (restrict outbound by port and destination). Audit with the AWS Config managed rules restricted-ssh and vpc-sg-open-only-to-authorized-ports.
NACLs: a coarse subnet-level backstop. Use them to enforce hard organizational policies: block an entire CIDR that has been compromised, enforce that the DB subnet never talks to the internet, or explicitly deny a rogue range even if an SG rule were mistakenly added. Many teams leave the VPC's default NACL in place, which allows all traffic in both directions, and only tighten NACLs for compliance or incident response. A custom NACL, by contrast, denies everything until you add rules.
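As a sketch of what least-privilege egress can look like, the configuration below limits an app-tier group to MySQL toward the DB group and HTTPS toward anywhere. The group names app_strict and db_strict are hypothetical, and the sketch assumes an AWS provider version that ships the standalone aws_vpc_security_group_*_rule resources. Standalone rules are used because two groups that reference each other through inline rules would form a dependency cycle.

```hcl
# Declared here only so the sketch stands alone; drop it if your module
# already defines vpc_id.
variable "vpc_id" {
  type = string
}

# Hypothetical groups with no inline rules. Terraform removes the default
# allow-all egress rule, so the only outbound rules are the ones below.
resource "aws_security_group" "app_strict" {
  name   = "app-strict"
  vpc_id = var.vpc_id
}

resource "aws_security_group" "db_strict" {
  name   = "db-strict"
  vpc_id = var.vpc_id
}

# App tier may open MySQL connections to members of the DB group only.
resource "aws_vpc_security_group_egress_rule" "app_to_db" {
  security_group_id            = aws_security_group.app_strict.id
  referenced_security_group_id = aws_security_group.db_strict.id
  ip_protocol                  = "tcp"
  from_port                    = 3306
  to_port                      = 3306
}

# App tier may call HTTPS endpoints (for example AWS APIs).
resource "aws_vpc_security_group_egress_rule" "app_to_https" {
  security_group_id = aws_security_group.app_strict.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
}

# DB tier accepts MySQL from members of the app group only.
resource "aws_vpc_security_group_ingress_rule" "db_from_app" {
  security_group_id            = aws_security_group.db_strict.id
  referenced_security_group_id = aws_security_group.app_strict.id
  ip_protocol                  = "tcp"
  from_port                    = 3306
  to_port                      = 3306
}
```

Avoid mixing these standalone rule resources with inline ingress or egress blocks on the same group, since the two styles overwrite each other.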

Explicit deny is only possible with NACLs. An SG can never explicitly deny a specific IP; you can only omit the allow. If you need to block a specific attack source IP at the network layer without touching every SG in the account, add a NACL DENY rule with a lower rule number than your existing allow rules, so that it matches first.
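A minimal sketch of such a deny rule follows, assuming the NACL is managed without inline rules. The variable nacl_id and the address 203.0.113.7 (a documentation-only range) are placeholders, and rule number 50 is chosen only because it sorts ahead of allow rules that start at 100.

```hcl
# Hypothetical input: the ID of the NACL guarding the affected subnet.
variable "nacl_id" {
  type = string
}

# Rule 50 is evaluated before rules 100, 110, ... so the deny wins.
resource "aws_network_acl_rule" "block_attacker" {
  network_acl_id = var.nacl_id
  rule_number    = 50
  egress         = false # inbound rule
  protocol       = "-1"  # all protocols
  rule_action    = "deny"
  cidr_block     = "203.0.113.7/32"
}
```

The aws_network_acl_rule resource cannot be combined with inline ingress or egress blocks on the same NACL. For a NACL written in the inline style, the equivalent is an extra ingress block with a low rule_no and action = "deny".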

Evaluating Rules: The Traffic Flow

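For a request traveling from the internet through the ALB to an app-tier instance, the evaluation order is as follows (the app-to-RDS hop repeats the same pattern with sg-db on TCP/3306):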

IGW forwards the packet into the VPC.
NACL-Public (inbound) — is TCP/443 allowed inbound to the public subnet? Yes → continue.
sg-alb (inbound) — is TCP/443 allowed for this ENI? Yes → ALB receives the packet.
ALB terminates TLS, opens a new TCP connection to the app tier.
NACL-Public (outbound) — is TCP/8080 allowed outbound from the public subnet? (stateless — must be explicit) → Yes → continue.
NACL-Private (inbound) — is TCP/8080 from 10.0.1.0/24 allowed inbound? Yes → continue.
sg-app (inbound) — is TCP/8080 from sg-alb allowed? Yes → app receives the request.

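Each direction at each boundary is an independent check, and the response has to pass the same boundaries in reverse: the SGs let it through via connection tracking, but each NACL needs an explicit rule for the ephemeral port range. Missing one of these steps is why "I opened the port but it still does not work" is such a common AWS support complaint.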
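The public-subnet side of this walk-through could be expressed roughly as follows. This is a sketch under assumptions: the variable public_subnet_id and the private subnet CIDR 10.0.2.0/24 are placeholders that do not appear in the lesson, and the rules cover only the ALB request path plus its return traffic.

```hcl
# Declared here only so the sketch stands alone.
variable "vpc_id" {
  type = string
}

variable "public_subnet_id" {
  type = string
}

resource "aws_network_acl" "public" {
  vpc_id     = var.vpc_id
  subnet_ids = [var.public_subnet_id]

  # NACL-Public (inbound): internet clients reach the ALB on TCP/443
  ingress {
    rule_no    = 100
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 443
    to_port    = 443
  }

  # Return traffic: app-tier replies arrive on the ALB's ephemeral ports
  ingress {
    rule_no    = 110
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "10.0.2.0/24" # assumed private subnet CIDR
    from_port  = 1024
    to_port    = 65535
  }

  # NACL-Public (outbound): the ALB opens connections to the app tier.
  # Port 8080 also falls inside the range allowed by rule 110 below; the
  # explicit rule is kept to document intent.
  egress {
    rule_no    = 100
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "10.0.2.0/24" # assumed private subnet CIDR
    from_port  = 8080
    to_port    = 8080
  }

  # Return traffic: ALB replies leave toward clients' ephemeral ports
  egress {
    rule_no    = 110
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1024
    to_port    = 65535
  }
}
```

Each request direction has a matching return rule in the opposite direction, which is the bookkeeping a stateless filter requires and an SG does for you.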

Production Best Practices

Name every SG with a clear convention such as {env}-{tier}-sg, e.g. prod-app-sg. AWS rejects group names that begin with sg-, the prefix used by group IDs. Inconsistently named SGs are a compliance nightmare.
Never use 0.0.0.0/0 in inbound SG rules except for public-facing load balancers (port 443/80 only).
Enable VPC Flow Logs (traffic type ALL) on every production VPC. Flow logs record whether each flow was accepted or rejected, though not which SG or NACL rule made the decision, and they are essential for forensics. Ship them to CloudWatch Logs (query with Logs Insights) or to S3 (query with Athena).
Use AWS Network Firewall or Gateway Load Balancer with a third-party IDS when you need deep-packet inspection or geo-blocking beyond what NACLs support.
Periodically run aws ec2 describe-security-groups --filters Name=ip-permission.cidr,Values=0.0.0.0/0 in every region to enumerate overly permissive SGs.

# Audit: find all SGs with 0.0.0.0/0 inbound on any port
# (IPv4 only; ::/0 is listed under Ipv6Ranges[].CidrIpv6)
aws ec2 describe-security-groups \
  --query "SecurityGroups[?IpPermissions[?IpRanges[?CidrIp=='0.0.0.0/0']]].{ID:GroupId,Name:GroupName,VPC:VpcId}" \
  --output table
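
For the flow-log practice listed above, a minimal Terraform sketch might look like this. The variable flow_log_bucket_arn is a placeholder for an existing S3 bucket; the bucket itself and any Athena table over it are left out.

```hcl
# Declared here only so the sketch stands alone.
variable "vpc_id" {
  type = string
}

# Hypothetical input: ARN of an existing S3 bucket for flow logs.
variable "flow_log_bucket_arn" {
  type = string
}

# Capture both accepted and rejected flows for the whole VPC.
resource "aws_flow_log" "vpc" {
  vpc_id               = var.vpc_id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = var.flow_log_bucket_arn
}
```

Setting traffic_type to "REJECT" instead would reduce volume but would hide accepted flows, which are often the interesting ones during an investigation.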