Elastic Load Balancing Deep Dive

AWS Elastic Load Balancing (ELB) is the traffic front door for most production systems on AWS. It accepts client connections, health-checks targets, and distributes requests across a fleet of targets, all without you provisioning or patching a single load balancer instance. At big-tech scale, ELB is more than a convenience: it enables zero-downtime deployments, absorbs traffic spikes, and is usually the first point of TLS termination. Understanding its internals is essential for any serious AWS practitioner.

ALB vs NLB: Choosing the Right Tool

For application traffic, AWS offers two primary load balancer types (Gateway Load Balancer and the legacy Classic Load Balancer serve narrower roles). The Application Load Balancer (ALB) operates at OSI Layer 7 (HTTP/HTTPS, including gRPC over HTTP/2). It inspects the HTTP request (path, headers, hostname, query string) and routes based on content. The Network Load Balancer (NLB) operates at Layer 4 (TCP/UDP/TLS). It routes based on IP and port, with no awareness of application-layer content.

Use ALB when you need host-based or path-based routing, WebSocket support, gRPC, sticky sessions, WAF integration, Cognito authentication, or Lambda targets. This is the default for web applications and microservices.
Use NLB when you need very low latency at high connection rates (NLB forwards flows without parsing HTTP), static IP addresses (one static IP per enabled AZ, or an Elastic IP you assign per AZ on an internet-facing NLB), TCP pass-through so mutual TLS runs end-to-end between client and target, or non-HTTP protocols like MQTT, custom TCP, or high-volume UDP (e.g., DNS, gaming).
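The rules of thumb above can be condensed into a small decision helper. This is a hypothetical sketch, not an AWS tool: the requirement labels (static_ip, path_routing, and so on) are invented for illustration and are not AWS API values. It suggests the ALB-behind-NLB pattern only when static IPs are the sole NLB-specific need, because placing an ALB behind the NLB would otherwise undo TCP pass-through or non-HTTP protocols.

```python
# Hypothetical decision helper: requirement labels are illustrative, not AWS API values.
NLB_FEATURES = frozenset({"static_ip", "udp", "tcp_passthrough", "custom_tcp_protocol", "ultra_low_latency"})
ALB_FEATURES = frozenset({"host_routing", "path_routing", "grpc", "waf", "cognito_auth", "lambda_target"})


def choose_load_balancer(requirements):
    """Return "ALB", "NLB", or "ALB-behind-NLB" for a set of requirement labels."""
    nlb_needs = set(requirements) & NLB_FEATURES
    alb_needs = set(requirements) & ALB_FEATURES
    if nlb_needs and alb_needs:
        if nlb_needs == {"static_ip"}:
            # The NLB supplies the static IPs; the ALB behind it does Layer 7 routing.
            return "ALB-behind-NLB"
        raise ValueError(f"conflicting needs: {sorted(nlb_needs)} vs {sorted(alb_needs)}")
    if nlb_needs:
        return "NLB"
    return "ALB"  # default for HTTP(S) web applications and microservices


print(choose_load_balancer({"path_routing", "waf"}))        # ALB
print(choose_load_balancer({"static_ip", "host_routing"}))  # ALB-behind-NLB
```

Raising on genuinely conflicting needs (for example UDP plus WAF) is a deliberate choice: those cases usually mean two separate load balancers rather than one.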

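Key difference, connection handling: ALB always terminates the client's TCP connection at the load balancer and opens a separate connection to the target. NLB in TCP mode forwards the flow to the target without terminating it; with client IP preservation enabled (the default for instance targets and for UDP, but off by default for IP targets using TCP or TLS), the target sees the client's real source IP natively, with no X-Forwarded-For header tricks required. With ALB, the target sees the load balancer's private IP as the source, so read the real client IP from the X-Forwarded-For header that ALB adds. Proxy Protocol v2 is an NLB feature, not an ALB one: enable it on an NLB target group when client IP preservation is off and the target still needs the client address.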
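Reading X-Forwarded-For safely matters because clients can send their own, spoofed value. A minimal sketch, assuming the ALB uses its default append behavior for X-Forwarded-For (it appends the address of the peer that connected to it) and that trusted_proxy_hops counts only proxies you control, ALB included; the function name is hypothetical.

```python
def client_ip_from_xff(xff_header, trusted_proxy_hops=1):
    """Pick the client IP from an X-Forwarded-For header.

    Assumes each trusted proxy (ALB included) appended the address of the peer
    that connected to it. Entries further left came from the client or from
    untrusted hops and may be spoofed.
    """
    hops = [part.strip() for part in xff_header.split(",") if part.strip()]
    if len(hops) < trusted_proxy_hops:
        raise ValueError("X-Forwarded-For has fewer entries than trusted hops")
    # Each trusted proxy appends what it saw, so count from the right.
    return hops[-trusted_proxy_hops]


# Client spoofed "1.2.3.4"; the ALB appended the real peer address on the right.
print(client_ip_from_xff("1.2.3.4, 203.0.113.7"))  # 203.0.113.7
```

If a CDN such as CloudFront sits in front of the ALB, it adds its own entry, so you would pass trusted_proxy_hops=2.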

Target Groups

A target group is the destination pool that a listener rule routes traffic to. Targets can be EC2 instances, IP addresses (for example, ECS or Fargate tasks in awsvpc mode), Lambda functions, or an ALB registered as the target of an NLB (the ALB-behind-NLB pattern). Each target group has its own health check configuration: protocol, path, port, healthy threshold, unhealthy threshold, interval, timeout, and success codes.

Target group attributes that matter in production:

Deregistration delay (default 300 s): once a target starts deregistering, ELB stops sending it new requests and waits up to this long for in-flight requests to complete before closing its remaining connections. During rolling deployments, lower this to 30–60 s to speed up draining if your requests are short-lived.
Slow-start mode (ALB only): linearly ramps a newly healthy target's share of requests up to its full share over 30–900 s, so cold JVM or Node.js instances are not overwhelmed the moment they enter rotation.
Load balancing algorithm (ALB only): round robin (default), least outstanding requests (better when request durations vary widely), or weighted random, which can be paired with Automatic Target Weights anomaly mitigation to shift traffic away from targets returning anomalous errors.
Stickiness: duration-based (load-balancer-generated cookie) or application-based (your own cookie). Avoid stickiness unless the application truly requires it: pinned clients skew load across targets and work against horizontal scaling.
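To see why least outstanding requests helps when request durations vary, here is a single-node toy model of the two main algorithms. It is an illustration under simplifying assumptions (one load balancer node, no weights, no slow start), not ALB's implementation.

```python
import itertools


class RoundRobin:
    """Hands out targets in a fixed rotation, ignoring how busy each one is."""

    def __init__(self, targets):
        self._cycle = itertools.cycle(targets)

    def pick(self):
        return next(self._cycle)


class LeastOutstandingRequests:
    """Sends each request to the target with the fewest in-flight requests."""

    def __init__(self, targets):
        self.outstanding = {target: 0 for target in targets}

    def pick(self):
        # min() returns the first minimum, so ties go to the earliest-registered target.
        target = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[target] += 1
        return target

    def complete(self, target):
        self.outstanding[target] -= 1


# Target "a" is stuck on a slow request; "b" finishes quickly.
lor = LeastOutstandingRequests(["a", "b"])
first, second = lor.pick(), lor.pick()  # "a", then "b"
lor.complete(second)                    # b's request finishes
print(lor.pick())                       # b, because a still has 1 outstanding
```

Round robin would have sent the third request back to the busy target "a"; least outstanding requests steers it to "b".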

Listeners and Rules

A listener is a port-and-protocol endpoint on the load balancer (e.g., HTTPS:443). It evaluates its rules in priority order, lowest number first. Each rule has one or more conditions (host header, path pattern, HTTP method, source IP, query string, HTTP headers), all of which must match, and an action (forward to a target group, redirect, return a fixed response, or authenticate via Cognito/OIDC before forwarding). The default rule is evaluated last and catches everything not matched by earlier rules.

A typical ALB rule setup for a microservices API gateway pattern:

aws elbv2 create-rule \
  --listener-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/prod-alb/abc123/def456 \
  --priority 10 \
  --conditions '[
    {"Field":"path-pattern","Values":["/api/v1/orders/*"]},
    {"Field":"http-header","HttpHeaderConfig":{"HttpHeaderName":"X-Service","Values":["orders"]}}
  ]' \
  --actions '[
    {"Type":"forward","TargetGroupArn":"arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/orders-svc/abc"}
  ]'
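The create-rule example above can be modeled locally to check which requests it would catch before touching a real listener. This is a simplified sketch: the Rule and route names are hypothetical, path patterns follow ALB's wildcard semantics ("*" for zero or more characters, "?" for exactly one, case-sensitive), and header values use a case-insensitive exact match even though ALB also accepts wildcards there.

```python
import re
from dataclasses import dataclass, field


def wildcard_match(pattern, value):
    """ALB-style wildcard: '*' matches zero or more characters, '?' exactly one."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


@dataclass
class Rule:
    priority: int
    target_group: str
    path_patterns: list = field(default_factory=list)  # any one pattern may match
    headers: dict = field(default_factory=dict)        # header name -> allowed values

    def matches(self, path, request_headers):
        if self.path_patterns and not any(wildcard_match(p, path) for p in self.path_patterns):
            return False
        lowered = {name.lower(): value for name, value in request_headers.items()}
        for name, allowed in self.headers.items():
            value = lowered.get(name.lower())
            # Simplification: case-insensitive exact match on header values.
            if value is None or value.lower() not in {v.lower() for v in allowed}:
                return False
        return True  # every condition matched (conditions are ANDed)


def route(rules, default_target_group, path, request_headers):
    for rule in sorted(rules, key=lambda r: r.priority):  # lowest priority number first
        if rule.matches(path, request_headers):
            return rule.target_group
    return default_target_group  # the listener's default rule


orders = Rule(10, "orders-svc", ["/api/v1/orders/*"], {"X-Service": ["orders"]})
print(route([orders], "default-tg", "/api/v1/orders/42", {"x-service": "orders"}))  # orders-svc
print(route([orders], "default-tg", "/api/v1/orders/42", {}))                        # default-tg
```

One consequence worth noticing: /api/v1/orders/* does not match /api/v1/orders itself, so a request without the trailing segment falls through to the default rule.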

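For HTTPS listeners you must attach a certificate, typically from ACM (AWS Certificate Manager). ALB supports multiple certificates on one listener via SNI: during the TLS handshake the listener selects the certificate matching the hostname the client sends in the SNI extension, falling back to the listener's default certificate, so a single ALB can serve dozens of domains.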
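A simplified model of SNI-based certificate selection follows. It assumes an exact hostname match wins over a single-label wildcard and that the default certificate is used when the client sends no SNI or nothing matches. When several certificates match, ALB applies its own selection logic (for example, considering which key types the client supports), which this sketch ignores; the function and certificate names are hypothetical.

```python
def select_certificate(server_name, certificates, default_cert):
    """Simplified SNI lookup: exact hostname, then one-label wildcard, else default.

    certificates maps a certificate identifier to the set of lowercase names it covers.
    """
    if not server_name:
        return default_cert  # client sent no SNI
    name = server_name.lower().rstrip(".")
    for cert, names in certificates.items():
        if name in names:
            return cert
    if "." in name:
        wildcard = "*." + name.split(".", 1)[1]  # *.example.com covers exactly one extra label
        for cert, names in certificates.items():
            if wildcard in names:
                return cert
    return default_cert


certs = {"cert-shop": {"shop.example.com"}, "cert-wild": {"*.example.com"}}
print(select_certificate("API.Example.com", certs, "cert-default"))  # cert-wild
```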

TLS Termination

An HTTPS listener (ALB) or TLS listener (NLB) terminates TLS at the load balancer. The certificate lives in ACM, so you never manage private keys on instances. With an HTTP target group, traffic from the ALB to targets travels unencrypted over your VPC private network. For compliance environments (PCI DSS, HIPAA) that require encryption all the way to the target, you can:

Install a certificate on the target and configure the target group protocol as HTTPS (ALB re-encrypts), or
Use an NLB with a TCP listener (e.g., TCP:443) so the TLS session passes through untouched to the target, which holds the certificate and terminates TLS itself (end-to-end encryption; the NLB never decrypts).

TLS security policy selection matters. For internet-facing ALBs, prefer ELBSecurityPolicy-TLS13-1-2-2021-06, which allows only TLS 1.3 and TLS 1.2. Avoid the older ELBSecurityPolicy-2016-08, which still permits TLS 1.0 and 1.1: PCI DSS no longer accepts early TLS, and compliance auditors routinely flag it.
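A quick way to act on this is to audit existing listeners for legacy policies. The sketch below works on plain dicts shaped like the Listeners entries of the ELBv2 DescribeListeners response (ListenerArn, Protocol, SslPolicy), so output from boto3's describe_listeners could be fed in. The policy-to-protocol table is a partial assumption covering only the three policies named in this lesson; confirm it against the current AWS security policy documentation before relying on it.

```python
# Partial policy table (assumption: confirm against the current ELB security policy docs).
POLICY_PROTOCOLS = {
    "ELBSecurityPolicy-TLS13-1-2-2021-06": {"TLSv1.2", "TLSv1.3"},
    "ELBSecurityPolicy-TLS-1-2-2017-01": {"TLSv1.2"},
    "ELBSecurityPolicy-2016-08": {"TLSv1", "TLSv1.1", "TLSv1.2"},
}
LEGACY_PROTOCOLS = {"TLSv1", "TLSv1.1"}


def audit_listeners(listeners):
    """Return (listener ARN, policy, finding) for TLS listeners that need review."""
    findings = []
    for listener in listeners:
        if listener.get("Protocol") not in ("HTTPS", "TLS"):
            continue  # plain HTTP/TCP/UDP listeners have no TLS policy
        policy = listener.get("SslPolicy", "")
        protocols = POLICY_PROTOCOLS.get(policy)
        if protocols is None:
            findings.append((listener["ListenerArn"], policy, "unknown policy; review manually"))
        elif protocols & LEGACY_PROTOCOLS:
            findings.append((listener["ListenerArn"], policy, "allows TLS 1.0/1.1"))
    return findings


print(audit_listeners([{"ListenerArn": "arn:l1", "Protocol": "HTTPS", "SslPolicy": "ELBSecurityPolicy-2016-08"}]))
```

Flagging unknown policies rather than passing them keeps the audit conservative: a custom or newer policy gets a human look instead of a silent pass.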

# Terraform: ALB with HTTPS listener, ACM cert, strict TLS policy
resource "aws_lb" "app" {
  name               = "prod-app-alb"
  internal           = false
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
  security_groups    = [aws_security_group.alb_sg.id]

  enable_deletion_protection = true
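  drop_invalid_header_fields = true          # drop headers with invalid field names
  desync_mitigation_mode     = "defensive"   # HTTP desync protection ("defensive" is the default)

  # The log bucket needs a bucket policy that allows ELB log delivery.
  access_logs {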
    bucket  = aws_s3_bucket.alb_logs.id
    prefix  = "prod-alb"
    enabled = true
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate_validation.app.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

resource "aws_lb_target_group" "app" {
  name        = "prod-app-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"   # ECS Fargate / pod IPs

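  deregistration_delay = 60   # seconds to drain in-flight requests (default 300)

  health_check {
    path                = "/healthz"
    protocol            = "HTTP"
    interval            = 15   # seconds between checks
    healthy_threshold   = 2    # 2 consecutive passes (~30 s) to return to service
    unhealthy_threshold = 3    # 3 consecutive failures (~45 s) to leave rotation
    matcher             = "200"
  }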
}

ALB listener rules route requests to target groups; TLS terminates at the ALB using an ACM certificate.

Health Checks and Failure Modes

ELB health checks are the mechanism that keeps traffic away from broken targets. A target is marked unhealthy after the unhealthy threshold count of consecutive failed checks, and healthy again only after the healthy threshold count of consecutive successes. Common production pitfalls:

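Health check path returning 200 while the app is broken: a shallow /ping that always returns 200 keeps a broken target in rotation. Use a deeper health check endpoint (/healthz) that verifies DB connectivity, cache reachability, and any other critical dependency. Keep those checks fast, and remember that a shared dependency outage fails every target at once; when all targets in a target group are unhealthy, ALB fails open and routes to all of them anyway.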
Security group blocking ELB health checks: ALB health checks originate from the ALB nodes themselves within your VPC. The target's security group must allow inbound traffic on the health-check port from the ALB security group (not from the internet).
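Deregistration delay too high: with the default 300 s, each batch of a rolling deployment can wait up to 5 minutes for draining whenever targets still hold open connections (ELB finishes early only once a target has no in-flight requests or active connections). Tune it to just above your p99 request duration.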
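The threshold rule is easy to misread as "N failures in total". The model below shows that only consecutive results count: a single success resets a failure streak. With the Terraform values used earlier (interval 15 s, unhealthy threshold 3, healthy threshold 2), a failing target leaves rotation after roughly 3 × 15 = 45 s and returns after roughly 2 × 15 = 30 s. This is a simplified sketch (class name hypothetical) that ignores check timeouts, the initial registration state, and the fact that each load balancer node runs its own checks.

```python
class TargetHealth:
    """Simplified model of ELB's consecutive-count health check thresholds."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Feed one health check result; return whether the target is now healthy."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state: reset the streak
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


target = TargetHealth(healthy_threshold=2, unhealthy_threshold=3)
print([target.record(ok) for ok in [False, False, True, False, False, False]])
# [True, True, True, True, True, False]: the single success reset the failure streak
```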

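Production tip, ALB access logs: Always enable access logs to S3 on production ALBs. They capture client IP, request line, status codes, processing latencies, target IP, and TLS cipher and protocol, which are invaluable during incidents. Log volume is high, so use S3 Intelligent-Tiering plus a lifecycle rule that transitions logs to Glacier after 30 days to manage cost. Also set drop_invalid_header_fields = true in Terraform so the ALB drops headers with invalid names; protection against HTTP desync (request-smuggling) attacks is a separate setting, desync_mitigation_mode, which defaults to defensive (strictest is the tightest option).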
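During an incident you often need to slice these logs quickly. A minimal parser sketch for the leading fields of an ALB access log entry follows; it assumes the documented space-separated layout with quoted request and user-agent fields, handles only the first 17 fields, and uses an illustrative sample line. ALB writes -1 for a timing it could not measure (for example, when the request could not be dispatched to a target), so the sketch reports no total in that case.

```python
import shlex

# Leading fields of an ALB access log entry, in order (later fields omitted here).
FIELDS = [
    "type", "time", "elb", "client_port", "target_port",
    "request_processing_time", "target_processing_time", "response_processing_time",
    "elb_status_code", "target_status_code", "received_bytes", "sent_bytes",
    "request", "user_agent", "ssl_cipher", "ssl_protocol", "target_group_arn",
]

SAMPLE = (
    'http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - - '
    'arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-36d228ad5d99923122bbe354"'
)


def parse_alb_log_line(line):
    """Map the leading fields of one log line to names and derive a few helpers."""
    entry = dict(zip(FIELDS, shlex.split(line)))  # shlex keeps quoted fields intact
    entry["client_ip"] = entry["client_port"].rsplit(":", 1)[0]
    timings = [float(entry[key]) for key in
               ("request_processing_time", "target_processing_time", "response_processing_time")]
    # -1 marks a timing the load balancer could not measure.
    entry["total_seconds"] = None if -1.0 in timings else round(sum(timings), 6)
    return entry


entry = parse_alb_log_line(SAMPLE)
print(entry["client_ip"], entry["elb_status_code"], entry["total_seconds"])
```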

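NLB + Security Groups: NLBs did not support security groups until 2023. On an older NLB without security groups, target security groups must allow health-check traffic from the NLB's private IP addresses (or the VPC CIDR) and, when client IP preservation is on, client traffic from the client CIDRs directly, because the NLB is transparent at Layer 4. Newer NLBs support security groups, which is the preferred configuration, but you must associate them when you create the NLB: an NLB created without security groups cannot have them added later.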

Cross-Zone Load Balancing

Without cross-zone load balancing, each load balancer node (one per enabled AZ) distributes requests only to targets registered in its own AZ. With cross-zone load balancing, each node distributes requests evenly across all registered targets in all enabled AZs. It is on by default for ALB (it can be turned off per target group) and off by default for NLB. Cross-zone balancing removes the need to keep an equal number of targets per AZ, which is critical when Auto Scaling groups span AZs and instance counts are uneven. NLB charges for cross-AZ data transfer when cross-zone load balancing is enabled; ALB does not.
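The effect of uneven AZ counts is easiest to see with numbers. The sketch below computes each target's share of total traffic, assuming clients spread evenly across the load balancer nodes (one per AZ) and every target is healthy with equal weight; the AZ names and counts are illustrative.

```python
def per_target_share(targets_per_az, cross_zone):
    """Fraction of all requests each target in each AZ receives.

    Assumes clients spread evenly across the load balancer nodes (one per AZ)
    and all targets are healthy with equal weight.
    """
    if any(count <= 0 for count in targets_per_az.values()):
        raise ValueError("this sketch assumes every enabled AZ has at least one target")
    if cross_zone:
        total = sum(targets_per_az.values())
        return {az: 1 / total for az in targets_per_az}
    node_share = 1 / len(targets_per_az)  # each AZ's node receives an equal slice
    return {az: node_share / count for az, count in targets_per_az.items()}


fleet = {"us-east-1a": 2, "us-east-1b": 8}
print(per_target_share(fleet, cross_zone=False))  # {'us-east-1a': 0.25, 'us-east-1b': 0.0625}
print(per_target_share(fleet, cross_zone=True))   # {'us-east-1a': 0.1, 'us-east-1b': 0.1}
```

Without cross-zone balancing, each of the two targets in us-east-1a carries four times the load of a target in us-east-1b; with it, all ten targets carry 10% each.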

Mastering ELB — especially the interplay between listener rules, target group health checks, deregistration delays, and TLS policy selection — is what separates engineers who "just get it working" from engineers who build systems that survive real production failures.