AWS Networking & Identity

Route 53 & DNS Strategies

Amazon Route 53 is AWS's authoritative DNS service and global traffic management layer. Unlike a simple DNS resolver, Route 53 lets you embed routing intelligence directly into DNS responses — shifting traffic based on latency, geographic origin, endpoint health, or arbitrary weights. At big-tech scale, Route 53 is the first line of defense for availability: it keeps traffic away from unhealthy endpoints before a single packet reaches your infrastructure. Understanding its architecture and routing policies is essential for any engineer designing a production system on AWS.

Hosted Zones: The DNS Namespace Boundary

A hosted zone is Route 53's container for DNS records for a domain. It maps directly to a DNS zone in the RFC sense. There are two types:

Public hosted zone: serves DNS responses to any resolver on the public internet. When you register a domain (or delegate an existing one to Route 53), you create a public hosted zone and Route 53 assigns it four name servers, published as NS records and served from its Anycast network for high availability. You then add A, CNAME, MX, TXT, and other records inside the zone.
Private hosted zone: visible only within the Amazon VPCs you associate it with. Used for internal service discovery: payments.internal, db-primary.prod.internal. Resolvers outside the associated VPCs never see these records and typically receive NXDOMAIN for such names. You can associate the same private hosted zone with VPCs in different AWS accounts for cross-account service discovery, once the zone owner authorizes the association.

Route 53 charges per hosted zone per month ($0.50) and per million DNS queries. At production query volumes (billions of queries per month for large platforms), DNS costs can become significant. Consolidate internal zones where practical — one prod.internal zone is cheaper and easier to manage than dozens of per-service zones.
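To make the cost trade-off concrete, here is a rough estimator. It is a minimal sketch: `route53_monthly_cost` is a hypothetical helper, the $0.50 zone fee comes from the text above, and the per-million query rate is an assumed placeholder that should be checked against the current AWS pricing page.

```shell
# Rough monthly Route 53 cost: zones * zone fee + query volume * query rate.
# Rates are passed in as arguments because they are assumptions, not fixed facts.
# usage: route53_monthly_cost ZONES QUERIES_IN_MILLIONS ZONE_FEE RATE_PER_MILLION
route53_monthly_cost() {
  awk -v z="$1" -v q="$2" -v zf="$3" -v qr="$4" \
    'BEGIN { printf "%.2f\n", z * zf + q * qr }'
}

# 2,000 million queries per month at an assumed $0.40 per million:
route53_monthly_cost 40 2000 0.50 0.40   # 40 per-service zones  -> 820.00
route53_monthly_cost 1 2000 0.50 0.40    # 1 consolidated zone   -> 800.50
```

Under these assumed rates the query charge dominates, so consolidation mainly buys simpler management, with a modest saving on zone fees.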

Alias Records: The AWS-Native A Record

Before diving into routing policies, understand Alias records — Route 53's proprietary extension to standard DNS. An Alias record points to an AWS resource (ALB, CloudFront distribution, S3 website endpoint, another Route 53 record in the same zone) and behaves like an A record at the DNS level but with two key advantages:

No CNAME at zone apex — the DNS specification forbids CNAME records at the zone apex (example.com, not www.example.com). Alias records bypass this restriction, letting you point example.com directly at an ALB or CloudFront distribution.
Free queries — DNS queries to Alias records that resolve to AWS resources are free. At high query volumes this is a meaningful saving.

Always use Alias records (not CNAME) when pointing to AWS resources. It is both cheaper and more correct at the zone apex.
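As an illustration, an apex Alias record can be expressed as a change batch like the one this sketch prints. `apex_alias_batch` is a hypothetical helper; the hosted zone ID and ALB name are the placeholder values used elsewhere in this lesson, not real resources.

```shell
# Print a change batch that points a zone apex at an ALB through an Alias record.
# args: record name, the ALB's canonical hosted zone ID, the ALB DNS name
apex_alias_batch() {
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$2",
          "DNSName": "$3",
          "EvaluateTargetHealth": true
        }
      }
    }
  ]
}
EOF
}

apex_alias_batch example.com Z35SXDOTRQ7X7K alb-stable-123.us-east-1.elb.amazonaws.com.

# To apply it (needs AWS credentials; the zone ID is a placeholder):
# aws route53 change-resource-record-sets \
#   --hosted-zone-id Z1234567890EXAMPLE \
#   --change-batch "$(apex_alias_batch example.com Z35SXDOTRQ7X7K alb-stable-123.us-east-1.elb.amazonaws.com.)"
```

Note that the record carries no TTL and no ResourceRecords: the AliasTarget block takes their place.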

Routing Policies: Intelligence in DNS

Route 53 supports eight routing policies: simple, weighted, latency-based, failover, geolocation, geoproximity, IP-based, and multivalue answer. The four you will use most in production are simple, weighted, latency-based, and failover. Each policy changes how Route 53 selects which value to return when a resolver queries a given record name and type.

Figure: The four main Route 53 routing policies and how health checks integrate with failover and weighted routing.

Weighted Routing

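Weighted routing assigns a numeric weight (0 to 255) to each record set with the same name and type. Route 53 returns each record with probability equal to its weight divided by the sum of all weights in the group. Setting a weight to 0 removes that record from rotation without deleting it, which is the canonical way to drain an endpoint before a deployment or maintenance window. The exception is a group where every weight is 0: Route 53 then returns all records with equal probability.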

Production use cases: canary deployments (send 5% of traffic to new version, watch error rates, then shift to 100%), A/B testing, and gradual region onboarding (bring up a new region at weight 1, verify, then raise to par).

# Create two weighted A records for a canary deployment
# "stable" version: weight 95
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890EXAMPLE \
  --change-batch '{
    "Changes": [
      {
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "api.example.com",
          "Type": "A",
          "SetIdentifier": "stable-v1",
          "Weight": 95,
          "AliasTarget": {
            "HostedZoneId": "Z35SXDOTRQ7X7K",
            "DNSName": "alb-stable-123.us-east-1.elb.amazonaws.com.",
            "EvaluateTargetHealth": true
          }
        }
      },
      {
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "api.example.com",
          "Type": "A",
          "SetIdentifier": "canary-v2",
          "Weight": 5,
          "AliasTarget": {
            "HostedZoneId": "Z35SXDOTRQ7X7K",
            "DNSName": "alb-canary-456.us-east-1.elb.amazonaws.com.",
            "EvaluateTargetHealth": true
          }
        }
      }
    ]
  }'
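The share of responses each record receives follows directly from the weights. The sketch below computes it; `weight_share` is a hypothetical helper, and it reports "n/a" when every weight is 0 rather than modeling Route 53's equal-probability behavior for that case.

```shell
# Expected share of DNS responses for one record: its weight / sum of all weights.
# usage: weight_share THIS_WEIGHT ALL_WEIGHTS_IN_THE_GROUP...
weight_share() {
  this="$1"; shift
  total=0
  for w in "$@"; do total=$((total + w)); done
  awk -v w="$this" -v t="$total" \
    'BEGIN { if (t > 0) printf "%.1f%%\n", 100 * w / t; else print "n/a" }'
}

weight_share 5 95 5     # canary from the example above    -> 5.0%
weight_share 1 255 1    # new region at weight 1 next to 255 -> 0.4%
```

Because resolvers cache answers for the record's TTL, the share of actual user traffic only approximates these figures.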

Latency-Based Routing

Latency-based routing directs each query to the AWS region with the lowest measured network latency from the resolver's perspective. AWS maintains a latency table between its edge network and regions; Route 53 consults this table at query time. You create one record per region with the same name and type, each pointing to that region's endpoint.

This is the default choice for multi-region active-active architectures. A user in Tokyo gets routed to ap-northeast-1, a user in London to eu-west-1, without any manual geo-mapping. Latency measurements update continuously as AWS network conditions change.

Set EvaluateTargetHealth: true on latency Alias records pointing to ALBs. If the lowest-latency region becomes unhealthy (all ALB targets unhealthy), Route 53 automatically falls back to the next-lowest-latency healthy region. This gives you automatic multi-region failover at zero extra engineering cost.
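A latency-routed pair might look like the change batch this sketch prints. The helper names, the ALB DNS names, and the ZALBZONE... hosted zone IDs are all placeholders; look up the real canonical hosted zone ID of each load balancer before using anything like this.

```shell
# Print one latency-routed Alias change for a regional ALB.
# args: record name, AWS region, ALB canonical hosted zone ID, ALB DNS name
latency_change() {
  cat <<EOF
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "A",
        "SetIdentifier": "$2",
        "Region": "$2",
        "AliasTarget": {
          "HostedZoneId": "$3",
          "DNSName": "$4",
          "EvaluateTargetHealth": true
        }
      }
    }
EOF
}

# One record per region, same name and type; "Region" selects latency routing.
latency_batch() {
  printf '{\n  "Changes": [\n'
  latency_change api.example.com ap-northeast-1 ZALBZONETOKYO alb-tokyo.ap-northeast-1.elb.amazonaws.com.
  printf '    ,\n'
  latency_change api.example.com eu-west-1 ZALBZONEIRELAND alb-dublin.eu-west-1.elb.amazonaws.com.
  printf '  ]\n}\n'
}

latency_batch
```

The output can be passed to `aws route53 change-resource-record-sets --change-batch` in the same way as the weighted example earlier in this lesson.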

Failover Routing

Failover routing implements an explicit primary/secondary topology. You designate one record as PRIMARY and one as SECONDARY. Route 53 returns the PRIMARY record as long as its associated health check passes. When the health check fails, Route 53 automatically returns the SECONDARY record. Clients pick up the change as their cached answers expire, so the record's TTL bounds how long resolvers keep sending traffic to the failed primary (seconds to minutes for typical values).

This maps directly to an active-passive disaster recovery design. The secondary can be an S3 static site (a "sorry, maintenance" page), a read replica promoted to writer, or a full hot-standby region. Combine with Route 53 Application Recovery Controller for automated multi-region failover orchestration at enterprise scale.
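A primary/secondary pair could be sketched as follows. `failover_change` is a hypothetical helper, the IP addresses come from documentation ranges, and the health check ID is a placeholder. The PRIMARY record references a health check; the SECONDARY here does not, so it is always considered healthy.

```shell
# Print one failover change for a plain (non-Alias) A record.
# args: record name, role (PRIMARY or SECONDARY), IPv4 address, optional health check ID
failover_change() {
  hc=""
  if [ -n "${4:-}" ]; then
    hc="\"HealthCheckId\": \"$4\","
  fi
  cat <<EOF
{
  "Action": "UPSERT",
  "ResourceRecordSet": {
    "Name": "$1",
    "Type": "A",
    "SetIdentifier": "$1-$2",
    "Failover": "$2",
    $hc
    "TTL": 60,
    "ResourceRecords": [{ "Value": "$3" }]
  }
}
EOF
}

failover_change api.example.com PRIMARY 192.0.2.10 a1b2c3d4-e5f6-7890-abcd-ef1234567890
failover_change api.example.com SECONDARY 198.51.100.10
```

Each printed object is one entry for the "Changes" array of a change batch.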

Health Checks: The Sentinel Layer

Health checks are Route 53's mechanism for removing unhealthy endpoints from DNS responses. They run independently of the DNS service — dedicated Route 53 health checkers worldwide poll your endpoints on a configurable interval (10 or 30 seconds) and threshold. An endpoint is declared unhealthy when the configured number of consecutive checks fail.
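The interval and threshold together set how quickly a failure is detected, and the record's TTL adds to the time before clients move. A back-of-the-envelope sketch, with `failover_window` as a hypothetical helper that ignores checker aggregation delays and resolvers that disregard TTLs:

```shell
# Rough worst-case seconds until clients stop using a failed endpoint:
# detection time (interval * consecutive failures) plus the cached record TTL.
# usage: failover_window REQUEST_INTERVAL FAILURE_THRESHOLD RECORD_TTL
failover_window() {
  echo $(( $1 * $2 + $3 ))
}

failover_window 10 3 60     # fast checks, 60 s TTL      -> 90
failover_window 30 3 300    # standard checks, 300 s TTL -> 390
```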

Three types:

Endpoint health checks: Route 53 sends HTTP, HTTPS, or TCP probes to a specified IP or domain. For HTTP/HTTPS, the endpoint must answer with a 2xx or 3xx status code, and you can additionally require a string match in the response body (first 5,120 bytes).
Calculated health checks: a logical combination of other health checks (AND, OR, or at least N of M healthy). Useful to declare an endpoint healthy only when multiple dependencies are healthy simultaneously (database reachable AND cache warm AND circuit breaker closed).
CloudWatch alarm health checks: the health check mirrors the state of a CloudWatch alarm. This lets you express complex health logic (p99 latency above threshold, error rate above 1%, queue depth above limit) without building a dedicated probe endpoint.
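A calculated health check is configured with a list of child checks and the number that must be healthy. The sketch below prints such a configuration; `calculated_check_config` and the child IDs are hypothetical. Setting the threshold to the number of children gives AND, and setting it to 1 gives OR.

```shell
# Print a health check config of type CALCULATED.
# args: minimum number of healthy children, then the child health check IDs
calculated_check_config() {
  threshold="$1"; shift
  children=""
  for id in "$@"; do
    children="${children:+$children, }\"$id\""
  done
  cat <<EOF
{
  "Type": "CALCULATED",
  "HealthThreshold": $threshold,
  "ChildHealthChecks": [$children]
}
EOF
}

# AND across three dependencies: all three children must be healthy.
calculated_check_config 3 hc-db-placeholder hc-cache-placeholder hc-breaker-placeholder

# To create it (needs AWS credentials and real child IDs):
# aws route53 create-health-check --caller-reference "calc-$(date +%s)" \
#   --health-check-config "$(calculated_check_config 3 ID1 ID2 ID3)"
```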

# Create an HTTPS health check with string matching
# (SearchString requires the HTTPS_STR_MATCH type, not plain HTTPS)
aws route53 create-health-check \
  --caller-reference "api-hc-$(date +%s)" \
  --health-check-config '{
    "Type": "HTTPS_STR_MATCH",
    "FullyQualifiedDomainName": "api.example.com",
    "Port": 443,
    "ResourcePath": "/health",
    "RequestInterval": 10,
    "FailureThreshold": 3,
    "SearchString": "\"status\":\"ok\"",
    "EnableSNI": true,
    "Regions": ["us-east-1", "eu-west-1", "ap-northeast-1"]
  }'

# Associate the health check ID with a DNS record (add HealthCheckId field)
# Then check current health status
aws route53 get-health-check-status \
  --health-check-id a1b2c3d4-e5f6-7890-abcd-ef1234567890

A common production misconfiguration: attaching a health check to an internal-only endpoint that Route 53 health checkers cannot reach. Route 53 health checkers originate from public IP ranges — they cannot reach private VPC endpoints or resources behind a security group that blocks external traffic. For private endpoints, use a CloudWatch alarm health check instead: your application emits a metric (or a synthetic Lambda canary runs inside the VPC), and the alarm drives the health check state.
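For the private-endpoint case, the health check configuration references an alarm instead of an address. A minimal sketch, assuming a CloudWatch alarm named api-5xx-rate already exists in us-east-1 (both the alarm name and the helper `alarm_check_config` are hypothetical):

```shell
# Print a health check config that follows a CloudWatch alarm.
# args: alarm region, alarm name
alarm_check_config() {
  cat <<EOF
{
  "Type": "CLOUDWATCH_METRIC",
  "AlarmIdentifier": { "Region": "$1", "Name": "$2" },
  "InsufficientDataHealthStatus": "Unhealthy"
}
EOF
}

alarm_check_config us-east-1 api-5xx-rate

# To create it (needs AWS credentials):
# aws route53 create-health-check --caller-reference "alarm-hc-$(date +%s)" \
#   --health-check-config "$(alarm_check_config us-east-1 api-5xx-rate)"
```

Treating missing data as Unhealthy is a design choice: it fails safe if the metric stops arriving, at the cost of a possible false failover when the emitter itself breaks. LastKnownStatus and Healthy are the other accepted values.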

Routing Policy Combinations and Traffic Policies

Real production DNS architectures chain multiple routing policies, either with Route 53 traffic policies (built in the Traffic Flow visual editor) or by nesting record sets through Alias records. A common three-tier pattern:

Geolocation outer layer: EU users resolve to an EU-scoped record; US users to a US-scoped record. This supports data residency requirements such as GDPR, although location is inferred from the resolver or client subnet, so it is not a hard guarantee.
Latency middle layer: within each geo, route to the lowest-latency available region.
Failover inner layer: within each region, primary is the live ALB; secondary is a degraded-mode endpoint or a static S3 page.

This nested architecture handles the full range of failure scenarios: regional outage, cross-region degradation, and total DR, all through DNS — no changes to application code or load balancer config required during incidents.
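When nesting by hand, the outer layer is a set of geolocation records that alias to the next layer's record names in the same hosted zone. The sketch below prints two such changes; `geo_alias_change`, the intermediate names eu.api.example.com and us.api.example.com, and the zone ID are all assumptions for illustration.

```shell
# Print one geolocation change that aliases to another record in the same zone.
# args: record name, set identifier, GeoLocation body, this zone's ID, alias target name
geo_alias_change() {
  cat <<EOF
{
  "Action": "UPSERT",
  "ResourceRecordSet": {
    "Name": "$1",
    "Type": "A",
    "SetIdentifier": "$2",
    "GeoLocation": { $3 },
    "AliasTarget": {
      "HostedZoneId": "$4",
      "DNSName": "$5",
      "EvaluateTargetHealth": true
    }
  }
}
EOF
}

# European queries go to the EU latency layer; CountryCode "*" is the default record
# that catches every location without a more specific match.
geo_alias_change api.example.com geo-eu '"ContinentCode": "EU"' Z1234567890EXAMPLE eu.api.example.com.
geo_alias_change api.example.com geo-default '"CountryCode": "*"' Z1234567890EXAMPLE us.api.example.com.
```

The names eu.api.example.com and us.api.example.com would then each hold latency records, which in turn alias to per-region failover pairs.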

DNS TTL values are a critical operational lever. Route 53 lets you set TTL as low as 0 for non-Alias records; Alias records have no TTL of their own and inherit the target's (60 seconds for load balancer targets). Before a planned maintenance or deployment, lower the TTL to 60 seconds, wait at least as long as the previous TTL (for example 300 seconds) so resolvers holding the old value expire it, then perform the cutover. Afterward raise the TTL back to 300 or higher to reduce resolver load and query costs. Keeping TTL permanently at 60 seconds is operationally convenient but wastes money at scale.
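The lower-wait-cutover-restore sequence can be sketched as four steps. `ttl_batch` is a hypothetical helper, the addresses are documentation IPs, and the 300-second starting TTL is an assumption. The wait in step 2 is the previously published TTL, since that is how long resolvers may still hold the old answer.

```shell
# Print a change batch for a plain A record with the given TTL.
# args: record name, IPv4 address, TTL in seconds
ttl_batch() {
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "A",
        "TTL": $3,
        "ResourceRecords": [{ "Value": "$2" }]
      }
    }
  ]
}
EOF
}

OLD_TTL=300   # TTL currently published (assumed)
LOW_TTL=60    # temporary TTL for the maintenance window

ttl_batch api.example.com 192.0.2.10 "$LOW_TTL"      # step 1: same target, lower TTL
echo "wait ${OLD_TTL}s before cutting over"           # step 2: let old cached answers expire
ttl_batch api.example.com 198.51.100.10 "$LOW_TTL"   # step 3: cut over to the new target
ttl_batch api.example.com 198.51.100.10 "$OLD_TTL"   # step 4: restore the higher TTL
```

Each printed batch would be submitted with `aws route53 change-resource-record-sets --change-batch`.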

Private DNS and VPC Resolver

For internal microservices, use a private hosted zone associated with your VPC. The VPC's built-in resolver (available at the base of your VPC CIDR plus two, e.g. 10.0.0.2) handles both internal private zone lookups and forwards everything else to the public DNS hierarchy. Route 53 Resolver Endpoints let you extend this: an inbound endpoint accepts DNS queries from your on-premises network over Direct Connect or VPN (so on-prem resolvers can resolve AWS private zones), and an outbound endpoint with resolver rules forwards queries for specific on-prem domains back to on-prem DNS servers.
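The "base plus two" rule for the VPC resolver address is simple arithmetic on the CIDR's network address. A small sketch, where `vpc_resolver_ip` is a hypothetical helper that assumes its argument is the VPC's primary IPv4 CIDR written with the network base address:

```shell
# Compute the VPC resolver address: the CIDR's base address plus two.
# usage: vpc_resolver_ip CIDR      (for example 10.0.0.0/16)
vpc_resolver_ip() {
  base="${1%/*}"                    # drop the /prefix length
  oldifs="$IFS"; IFS=.
  set -- $base                      # split the dotted quad into $1..$4
  IFS="$oldifs"
  n=$(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 + 2 ))
  echo "$(( n >> 24 & 255 )).$(( n >> 16 & 255 )).$(( n >> 8 & 255 )).$(( n & 255 ))"
}

vpc_resolver_ip 10.0.0.0/16       # -> 10.0.0.2
vpc_resolver_ip 172.31.0.0/16     # -> 172.31.0.2
```

This is the address an on-premises forwarder would not use directly; on-prem queries go to the inbound endpoint's IPs instead, while workloads inside the VPC use the base-plus-two address.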