Networking Essentials for DevOps

NAT, Proxies & Gateways

Most production cloud environments hide their internal address space from the public internet. Servers in a private subnet have no publicly routable IPs, yet they can reach npm registries, pull container images from Docker Hub, and accept HTTPS traffic from millions of users. Three mechanisms make that possible: Network Address Translation (NAT), proxies, and gateways. This lesson dissects all three at the packet level and connects them to patterns you will configure daily as a DevOps engineer.

Network Address Translation (NAT)

NAT rewrites IP addresses (and TCP/UDP port numbers) in packet headers as they pass through a router or firewall. There are two production-critical variants.

SNAT — Source NAT

Source NAT replaces the source IP in outbound packets. Your private instance at 10.0.1.42 wants to reach 54.230.10.1. Before the packet leaves your VPC, the NAT device rewrites the source to its own public IP (say 52.1.2.3) and records the mapping in a connection tracking table. When the reply arrives at 52.1.2.3, the device looks up the table, rewrites the destination back to 10.0.1.42, and forwards the packet inward. The external server sees only the public IP — it never learns the private address exists.
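
The translate-and-reverse cycle described above can be modelled in a few lines of Python. This is a toy sketch, not how the kernel implements it: the `SnatDevice` class, the starting port 1024, and the sequential port allocation are assumptions made for illustration (real NAT devices usually try to preserve the original source port).

```python
from dataclasses import dataclass, field


@dataclass
class SnatDevice:
    """Toy model of a source-NAT device with a connection tracking table."""
    public_ip: str
    next_port: int = 1024  # hypothetical start of the translated port range
    # (public_port, remote_ip, remote_port) -> (private_ip, private_port)
    table: dict = field(default_factory=dict)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of an outbound packet and record the mapping."""
        public_port = self.next_port
        self.next_port += 1
        self.table[(public_port, dst_ip, dst_port)] = (src_ip, src_port)
        return (self.public_ip, public_port, dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Reverse the translation for a reply, or return None if untracked."""
        original = self.table.get((dst_port, src_ip, src_port))
        if original is None or dst_ip != self.public_ip:
            return None  # no tracked flow: a real device would drop the packet
        private_ip, private_port = original
        return (src_ip, src_port, private_ip, private_port)


nat = SnatDevice(public_ip="52.1.2.3")
sent = nat.outbound("10.0.1.42", 51000, "54.230.10.1", 443)
reply = nat.inbound("54.230.10.1", 443, *sent[:2])
unsolicited = nat.inbound("203.0.113.5", 443, "52.1.2.3", 1024)

print(sent)         # ('52.1.2.3', 1024, '54.230.10.1', 443)
print(reply)        # ('54.230.10.1', 443, '10.0.1.42', 51000)
print(unsolicited)  # None
```

The last call shows why SNAT alone does not expose private hosts: a packet that matches no tracked flow has nowhere to go.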

In AWS, the NAT Gateway performs SNAT for private subnets. In Linux, iptables does the same with a single rule:

# SNAT all traffic leaving eth0 to the interface's own public IP
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Or pin to a specific IP (more predictable; prefer this in prod)
iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to-source 52.1.2.3

# Inspect the connection-tracking table
conntrack -L -p tcp --dport 443

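MASQUERADE is convenient for dynamic IPs (home routers, spot instances) because it reads the current interface address automatically. The trade-off is that it forgets tracked connections whenever the interface goes down. For production NAT hosts with a fixed address, prefer SNAT --to-source: the translated address is stated explicitly in the rule, so it does not depend on whatever address the interface holds after a reboot.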

DNAT — Destination NAT

Destination NAT replaces the destination IP. A packet arrives at your public IP on port 443, and the firewall rewrites the destination to an internal server at 10.0.2.80:8443 before forwarding. This is how bare-metal load balancers and port-forwarding rules work, and cloud Network Load Balancers apply the same idea at the packet level.

# DNAT: forward public port 443 to internal server
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 \
  -j DNAT --to-destination 10.0.2.80:8443

# Allow the forwarded traffic through the FORWARD chain
iptables -A FORWARD -p tcp -d 10.0.2.80 --dport 8443 -j ACCEPT

# Enable IP forwarding (must be on)
sysctl -w net.ipv4.ip_forward=1

Stateful tracking is the key. NAT only works because the kernel maintains a connection tracking (conntrack) table that ties outbound flows to their inbound replies. Each entry holds: protocol, source IP:port, translated IP:port, destination IP:port, and state (SYN_SENT, ESTABLISHED, TIME_WAIT…). Entries expire automatically, but high-traffic systems can exhaust the conntrack table, at which point the kernel drops packets for new connections. Compare /proc/sys/net/netfilter/nf_conntrack_count against nf_conntrack_max before a production incident finds it for you.

[Figure: SNAT and DNAT egress and ingress flows. Instance A (10.0.1.42) and Instance B (10.0.2.80) sit in the private VPC / LAN behind a NAT / firewall with public IP 52.1.2.3 and a conntrack table. On egress to the external server (54.230.10.1), src 10.0.1.42 becomes src 52.1.2.3 (SNAT). On ingress from the client (203.0.113.5), dst 52.1.2.3:443 becomes dst 10.0.2.80 (DNAT).]
SNAT rewrites source addresses on egress; DNAT rewrites destination addresses on ingress. Both use the same conntrack table to reverse the translation on replies.
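
As a rough illustration of the conntrack headroom check, the sketch below reads the two kernel counters and flags high utilisation. The 80% threshold and the function names are assumptions chosen for the example; the counter files only exist when the nf_conntrack module is loaded.

```python
from pathlib import Path

# Present only when the nf_conntrack kernel module is loaded.
PROC_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(proc_dir=PROC_DIR):
    """Return (count, maximum, fraction_used) from the conntrack counters."""
    count = int((Path(proc_dir) / "nf_conntrack_count").read_text().strip())
    maximum = int((Path(proc_dir) / "nf_conntrack_max").read_text().strip())
    return count, maximum, count / maximum


def is_near_limit(count, maximum, threshold=0.8):
    """True when usage meets or exceeds the (hypothetical) alert threshold."""
    return count / maximum >= threshold


if __name__ == "__main__":
    if (PROC_DIR / "nf_conntrack_count").exists():
        count, maximum, used = conntrack_usage()
        print(f"conntrack: {count}/{maximum} ({used:.1%})")
        if is_near_limit(count, maximum):
            print("warning: conntrack table above 80% of nf_conntrack_max")
    else:
        print("nf_conntrack counters not found (module not loaded?)")
```

Run from cron or a metrics agent, a check like this gives early warning before the table fills.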

Forward Proxies

A forward proxy sits between internal clients and the internet. Clients are configured to route all outbound requests through it. The proxy makes the actual connection on their behalf, returning the response. From the origin server's perspective, the request came from the proxy.

Forward proxies appear in large organisations for three reasons: egress control (only allow specific destinations), TLS inspection (decrypt, inspect for DLP, re-encrypt), and caching (reduce bandwidth costs for package registries). In cloud environments, they replace or sit alongside NAT Gateways when you need deep packet inspection or fine-grained URL-level policies.

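# Squid forward proxy: /etc/squid/squid.conf (key directives)
http_port 3128
acl localnet src 10.0.0.0/8
acl SSL_ports port 443
acl CONNECT method CONNECT

# Allowlist of destination domains (a leading dot also matches subdomains).
# The npm registry itself is served from registry.npmjs.org.
acl allowed_domains dstdomain .amazonaws.com .docker.io .npmjs.com .npmjs.org

# Refuse CONNECT tunnels to anything other than port 443
http_access deny CONNECT !SSL_ports

# Allow internal clients, but only to allowlisted domains.
# Rules are evaluated top to bottom and the first match wins, so a bare
# "http_access allow localnet" here would bypass the allowlist entirely.
http_access allow localnet allowed_domains
http_access deny all

# On client hosts: export env vars (picked up by curl, apt, pip, etc.)
export http_proxy=http://10.0.3.10:3128
export https_proxy=http://10.0.3.10:3128
# Note: not every tool honours CIDR entries in no_proxy
export no_proxy=169.254.169.254,10.0.0.0/8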

Production egress pattern: in regulated environments (PCI-DSS, HIPAA, SOC 2), egress commonly runs through a forward proxy with a deny-all, allowlist-exceptions policy. A compromised workload then cannot exfiltrate data to attacker-controlled infrastructure; it can only reach pre-approved destinations. The no_proxy variable must always list the instance metadata IP (169.254.169.254) so that metadata requests bypass the proxy. Otherwise your EC2/GCE instances will fail to fetch their instance credentials.
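
To make the allowlist and bypass decisions concrete, here is a small Python sketch of the two checks. It is a simplified model under stated assumptions: `domain_allowed` imitates the leading-dot suffix matching of Squid's dstdomain ACL, `bypasses_proxy` imitates how a CIDR-aware client might read no_proxy, and the domain list is an example modelled on the lesson's ACL. Neither function reproduces the exact behaviour of Squid or of any particular HTTP client.

```python
import ipaddress

# Example suffixes modelled on the lesson's Squid ACL (hypothetical list).
ALLOWED_DOMAINS = (".amazonaws.com", ".docker.io", ".npmjs.org")
NO_PROXY = ("169.254.169.254", "10.0.0.0/8")


def domain_allowed(host, allowed=ALLOWED_DOMAINS):
    """True if host equals an allowed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    for pattern in allowed:
        bare = pattern.lstrip(".")
        if host == bare:
            return True
        if pattern.startswith(".") and host.endswith("." + bare):
            return True
    return False


def bypasses_proxy(host, no_proxy=NO_PROXY):
    """True if a request to host should skip the proxy (CIDR-aware model)."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # host is a name, not an IP literal
    for entry in no_proxy:
        if addr is not None:
            try:
                if addr in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry is a hostname, not an IP or CIDR
        elif host == entry or host.endswith("." + entry.lstrip(".")):
            return True
    return False


print(domain_allowed("s3.eu-west-1.amazonaws.com"))  # True
print(domain_allowed("evil-amazonaws.com"))          # False
print(bypasses_proxy("169.254.169.254"))             # True
print(bypasses_proxy("54.230.10.1"))                 # False
```

The second call is the case that matters for security: suffix matching must respect the dot boundary, or a lookalike domain slips through the allowlist.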

Reverse Proxies

A reverse proxy sits in front of your backend servers. Clients talk to the proxy, believing it is the origin. The proxy forwards requests to one or more backends, collects the response, and sends it back. Clients never learn the backend addresses.

Nginx is the canonical reverse proxy in DevOps. A minimal HTTPS reverse proxy config:

# /etc/nginx/sites-available/api.example.com
upstream backend {
    server 10.0.2.10:8080;
    server 10.0.2.11:8080;
    keepalive 32;   # Reuse TCP connections to backends
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate     /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;

    location / {
        proxy_pass http://backend;

        # Upstream keepalive only takes effect with HTTP/1.1 and an
        # empty Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 60s;
    }
}
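
With no balancing directive in the upstream block, Nginx distributes requests across the listed servers round-robin. The sketch below shows that rotation in Python. It is illustrative only: the helper name is hypothetical, and it ignores server weights, health state, and connection reuse.

```python
from itertools import cycle

# The two upstream servers from the config above.
BACKENDS = ["10.0.2.10:8080", "10.0.2.11:8080"]


def round_robin(backends):
    """Return an iterator that yields backends in rotation, one per request."""
    return cycle(backends)


picker = round_robin(BACKENDS)
assigned = [next(picker) for _ in range(5)]
print(assigned)
# ['10.0.2.10:8080', '10.0.2.11:8080', '10.0.2.10:8080',
#  '10.0.2.11:8080', '10.0.2.10:8080']
```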

The X-Forwarded-For header is critical: it carries the original client IP through the proxy chain so your application logs show real IPs, not the proxy's address. Without it, every request appears to originate from the proxy.
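
One caveat when consuming the header: clients can send their own X-Forwarded-For value, so an application should only trust entries appended by proxies it controls. A minimal sketch of that idea follows, assuming a hypothetical trusted range of 10.0.0.0/8 for the proxy tier. It walks the chain from right to left and returns the first hop that is not a trusted proxy.

```python
import ipaddress

# Hypothetical: the address range of your own proxy tier.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(ip, trusted=TRUSTED_PROXIES):
    """True if ip parses as an address inside one of the trusted networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in trusted)


def client_ip(remote_addr, x_forwarded_for, trusted=TRUSTED_PROXIES):
    """Return the right-most hop in the chain that is not a trusted proxy."""
    if not is_trusted(remote_addr, trusted):
        return remote_addr  # direct connection: ignore the header entirely
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop, trusted):
            return hop
    return hops[0] if hops else remote_addr


# The left-most entry was supplied by the client and is ignored.
print(client_ip("10.0.3.5", "1.2.3.4, 203.0.113.5, 10.0.3.9"))  # 203.0.113.5
# A peer outside the proxy tier cannot influence the result via the header.
print(client_ip("198.51.100.7", "1.2.3.4"))                     # 198.51.100.7
```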

Gateways: API Gateway & NAT Gateway

The word "gateway" is overloaded in cloud networking. Three types matter most:

  • Internet Gateway (IGW): performs 1:1 NAT between an instance's private IP and its public or Elastic IP, which makes instances in public subnets directly reachable. It imposes no bandwidth limit and does no port translation.
  • NAT Gateway: managed SNAT for private subnets. The private subnet's route table points its default route (0.0.0.0/0) at the NAT Gateway, which SNATs outbound traffic to the gateway's EIP and reverses the translation on replies. Fully managed; scales automatically up to 100 Gbps.
  • API Gateway: operates at Layer 7. It terminates HTTP/HTTPS, enforces authentication, rate-limits requests, and routes to downstream services (Lambda, EC2, containers). Functionally a reverse proxy plus auth layer plus request transformation.
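
The default-route behaviour of a private subnet comes down to longest-prefix matching: the most specific route that contains the destination wins. The sketch below assumes a hypothetical route table with a 10.0.0.0/16 local route and placeholder target names; it does not represent any cloud provider's API.

```python
import ipaddress

# Hypothetical route table for a private subnet; targets are placeholder names.
ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "nat-gateway"),
]


def next_hop(destination, routes=ROUTES):
    """Pick the most specific (longest-prefix) route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet would be dropped
    _, target = max(matches, key=lambda m: m[0].prefixlen)
    return target


print(next_hop("10.0.2.80"))    # local
print(next_hop("54.230.10.1"))  # nat-gateway
```

Traffic between instances stays on the local route, while everything else falls through to 0.0.0.0/0 and reaches the NAT Gateway.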

Egress Patterns in Production

Large platforms combine all these pieces into tiered egress architectures. Private workloads never have public IPs. All outbound traffic flows through a centralised egress VPC (sometimes called a "transit VPC") that runs a managed NAT Gateway plus a forward proxy with IDS/IPS inspection. A single EIP per availability zone serves as the stable source address for all production traffic, and third-party vendors allowlist those IPs in their firewalls.

NAT Gateway connection exhaustion is a real production failure mode. A NAT Gateway supports up to 55,000 simultaneous connections to each unique destination (destination IP, destination port, and protocol) per associated IP address. Lambda functions that fan out thousands of simultaneous outbound connections to one endpoint can hit this limit, causing new connection attempts to fail. Monitor the AWS/NATGateway CloudWatch metrics: ErrorPortAllocation counts failed source-port allocations, and a widening gap between ConnectionAttemptCount and ConnectionEstablishedCount indicates connections that never completed. Solution: spread Lambdas across multiple NAT Gateways in different AZs, or switch to a VPC Endpoint for AWS service traffic (no NAT needed at all).
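
The comparison of ConnectionAttemptCount and ConnectionEstablishedCount is simple arithmetic once the two sums are in hand. The sketch below uses made-up one-minute sums rather than real CloudWatch data, and the function name is hypothetical. Treat the ratio as an indicator only, since attempts can also fail for reasons unrelated to the NAT Gateway, such as an unresponsive remote host.

```python
def connection_gap(attempts, established):
    """Return (failed, failure_ratio) for one metric period."""
    if attempts < 0 or established < 0:
        raise ValueError("counts must be non-negative")
    failed = max(attempts - established, 0)
    ratio = failed / attempts if attempts else 0.0
    return failed, ratio


# Hypothetical one-minute sums: (ConnectionAttemptCount, ConnectionEstablishedCount)
samples = [(12000, 12000), (15000, 14250), (0, 0)]
for attempts, established in samples:
    failed, ratio = connection_gap(attempts, established)
    print(f"attempts={attempts} established={established} "
          f"failed={failed} ratio={ratio:.1%}")
```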

Understanding where each mechanism operates — SNAT at the packet level, a forward proxy at the HTTP application level, a reverse proxy terminating TLS, an API Gateway enforcing auth — lets you choose the right tool and debug failures precisely. When a backend stops receiving traffic, the first question is: which layer broke? NAT conntrack full, proxy upstream unhealthy, gateway route missing, or certificate expired?