Cloud & Kubernetes Security Hardening

Network Security in the Cloud

18 min Lesson 4 of 28


In traditional data centers, the network perimeter was a physical firewall at the building edge. In the cloud, that model collapses. There is no single edge, and many defaults are permissive: every pod in a Kubernetes cluster can reach every other pod, and AWS adds an allow-all outbound rule to every new Security Group, unless you explicitly restrict them. Network security in the cloud is therefore a deliberate, policy-driven act. You must segment, constrain, and monitor every traffic path, or an attacker who compromises one pod or function can pivot freely across your entire estate.

This lesson covers the three pillars of cloud network security: segmentation (who may talk to whom), private endpoints (keeping data paths off the public internet), and egress control (preventing outbound exfiltration or C2 callbacks). We close with a production zero-egress architecture diagram.

Segmentation: Micro-Perimeters at Every Layer

Segmentation means dividing your network into zones where each zone has the minimum connectivity it needs. In AWS, the building blocks are VPCs, subnets, Security Groups (SGs), and Network ACLs (NACLs). In Kubernetes, the equivalent is Network Policies enforced by a CNI plugin (Cilium, Calico, or AWS VPC CNI with Kubernetes Network Policy support). A NetworkPolicy is only enforced if the cluster's CNI supports it. Without such a CNI, the API server still accepts policies, but nothing enforces them.

The segmentation hierarchy in a mature AWS environment looks like this:

Account-level isolation — production, staging, and tooling live in separate AWS accounts under an AWS Organization. Cross-account traffic routes through Transit Gateway or PrivateLink, never the public internet.
VPC segmentation — each environment gets its own VPC. Peering is explicit and route-table-scoped; full-mesh peering is a sign of missing design.
Subnet segmentation — public (load balancers), private application (compute), and isolated data (RDS, ElastiCache, OpenSearch) subnets. Data subnets have no route to the internet, period.
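Security Group rules: the micro-firewall on each network interface (ENI). They are stateful. Inbound traffic is denied by default, but AWS adds an allow-all outbound rule to every new group. Reference SG IDs rather than CIDRs for internal traffic so rules don't break when IPs rotate.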
Kubernetes Network Policies — pod-level firewall inside the cluster.
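The rule that data subnets have no route to the internet is easy to check mechanically. Below is a minimal sketch. It assumes you have exported `aws ec2 describe-route-tables` JSON for a single VPC. The subnet, NAT, and gateway IDs are hypothetical, and only default routes (0.0.0.0/0 and ::/0) are checked.

```python
"""Flag isolated data subnets whose route table leads to the internet.

Input follows the JSON shape of `aws ec2 describe-route-tables` for ONE VPC.
All IDs below are hypothetical examples.
"""


def has_internet_route(route: dict) -> bool:
    # Only default routes are checked; a narrower public CIDR routed to an
    # internet gateway would also be a leak but is out of scope here.
    is_default = (route.get("DestinationCidrBlock") == "0.0.0.0/0"
                  or route.get("DestinationIpv6CidrBlock") == "::/0")
    to_internet = (route.get("GatewayId", "").startswith("igw-")
                   or "NatGatewayId" in route
                   or "EgressOnlyInternetGatewayId" in route)
    return is_default and to_internet


def leaky_data_subnets(route_tables: list, data_subnet_ids: set) -> set:
    explicit = {}       # subnet ID -> explicitly associated route table
    main_table = None   # the VPC's main route table
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            elif "SubnetId" in assoc:
                explicit[assoc["SubnetId"]] = table
    leaky = set()
    for subnet_id in data_subnet_ids:
        # Subnets with no explicit association fall back to the main table.
        table = explicit.get(subnet_id, main_table)
        if table and any(has_internet_route(r) for r in table.get("Routes", [])):
            leaky.add(subnet_id)
    return leaky


route_tables = [
    {"Associations": [{"Main": True}],
     "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}]},
    {"Associations": [{"Main": False, "SubnetId": "subnet-data-b"}],
     "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
                {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0abc"}]},
]

print(leaky_data_subnets(route_tables, {"subnet-data-a", "subnet-data-b"}))
# {'subnet-data-b'}
```

The main-table fallback matters in practice. A subnet that was never explicitly associated with a route table silently inherits the main table's routes, so a NAT route added to the main table leaks into every unassociated subnet.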

Here is a production-grade Kubernetes Network Policy for a payments pod. It admits only the ingress the pod needs: traffic from the api-gateway pods on port 8443, and Prometheus scraping the pod's metrics port. It also restricts egress to the database on port 5432 and to kube-dns:

# network-policy-payments.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-service-allow    # covers both ingress and egress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: api-gateway
          podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8443
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 9090                # payments-service metrics port (assumed); set to the port Prometheus scrapes
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: payments-db
      ports:
        - protocol: TCP
          port: 5432
    - to:                         # kube-dns
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP             # DNS falls back to TCP for large responses
          port: 53
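The subtle part of this policy is that `namespaceSelector` and `podSelector` sit in the same `from` entry, which ANDs them. The following is a simplified model of that matching logic, for illustration only. It handles `matchLabels`, ANDed selectors, and numeric ports, but not `ipBlock`, `matchExpressions`, named ports, or `endPort`. The function names are hypothetical.

```python
"""Simplified model of Kubernetes NetworkPolicy ingress matching (illustrative)."""


def labels_match(selector: dict, labels: dict) -> bool:
    # matchLabels only: every key/value must be present; {} matches everything.
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def peer_matches(peer: dict, policy_ns: str, src_ns: str,
                 src_ns_labels: dict, src_pod_labels: dict) -> bool:
    if "ipBlock" in peer:
        return False  # ipBlock peers are not modeled in this sketch
    ns_sel = peer.get("namespaceSelector")
    pod_sel = peer.get("podSelector")
    # Without a namespaceSelector, a podSelector only matches pods in the
    # policy's own namespace.
    ns_ok = src_ns == policy_ns if ns_sel is None else labels_match(ns_sel, src_ns_labels)
    pod_ok = pod_sel is None or labels_match(pod_sel, src_pod_labels)
    return ns_ok and pod_ok  # selectors in the same list item are ANDed


def ingress_allowed(policy: dict, src_ns: str, src_ns_labels: dict,
                    src_pod_labels: dict, port: int, protocol: str = "TCP") -> bool:
    policy_ns = policy["metadata"]["namespace"]
    for rule in policy["spec"].get("ingress", []):
        peers, ports = rule.get("from", []), rule.get("ports", [])
        # An omitted or empty list means "any peer" / "any port".
        peer_ok = not peers or any(
            peer_matches(p, policy_ns, src_ns, src_ns_labels, src_pod_labels) for p in peers)
        port_ok = not ports or any(
            p.get("port") == port and p.get("protocol", "TCP") == protocol for p in ports)
        if peer_ok and port_ok:
            return True  # rules are additive: one matching rule is enough
    return False


# The first ingress rule of the payments policy, as a Python dict.
payments_policy = {
    "metadata": {"namespace": "payments"},
    "spec": {"ingress": [{
        "from": [{
            "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "api-gateway"}},
            "podSelector": {"matchLabels": {"app": "api-gateway"}},
        }],
        "ports": [{"protocol": "TCP", "port": 8443}],
    }]},
}

gw_ns_labels = {"kubernetes.io/metadata.name": "api-gateway"}
print(ingress_allowed(payments_policy, "api-gateway", gw_ns_labels, {"app": "api-gateway"}, 8443))  # True
print(ingress_allowed(payments_policy, "api-gateway", gw_ns_labels, {"app": "debug-shell"}, 8443))  # False
print(ingress_allowed(payments_policy, "api-gateway", gw_ns_labels, {"app": "api-gateway"}, 5432))  # False
```

Suppose you moved `podSelector` into its own `- podSelector:` list item. The AND would become an OR, admitting every pod in the api-gateway namespace plus any `app: api-gateway` pod in the payments namespace. That small indentation difference is a common source of over-broad policies.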

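Default-deny is the goal. A pod that no NetworkPolicy selects accepts all traffic, so a namespace with no NetworkPolicy is fully open. Once any policy selects a pod for a given direction (Ingress or Egress), only traffic that some policy explicitly allows in that direction gets through. Policies are additive. Always start by deploying a default-deny-all policy and opening only what you need.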
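A default-deny-all policy is just an empty `podSelector`, both `policyTypes`, and no rules. Below is a minimal sketch. It assumes you template policies from Python and pipe the JSON to `kubectl apply -f -`, which should accept a `v1` `List`. The function name and namespace names are hypothetical.

```python
import json


def default_deny_all(namespace: str) -> dict:
    """NetworkPolicy that selects every pod in `namespace` and allows nothing."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Both types listed with no ingress/egress rules: deny both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespaces; wrap the policies in a List for a single apply.
    manifest = {"apiVersion": "v1", "kind": "List",
                "items": [default_deny_all(ns) for ns in ("payments", "api-gateway")]}
    print(json.dumps(manifest, indent=2))
```

This policy also blocks egress, so pair it with a DNS allowance (UDP and TCP 53 to kube-dns). Otherwise every name lookup in the namespace will time out.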

Private Endpoints: Keeping Data Off the Public Internet

When an EC2 instance calls the S3 API without a VPC endpoint, that traffic leaves your private subnets through a NAT Gateway or internet gateway, goes to S3's public IPs, and returns the same way. This has three costs. Every private workload needs a route to the internet. You pay NAT Gateway data-processing charges on every byte. And your Security Groups typically need outbound HTTPS to 0.0.0.0/0, the worst possible egress rule.

VPC Endpoints (AWS) solve this. A Gateway Endpoint keeps S3 and DynamoDB traffic on the AWS network at no charge; it works as a route-table target rather than a network interface. An Interface Endpoint (PrivateLink) places an ENI in each subnet you choose, and covers almost every other AWS service: SQS, SSM, ECR, Secrets Manager, RDS Data API, CloudWatch Logs, and more. Interface Endpoints are billed per hour per AZ plus per GB processed. In both cases the traffic never leaves the Amazon network. You can then replace outbound 443 to 0.0.0.0/0 in your Security Groups with rules scoped to the endpoint Security Group and the S3 managed prefix list.
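The two endpoint types attach to the network differently. A Gateway Endpoint is a route-table target, while an Interface Endpoint is at most one ENI per Availability Zone behind a Security Group. The sketch below captures that distinction as a plain planning function. The function name, subnet IDs, and route table IDs are hypothetical. The sketch also treats S3 as Gateway-only, even though S3 offers an Interface Endpoint as well.

```python
"""Plan VPC endpoints: Gateway for S3/DynamoDB, Interface for everything else."""

GATEWAY_SERVICES = {"s3", "dynamodb"}


def plan_endpoints(services: list, region: str,
                   subnet_by_az: dict, route_table_ids: list) -> list:
    plan = []
    for svc in services:
        name = f"com.amazonaws.{region}.{svc}"
        if svc in GATEWAY_SERVICES:
            # Gateway endpoints add a prefix-list route to route tables:
            # no ENI, no Security Group, no hourly charge.
            plan.append({"service_name": name, "type": "Gateway",
                         "route_table_ids": sorted(route_table_ids)})
        else:
            # Interface endpoints get one ENI per selected subnet, at most one
            # subnet per AZ; covering every node AZ avoids cross-AZ hops.
            plan.append({"service_name": name, "type": "Interface",
                         "subnet_ids": [subnet_by_az[az] for az in sorted(subnet_by_az)]})
    return plan


for item in plan_endpoints(["s3", "secretsmanager"], "us-east-1",
                           {"us-east-1a": "subnet-a", "us-east-1b": "subnet-b"},
                           ["rtb-private"]):
    print(item)
```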

# Terraform: Interface endpoints for Secrets Manager (application secrets) and ECR (image pulls on private EKS nodes)
resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpce.id]
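  # Resolve the service's default hostname to the endpoint's private IPs inside
  # the VPC, so no app-code changes are needed (requires enable_dns_support and
  # enable_dns_hostnames on the VPC)
  private_dns_enabled = true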

  tags = { Name = "secretsmanager-vpce" }
}

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpce.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpce.id]
  private_dns_enabled = true
}

# The VPC endpoint security group: allow 443 inbound from the cluster node SG only
resource "aws_security_group" "vpce" {
  name   = "vpce-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.eks_nodes.id]
  }
}

EKS fully-private cluster checklist. EKS worker nodes with no internet access need a Gateway Endpoint for s3, because ECR stores image layers in S3. They also need Interface Endpoints for ecr.api, ecr.dkr, ec2, sts, elasticloadbalancing, and logs (CloudWatch), plus ssm and ssmmessages if you use SSM Session Manager for node access. A missing endpoint rarely produces a clear error. It usually surfaces as cryptic image-pull failures or nodes stuck in NotReady.
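You can check a VPC against this checklist using the `ServiceName` values returned by `aws ec2 describe-vpc-endpoints`. Below is a minimal sketch that assumes you have already collected those names. The required set is copied from the checklist above and may need adjusting for your add-ons. The function name and sample data are hypothetical.

```python
"""Check a VPC's endpoints against the fully-private EKS checklist."""

REQUIRED = {"ecr.api", "ecr.dkr", "s3", "ec2", "sts", "elasticloadbalancing", "logs"}
SSM_SESSION_MANAGER = {"ssm", "ssmmessages"}


def missing_endpoints(existing_service_names: list, region: str,
                      uses_session_manager: bool = False) -> list:
    prefix = f"com.amazonaws.{region}."
    # Strip "com.amazonaws.<region>." to get short names such as "ecr.api".
    present = {n[len(prefix):] for n in existing_service_names if n.startswith(prefix)}
    required = REQUIRED | (SSM_SESSION_MANAGER if uses_session_manager else set())
    return sorted(required - present)


existing = [
    "com.amazonaws.us-east-1.ecr.api",
    "com.amazonaws.us-east-1.ecr.dkr",
    "com.amazonaws.us-east-1.s3",
    "com.amazonaws.us-east-1.sts",
]
print(missing_endpoints(existing, "us-east-1", uses_session_manager=True))
# ['ec2', 'elasticloadbalancing', 'logs', 'ssm', 'ssmmessages']
```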

Egress Control: The Last Line Before Exfiltration

Egress filtering is one of the most commonly skipped network controls in cloud environments, and one that attackers depend on. Ransomware staging, cryptominer pool connections, command-and-control (C2) callbacks, and data exfiltration all require outbound connectivity. If compute can make unrestricted outbound HTTPS connections, a compromised container can silently call home for months.

Production egress architecture:

All outbound internet traffic is forced through a central egress inspection point:
- AWS: NAT Gateway plus an FQDN-filtering layer, such as AWS Network Firewall, Route 53 Resolver DNS Firewall, or a self-managed Squid proxy cluster.
- GCP: Cloud NAT plus Cloud NGFW firewall policy rules with FQDN objects, or Secure Web Proxy.
- Azure: Azure Firewall with FQDN rules.
The proxy enforces an allowlist of FQDNs — not IP CIDRs, since major cloud providers rotate IPs constantly. Only the specific hostnames your services actually need are permitted.
Kubernetes workloads that need no internet access get Network Policies that deny all egress except to cluster-internal services and VPC endpoints. Workloads that legitimately need internet access are isolated in a dedicated node pool / namespace with extra audit logging.
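The "deny all egress except cluster-internal services and VPC endpoints" rule can be written as a single policy. Below is a sketch that assumes the AWS VPC CNI, so pods and endpoint ENIs both have VPC IPs. The VPC CIDR 10.0.0.0/16, the function name, and the policy name are hypothetical.

```python
import json


def internal_only_egress(namespace: str, endpoint_cidr: str) -> dict:
    """Egress limited to in-cluster pods and VPC endpoint ENIs on TCP 443."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "internal-only-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},            # every pod in the namespace
            "policyTypes": ["Egress"],
            "egress": [
                # Any pod in any namespace (this includes kube-dns).
                {"to": [{"namespaceSelector": {}}]},
                # Interface endpoint ENIs; narrow to the endpoint subnets if possible.
                {"to": [{"ipBlock": {"cidr": endpoint_cidr}}],
                 "ports": [{"protocol": "TCP", "port": 443}]},
            ],
        },
    }


print(json.dumps(internal_only_egress("payments", "10.0.0.0/16"), indent=2))
```

Two caveats apply. First, traffic to S3 or DynamoDB through a Gateway Endpoint goes to those services' public IP ranges rather than the VPC CIDR, so pods that use S3 need an extra ipBlock rule for the S3 prefix-list CIDRs. Second, with the VPC CNI the ipBlock rule also admits port 443 to any host in that CIDR, which is why scoping it to the endpoint subnets is preferable.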

Figure: Zero-egress architecture. Pods reach AWS services through VPC endpoints, only the designated egress proxy may reach external FQDNs, and every other outbound path is blocked.

Implementing the Zero-Egress Pattern

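Here is the AWS Network Firewall Terraform configuration that enforces the FQDN allowlist at the VPC level. The rule group is a stateful domain-list rule group. It matches the TLS SNI field and the HTTP Host header, so it can filter HTTPS without decryption. Because the client chooses the SNI, this stops opportunistic callouts but not a client that deliberately presents an allowed hostname. Strict rule ordering must be set on both the rule group and the firewall policy, because the policy's default drop actions are only available under strict ordering: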

resource "aws_networkfirewall_rule_group" "egress_allowlist" {
  capacity = 100
  name     = "egress-fqdn-allowlist"
  type     = "STATEFUL"

  rule_group {
    rules_source {
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        target_types         = ["TLS_SNI", "HTTP_HOST"]
        targets = [
          "api.stripe.com",
          "hooks.slack.com",
          "o1234567.ingest.sentry.io",
          "updates.example-vendor.com",
        ]
      }
    }
    stateful_rule_options {
      rule_order = "STRICT_ORDER"
    }
  }
}

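resource "aws_networkfirewall_firewall_policy" "egress" {
  name = "egress-policy"
  firewall_policy {
    stateless_default_actions          = ["aws:forward_to_sfe"]
    stateless_fragment_default_actions = ["aws:forward_to_sfe"]
    # Must match the rule group's STRICT_ORDER; also required for
    # stateful_default_actions below
    stateful_engine_options {
      rule_order = "STRICT_ORDER"
    }
    stateful_rule_group_reference {
      resource_arn = aws_networkfirewall_rule_group.egress_allowlist.arn
      priority     = 1
    }
    # Drop everything not explicitly allowed. "drop_established" lets the TCP
    # handshake complete so the SNI / Host header can be inspected first.
    stateful_default_actions = ["aws:drop_established", "aws:alert_established"]
  }
}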
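Conceptually, the domain rule group plus the strict-order default action reduce to a simple decision per flow. The sketch below models that decision in Python for reasoning about and testing allowlists offline; it is not how Network Firewall is implemented. It treats a leading-dot entry (for example `.example.com`) as "any subdomain". Whether the bare apex domain also matches is not modeled and should be checked against the Network Firewall documentation.

```python
"""Offline model of an SNI/Host-header FQDN allowlist with a default drop.

Illustrative only; this is not how AWS Network Firewall is implemented.
"""
from typing import Optional

ALLOWLIST = [
    "api.stripe.com",
    "hooks.slack.com",
    "o1234567.ingest.sentry.io",
    "updates.example-vendor.com",
]


def fqdn_allowed(hostname: str, allowlist: list) -> bool:
    host = hostname.lower().rstrip(".")  # DNS names are case-insensitive
    for entry in (e.lower() for e in allowlist):
        if entry.startswith("."):
            # Leading dot: any subdomain. Apex matching is not modeled.
            if host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


def verdict(hostname: Optional[str], allowlist: list) -> str:
    # A flow with no SNI or Host header (e.g. TLS to a bare IP, or other TCP)
    # cannot match a domain rule and falls through to the default drop.
    if hostname and fqdn_allowed(hostname, allowlist):
        return "pass"
    return "drop"


for host in ["api.stripe.com", "API.Stripe.com", "stripe.com.evil.example", None]:
    print(host, verdict(host, ALLOWLIST))

# A broad CDN wildcard admits every hostname on that CDN:
print(verdict("d1234abcd.cloudfront.net", ALLOWLIST + [".cloudfront.net"]))
```

Exact-match entries are the safe default. Suffix tricks such as `stripe.com.evil.example` never match an exact entry, while a single wildcard entry widens the allowlist to everything under that domain.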

NAT Gateway is not egress control. A NAT Gateway lets everything out: it is a routing device that translates addresses, not a firewall. Organisations often believe they have egress control because they use a NAT Gateway. To actually filter traffic you need a stateful layer on the path, such as Network Firewall or a proxy in front of the NAT. At minimum, you need Security Group egress rules on the workloads themselves, scoped to known destinations. Behind a bare NAT Gateway, a compromised workload can reach any host on the internet.

Production Failure Modes

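Missing kube-dns egress in Network Policy: pods cannot resolve service names. Any policy that restricts egress must allow both UDP/53 and TCP/53 to kube-dns. Otherwise lookups time out, and the failure looks like a network outage rather than a policy problem.
Interface Endpoint in only some AZs: suppose your endpoint ENI is only in us-east-1a but your pods also run in us-east-1b. Traffic from us-east-1b then crosses AZs, adding latency and cross-AZ data transfer charges, and an outage in us-east-1a removes the endpoint for every AZ. Deploy endpoints in every AZ your nodes use.
FQDN allowlist not covering CDN hostnames: vendors often serve SDK updates from rotating CDN hostnames under domains such as cloudfront.net or fastly.net. Network Firewall expresses wildcards with a leading dot (.cloudfront.net), and such a rule admits every distribution on that CDN, including ones an attacker controls. Use wildcards deliberately and scope them as narrowly as possible.
Security Group egress 0.0.0.0/0 left open: AWS adds an allow-all outbound rule to every new Security Group. The Terraform aws_security_group resource removes that default rule when it creates a group. Groups created through the console, CLI, or CloudFormation keep it, however, and many Terraform examples add it back explicitly. Declare explicit, scoped egress rules and audit for groups that still allow 0.0.0.0/0.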
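The AZ-mismatch failure is easy to detect once you have two inputs: the AZs your nodes run in, and the AZs each endpoint's subnets cover. The second can come from the subnet IDs in `aws ec2 describe-vpc-endpoints` mapped to AZs with `aws ec2 describe-subnets`. Below is a minimal sketch that assumes those inputs are already built; the function name and data are hypothetical.

```python
"""Find AZs where nodes run but an Interface Endpoint has no ENI."""


def uncovered_azs(node_azs: set, endpoint_azs_by_service: dict) -> dict:
    gaps = {}
    for service, endpoint_azs in endpoint_azs_by_service.items():
        missing = set(node_azs) - set(endpoint_azs)
        if missing:
            gaps[service] = sorted(missing)  # AZs that must cross to reach it
    return gaps


nodes = {"us-east-1a", "us-east-1b", "us-east-1c"}
endpoints = {
    "com.amazonaws.us-east-1.ecr.dkr": {"us-east-1a", "us-east-1b", "us-east-1c"},
    "com.amazonaws.us-east-1.secretsmanager": {"us-east-1a"},
}
print(uncovered_azs(nodes, endpoints))
# {'com.amazonaws.us-east-1.secretsmanager': ['us-east-1b', 'us-east-1c']}
```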

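Audit your egress today. Run

aws ec2 describe-security-groups --filters Name=egress.ip-permission.cidr,Values=0.0.0.0/0 --query 'SecurityGroups[*].[GroupId,GroupName]' --output text

to list every SG in the current region that has an outbound rule to 0.0.0.0/0. Use the egress.ip-permission.* filter keys, because the plain ip-permission.* filters match inbound rules. Avoid adding a port filter, which can miss the allow-all rule since that rule has no port range. Each hit is a potential exfiltration path. Repeat the check in every region, and check ::/0 with egress.ip-permission.ipv6-cidr.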
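A CLI filter only tells you that a group has some matching rule. To decide whether a rule actually covers port 443, including "all traffic" rules that have no port range, you can post-process the JSON output of `aws ec2 describe-security-groups`. Below is a minimal sketch that assumes you saved that output and loaded it with `json.load`; the sample group IDs are hypothetical.

```python
"""Flag Security Group egress rules that let TCP 443 reach the whole internet.

Input follows the JSON shape of `aws ec2 describe-security-groups`.
"""

WORLD = {"0.0.0.0/0", "::/0"}


def allows_world_443(perm: dict) -> bool:
    proto = perm.get("IpProtocol")
    if proto == "-1":
        covers_443 = True  # "all traffic" rules carry no port range
    elif proto in ("tcp", "6"):
        covers_443 = perm.get("FromPort", 0) <= 443 <= perm.get("ToPort", 65535)
    else:
        covers_443 = False
    cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
    cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
    return covers_443 and bool(cidrs & WORLD)


def exfil_paths(describe_output: dict) -> list:
    return sorted(
        sg["GroupId"]
        for sg in describe_output.get("SecurityGroups", [])
        if any(allows_world_443(p) for p in sg.get("IpPermissionsEgress", []))
    )


sample = {"SecurityGroups": [
    {"GroupId": "sg-default-egress",
     "IpPermissionsEgress": [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-https-world",
     "IpPermissionsEgress": [{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                              "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-vpce-only",
     "IpPermissionsEgress": [{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                              "UserIdGroupPairs": [{"GroupId": "sg-vpce"}]}]},
]}
print(exfil_paths(sample))
# ['sg-default-egress', 'sg-https-world']
```

The third group shows the target state: outbound 443 is permitted only to the endpoint Security Group, so it never appears in the report.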