Cloud & Kubernetes Security Hardening

The Cloud Threat Landscape

Most major cloud breaches of the last decade share a common pattern: the attacker did not break the encryption, defeat the firewall, or reverse-engineer the application. They found an S3 bucket set to public, a long-lived access key committed to GitHub, or an over-permissioned service account that let them walk from one workload straight into the production database. Cloud infrastructure is not fundamentally insecure, but its operational model is radically different from the on-premises world, and teams that carry on-premises mental models into the cloud expose a far larger attack surface than they realize.

This lesson maps the real attack surface. We will examine how misconfigurations are created and discovered, how credentials are stolen and abused, and how attackers move laterally through a cloud environment after initial access. Understanding the attack chain is a prerequisite for every hardening control you will apply in the lessons that follow.

The CNCF and CSA Threat Taxonomy

The Cloud Security Alliance's Egregious 11 and the CNCF's cloud native security whitepaper converge on the same root causes. The categories that dominate production incidents at scale are:

Misconfiguration — resources deployed with insecure defaults (public storage, permissive IAM, open security groups, disabled audit logging).
Credential compromise — leaked API keys, long-lived static credentials, over-permissioned service accounts, and stolen OAuth tokens.
Insufficient identity controls — no MFA on privileged accounts, no least-privilege enforcement, no periodic access reviews.
Insecure APIs — unauthenticated metadata endpoints, unprotected control-plane APIs, missing TLS verification.
Lateral movement — the attacker's progression from a foothold (one compromised resource) to their objective (data exfiltration, ransomware, compute abuse for crypto-mining).

In practice these categories chain together. A misconfiguration grants initial access; a credential found during that access enables privilege escalation; lateral movement reaches the sensitive data. Hardening requires interrupting this chain at multiple points — defense in depth, not a single perimeter.

How Breaches Actually Start: Misconfiguration

Misconfiguration is the leading cause of cloud breaches. The 2019 Capital One breach, which exposed the records of roughly 100 million customers, began with a misconfigured web application firewall running on AWS that was vulnerable to SSRF (Server-Side Request Forgery). The attacker used SSRF to reach the EC2 Instance Metadata Service (IMDS) and retrieve temporary IAM credentials bound to an over-permissioned role. Those credentials were then used to list S3 buckets and download their contents, which included financial data.

Common misconfiguration categories in production:

Storage buckets exposed to the internet: S3, GCS, and Azure Blob Storage now default to private for new resources, but older Terraform modules, buckets that predate those defaults, and permissive ACLs or bucket policies still regularly produce public objects. A single bucket-policy statement that allows s3:GetObject for Principal "*" is sufficient for mass exfiltration.
Security group rules with 0.0.0.0/0 ingress: SSH (22), RDP (3389), and database ports (3306, 5432) exposed to the public internet are discovered by automated scanners within minutes of deployment. Shodan and Censys index these continuously.
IMDSv1 enabled: the original EC2 metadata service (v1) does not require a session token. Any process on the instance, including a web application relaying an SSRF payload, can request http://169.254.169.254/latest/meta-data/iam/security-credentials/ to learn the attached role name, append that name to the path, and receive valid temporary AWS credentials without authenticating. IMDSv2 (session-oriented, with a token obtained through a PUT request) was introduced in 2019 to defend against exactly the Capital One pattern. As of 2025, IMDSv2 is still not enforced for every instance type and launch template, so you must require it explicitly.
Disabled audit logging: CloudTrail not enabled in all regions, GCP Audit Logs missing data-access events, or Kubernetes API audit logging turned off. Without logs, incident response is blind.
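
As a minimal sketch of closing the IMDSv1 gap described above, the two shell functions below find running instances that still accept token-less metadata requests and then require IMDSv2 on them. The function names are illustrative and not part of the AWS CLI. The sketch assumes configured credentials that allow ec2:DescribeInstances and ec2:ModifyInstanceMetadataOptions, and it only covers the currently selected region.

```shell
# List running instances that still accept IMDSv1 (HttpTokens=optional).
list_imdsv1_instances() {
  aws ec2 describe-instances \
    --filters Name=metadata-options.http-tokens,Values=optional \
              Name=instance-state-name,Values=running \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text | tr '\t' '\n' | sed '/^$/d'
}

# Require session tokens (IMDSv2) on every instance ID read from stdin.
require_imdsv2() {
  while IFS= read -r instance_id; do
    [ -n "$instance_id" ] || continue
    if aws ec2 modify-instance-metadata-options \
         --instance-id "$instance_id" \
         --http-tokens required \
         --http-endpoint enabled >/dev/null; then
      echo "enforced: $instance_id"
    else
      echo "FAILED: $instance_id" >&2
    fi
  done
}
```

A typical invocation would be `list_imdsv1_instances | require_imdsv2`, ideally after confirming that no workload on those instances still depends on IMDSv1.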

Production pitfall: Terraform and CloudFormation create resources from code, but that code is often written once and the deployed infrastructure drifts away from it over time. Engineers make one-off changes in the console, add resources manually, or modify security groups to fix an urgent issue without updating IaC. The result is infrastructure whose actual state no longer matches its declared state. Always treat drift reported by terraform plan as a security event, not just an ops inconvenience.
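
One way to make drift visible is to run a scheduled plan and act on its exit code. The sketch below relies on terraform plan -detailed-exitcode, which exits 0 when nothing would change, 2 when changes are pending, and 1 on error. The check_drift name is hypothetical, and the function assumes it runs inside an initialized working directory. Exit code 2 also covers configuration changes that have not been applied yet, so this is only a drift signal when the checked-out code matches what was last applied.

```shell
# Run a read-only plan and translate the exit code into a drift verdict.
check_drift() {
  terraform plan -detailed-exitcode -input=false -lock=false >/dev/null 2>&1
  case $? in
    0) echo "in-sync" ;;
    2) echo "DRIFT"; return 2 ;;
    *) echo "plan-error"; return 1 ;;
  esac
}
```

In a scheduled CI job, a non-zero return from check_drift could open a ticket or page the owning team instead of only failing the build.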

Detecting misconfigurations before attackers do requires continuous scanning. The following AWS CLI commands audit the S3 public access settings of every bucket in your account and then enforce the account-level block. They are a first step any team can run immediately:

# List every bucket and print its public-access-block configuration.
# Buckets with no configuration return an error and are reported as UNSET.
aws s3api list-buckets --query 'Buckets[].Name' --output text \
  | tr '\t' '\n' \
  | while IFS= read -r bucket; do
      echo "== $bucket"
      aws s3api get-public-access-block --bucket "$bucket" 2>/dev/null \
        || echo "UNSET: $bucket"
    done

# Enforce the account-level block, which applies to all existing and new buckets
aws s3control put-public-access-block \
  --account-id "$(aws sts get-caller-identity --query Account --output text)" \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
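
The public-access-block settings do not show whether a bucket policy is already public. As a complementary sketch, the function below asks S3 for each bucket's policy status and prints the buckets that AWS evaluates as public. The function name is illustrative. The sketch assumes credentials that allow s3:ListAllMyBuckets and s3:GetBucketPolicyStatus, and it treats buckets without a policy (which return an error) as not public.

```shell
# Print the name of every bucket whose policy AWS evaluates as public.
list_public_policy_buckets() {
  aws s3api list-buckets --query 'Buckets[].Name' --output text \
    | tr '\t' '\n' \
    | while IFS= read -r bucket; do
        [ -n "$bucket" ] || continue
        # Buckets without a policy return an error; skip them.
        is_public=$(aws s3api get-bucket-policy-status --bucket "$bucket" \
          --query 'PolicyStatus.IsPublic' --output text 2>/dev/null) || continue
        if [ "$is_public" = "True" ]; then
          echo "PUBLIC POLICY: $bucket"
        fi
      done
}
```

Any bucket this prints deserves a manual look at its policy, since a public policy is occasionally intentional (for example, static website assets).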

Credential Theft: The Most Direct Path

A valid credential bypasses every network control, every WAF rule, and every intrusion detection signature. Attackers invest heavily in credential collection because it works. The primary vectors are:

Source code repositories: AWS access keys, GCP service account JSON files, and database connection strings committed to GitHub (public or private) are the most common single vector. GitHub secret scanning catches many patterns but not all, and private repos are not immune to insider threats or to token leaks that expose the repository itself.
CI/CD pipeline exposure: environment variables echoed into GitHub Actions logs, unmasked secrets in build artifacts, and step debug logging (ACTIONS_STEP_DEBUG=true) left enabled, which makes logs far more verbose and raises the chance that an unmasked secret is printed.
Long-lived static credentials: IAM access keys with no expiry. A key created for a developer who left the organization three years ago, never rotated, and still active is a fully valid permanent backdoor. AWS does not expire keys automatically.
Instance and pod metadata services: SSRF or SSRF-equivalent vulnerabilities in web applications that allow the attacker to fetch http://169.254.169.254 (AWS), http://metadata.google.internal (GCP), or http://169.254.169.254/metadata (Azure).
Phishing for federated credentials: SSO/SAML assertion theft, OAuth token hijacking via open redirect, or stealing credentials from a developer's local ~/.aws/credentials file.
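
For the first vector in the list above, even a crude pattern search catches the most obvious leaks before they are pushed. The sketch below looks for strings shaped like AWS access key IDs (the AKIA or ASIA prefix followed by 16 uppercase letters or digits). The function name is hypothetical, the pattern is a heuristic that matches only key IDs and not secret keys, and dedicated scanners cover far more credential types.

```shell
# Print files under a directory that contain an AWS access key ID pattern.
# Returns 1 when at least one match is found, so it can gate a commit or CI job.
find_aws_key_ids() {
  matches=$(grep -rlE '(AKIA|ASIA)[0-9A-Z]{16}' "${1:-.}" 2>/dev/null || true)
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}
```

Only file names are printed, not the matching lines, so the check does not copy the secret into a terminal scrollback or CI log.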

Detect long-lived, unused IAM credentials with the IAM credential report, a native AWS capability that every team should run on a weekly schedule:

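# Request a fresh IAM credential report (AWS regenerates it at most once every 4 hours)
aws iam generate-credential-report

# Cutoff date 90 days ago (GNU date syntax; on macOS/BSD use: date -u -v-90d +%Y-%m-%d)
cutoff=$(date -u -d '90 days ago' +%Y-%m-%d)

# Download and parse: show users whose first access key is active but unused since the cutoff
# (if this call fails because the report is still being generated, retry after a few seconds)
aws iam get-credential-report --query 'Content' --output text \
  | base64 --decode \
  | awk -F',' -v cutoff="$cutoff" \
    'NR>1 && $9=="true" && $11!="N/A" && $11 < cutoff \
    {printf "user=%-30s key_active=%-6s last_used=%s\n", $1, $9, $11}'

# Fields: user ($1), access_key_1_active ($9), access_key_1_last_used_date ($11)
# The second key slot uses $14 and $16; keys showing N/A have never been used and deserve the same review
# Any active key last used more than 90 days ago should be rotated or disabled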
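
Once a stale key has been identified, disabling it is a single API call per key. The sketch below marks every active key of one IAM user as Inactive. The function name is illustrative, and the sketch assumes credentials that allow iam:ListAccessKeys and iam:UpdateAccessKey. Setting a key to Inactive is reversible, which makes it a safer first step than deletion when it is unclear whether something still depends on the key.

```shell
# Mark every active access key of one IAM user as Inactive.
deactivate_user_keys() {
  user_name=$1
  aws iam list-access-keys --user-name "$user_name" \
    --query "AccessKeyMetadata[?Status=='Active'].AccessKeyId" \
    --output text \
    | tr '\t' '\n' \
    | while IFS= read -r key_id; do
        # Skip empty results; the CLI prints "None" for a null value.
        [ -n "$key_id" ] || continue
        [ "$key_id" != "None" ] || continue
        if aws iam update-access-key --user-name "$user_name" \
             --access-key-id "$key_id" --status Inactive; then
          echo "deactivated: $user_name $key_id"
        fi
      done
}
```

For example, `deactivate_user_keys former-developer` (a hypothetical user name) would disable both key slots of that user if they are active.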

Lateral Movement: From Foothold to Crown Jewels

Initial access is rarely the attacker's goal. They want data, persistent access, or compute resources. Lateral movement is the process of expanding from a foothold to the target. In a cloud environment, the key enablers of lateral movement are:

Over-permissioned IAM roles: a compute service (EC2, Lambda, ECS task) whose role allows iam:PassRole, iam:CreateAccessKey, or sts:AssumeRole on broad resources (for example Resource "*") lets the attacker create new credentials or assume more privileged identities, potentially escalating to any role in the account that trusts the compromised principal.
Flat VPC networks: when every subnet can reach every other subnet and there is no micro-segmentation, any compromised workload can initiate connections to every other workload. Without private endpoints, traffic to cloud service APIs also takes public paths unnecessarily.
Shared service accounts in Kubernetes: a pod running with a service account that has cluster-wide get/list/watch permissions on secrets can read every secret in the cluster from any namespace. One compromised pod becomes a key-extraction tool for the entire cluster.
Secrets in environment variables: environment variables are inherited by child processes, can be read from /proc/[pid]/environ by root and by processes running as the same user, are dumped in many error logs, and, when set as literal values in the pod spec, are visible in the Kubernetes API to anyone who can get or describe pods in the namespace. Anyone with pods/exec permission can read them from inside the container however they were set.
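
Two of the enablers above can be checked from the command line. The sketch below uses kubectl auth can-i with impersonation to test whether a service account can list Secrets in every namespace, and the IAM policy simulator to see which escalation-prone actions a role is allowed. The function names are hypothetical. The first function assumes the caller may impersonate service accounts, and the second assumes permission to call iam:SimulatePrincipalPolicy and evaluates the actions against all resources.

```shell
# Report whether a Kubernetes service account can list Secrets cluster-wide.
sa_can_list_all_secrets() {
  namespace=$1
  service_account=$2
  answer=$(kubectl auth can-i list secrets --all-namespaces \
    --as="system:serviceaccount:${namespace}:${service_account}" 2>/dev/null) || true
  if [ "$answer" = "yes" ]; then
    echo "RISK: ${namespace}/${service_account} can list secrets cluster-wide"
  else
    echo "ok: ${namespace}/${service_account}"
  fi
}

# Print which escalation-prone IAM actions a role is allowed to perform.
role_escalation_actions() {
  role_arn=$1
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names iam:PassRole iam:CreateAccessKey sts:AssumeRole \
    --query "EvaluationResults[?EvalDecision=='allowed'].EvalActionName" \
    --output text | tr '\t' '\n'
}
```

An empty result from role_escalation_actions suggests the role cannot perform any of the three actions on arbitrary resources. It does not rule out narrower grants scoped to specific ARNs.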

The cloud attack chain: a misconfiguration or leaked credential grants initial access, credential abuse enables privilege escalation, and lateral movement reaches the attacker's objective.

A Realistic Attack Scenario

Here is how a realistic compromise could unfold in a mid-sized SaaS company running on AWS with Kubernetes (EKS):

A developer commits a .env file to a public GitHub repository during a late-night debugging session. The file contains an AWS access key for the staging environment. GitHub's secret scanner emails an alert — but the key has already been harvested by automated scanners that poll GitHub's public event stream in real time.
The attacker runs aws sts get-caller-identity with the stolen key and confirms it is valid. They run aws iam list-attached-user-policies and discover the key belongs to a developer user with the AWS-managed AdministratorAccess policy attached — a policy granted for "temporary" access months ago and never removed.
With admin access, the attacker creates a new IAM user with its own access key for persistence, then explores the environment. They find an EKS cluster, generate a kubeconfig with aws eks update-kubeconfig, and discover that the cluster's aws-auth ConfigMap maps an IAM role that any principal in the account can assume to the system:masters group, a misconfiguration seen in clusters bootstrapped from early EKS documentation.
As a Kubernetes cluster admin, the attacker runs kubectl get secrets --all-namespaces and retrieves database credentials, third-party API keys, and a Stripe secret key stored as Kubernetes Secrets, whose values are merely base64-encoded and trivially decoded.
The attacker exfiltrates the secrets, uses the database credentials to dump the production PostgreSQL database, and deploys a DaemonSet on every node running a crypto-miner — all within 45 minutes of the initial credential compromise.

Key insight: Each step in the scenario above was enabled by a different control failure: committed secrets, no key rotation, over-privileged IAM, misconfigured EKS auth, and secrets stored as base64-encoded Kubernetes Secrets rather than in a secrets manager. The breach unfolded as it did because all of these controls failed in sequence, and fixing any one of them would have slowed or contained the attacker. Defense in depth means ensuring that the attacker must overcome multiple independent barriers, not just one.
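
Two of the control failures in the scenario can be audited with a few commands. The sketch below lists IAM users that have the AWS-managed AdministratorAccess policy attached directly and counts the lines in the aws-auth ConfigMap that mention system:masters. The function names are illustrative, and the second check assumes the cluster still uses the aws-auth ConfigMap for access mapping, which newer clusters may not.

```shell
# IAM users with the AWS-managed AdministratorAccess policy attached directly.
list_admin_users() {
  aws iam list-entities-for-policy \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
    --entity-filter User \
    --query 'PolicyUsers[].UserName' --output text | tr '\t' '\n'
}

# Count lines in the aws-auth ConfigMap that grant system:masters.
count_masters_mappings() {
  kubectl -n kube-system get configmap aws-auth -o yaml 2>/dev/null \
    | grep -c 'system:masters' || true
}
```

Every user the first function prints and every mapping the second one counts should have a named owner and a documented reason to exist.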

Threat Intelligence Sources Worth Following

Staying current on the cloud threat landscape is an operational requirement, not a periodic review. The primary sources used by production security teams are:

AWS GuardDuty threat intelligence — managed threat detection that integrates with CloudTrail, VPC Flow Logs, and DNS logs. Study the GuardDuty finding types catalog to understand which behavioral patterns AWS considers indicative of compromise.
MITRE ATT&CK for Cloud — the cloud matrix at attack.mitre.org maps every technique (T1530: Data from Cloud Storage, T1078.004: Cloud Accounts) to real-world adversary behavior. Use it to measure coverage gaps in your detections.
Sysdig Threat Research and Lacework Labs — publish container and Kubernetes-specific threat reports with real attack tooling (TeamTNT, Rocke, Kinsing) and their indicators of compromise.
AWS Security Bulletins and GCP Security Advisories — official channels for platform-level vulnerabilities. Subscribe to both via RSS and treat them as operational alerts, not optional reading.

Pro practice: Before your team writes a single Terraform resource or deploys a single pod, threat-model the architecture. Identify the trust boundaries, the data flows, and the entry points. Ask: what is the worst thing an attacker could do with access to this component? Then design controls that make that worst case either impossible or detectable within minutes. Threat modeling is not a once-per-project activity — it is a continuous practice that evolves alongside your architecture and your understanding of the current threat landscape.

The lessons that follow in this module convert threat landscape knowledge into concrete controls: CSPM scanning to catch misconfigurations before deployment, IAM hardening to close the credential attack surface, network segmentation to limit lateral movement, and runtime security to detect attacker behavior in real time. Each control targets a specific link in the attack chain you have now mapped in detail.