Kubernetes Hardening: Cluster
Pod-level controls stop a compromised workload from escaping its sandbox. Cluster-level hardening is a different discipline: it protects the control plane itself — the API server, etcd, the scheduler, and the admission pipeline. An attacker who reaches the control plane does not need to escape any container; they can create new Pods, read every Secret, and backdoor cluster objects at will. The blast radius is the entire cluster, across every tenant namespace.
This lesson covers the four control-plane pillars that production security teams audit first: API server exposure, RBAC least-privilege review, audit logging, and etcd encryption at rest. Each one has well-known insecure defaults that teams commonly inherit from convenience configurations or from cloud-provider settings that prioritize ease of setup over security posture.
API Server Exposure: Shrink the Attack Surface
The Kubernetes API server is the single entry point to the cluster. Every kubectl command, every controller reconciliation loop, and every webhook call goes through it. Leaving it internet-reachable is equivalent to exposing your database's management port to the public internet — scanners find it within minutes, and once found, it becomes a persistent brute-force and exploit target.
In managed offerings (EKS, GKE, AKS) the API server runs in the provider's own network, and its endpoint is typically reachable from the public internet by default. The first hardening step is to disable public access so the endpoint is reachable only from the cluster's own VPC. Where a public endpoint must remain, restrict it to the specific corporate CIDR ranges used by CI runners and engineers.
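As an illustration for EKS, the endpoint restriction could be expressed in an eksctl ClusterConfig file. This is a minimal sketch that assumes the cluster is managed with eksctl, a tool the text above does not mention; the cluster name, region, and CIDR range are hypothetical placeholders. GKE and AKS expose comparable settings through their own tooling.

```yaml
# Sketch only: assumes eksctl manages the cluster.
# Name, region, and CIDR below are placeholders, not recommendations.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: us-east-1
vpc:
  clusterEndpoints:
    # Private access lets nodes and in-VPC clients reach the API server.
    privateAccess: true
    # Set to false for a fully private endpoint.
    publicAccess: true
  # Only consulted while publicAccess is true: the corporate ranges
  # (CI runners, engineer VPN) allowed to reach the public endpoint.
  publicAccessCIDRs:
    - "203.0.113.0/24"
```

With publicAccess set to false, CI runners and engineers need a network path into the VPC (VPN, peering, or a bastion) before kubectl will work.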
Beyond network exposure, the API server itself should be hardened at the flag level. Self-managed clusters (kubeadm, k3s, Rancher) inherit whatever flags the installer sets. The most important flags to audit in /etc/kubernetes/manifests/kube-apiserver.yaml are:
- --anonymous-auth=false: disables the system:anonymous user, preventing unauthenticated requests from reaching the authz layer.
- --insecure-port=0: disables the legacy HTTP port (default 8080). The insecure port has been disabled since Kubernetes 1.20 and the flag itself was removed in 1.24, so set it explicitly to 0 only on older clusters; on current versions it should be absent from the manifest.
- --enable-admission-plugins: must include NodeRestriction (limits each kubelet to modifying its own Node object and the Pods bound to it) and AlwaysPullImages (forces an image pull on every Pod start, so the registry re-checks credentials and a Pod cannot reuse a private image already cached on the node).
- --authorization-mode=Node,RBAC: never AlwaysAllow, which grants every authenticated request full access.
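On a kubeadm-style cluster these flags live in the command list of the API server's static Pod manifest. The fragment below is a sketch of the relevant lines only; a real manifest carries many more flags (certificates, etcd endpoints, and so on) that must stay in place.

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (fragment, not a complete manifest)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        # Reject requests that carry no credentials.
        - --anonymous-auth=false
        # Node authorizer for kubelets, RBAC for everything else.
        - --authorization-mode=Node,RBAC
        # These are enabled in addition to the default admission plugins.
        - --enable-admission-plugins=NodeRestriction,AlwaysPullImages
        # --insecure-port is intentionally absent: the flag was removed in Kubernetes 1.24.
        # ...remaining installer-provided flags stay unchanged...
```

The kubelet restarts the API server when this file changes. Health probes or load balancers that call the health endpoints without credentials may start failing once anonymous access is off, so it is worth trying the change on a non-production control plane first.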
RBAC Least-Privilege Review
Role-Based Access Control (RBAC) in Kubernetes is expressive — and easy to misuse. The three most dangerous patterns seen repeatedly in production security reviews are:
- Wildcard verbs or resources: verbs: ["*"] or resources: ["*"] in a ClusterRole gives the subject permission to do anything to everything. This appears frequently in CI service accounts because it was "the simplest way to make the pipeline work."
- Cluster-scoped bindings for namespace-scoped work: A ClusterRoleBinding attaches a role to a subject across every namespace, present and future. Most application service accounts should use RoleBindings scoped to a single namespace.
- Default service account misuse: Every Pod gets the default service account in its namespace if no serviceAccountName is specified. If anything has been bound to that default service account, every Pod in the namespace inherits those permissions.
The remediation pattern is to build roles from first principles: enumerate the exact API groups, resources, and verbs the workload actually needs. A typical web application deployment controller needs get, list, watch, and patch on Deployments only in its own namespace — nothing more. Use the kubectl auth can-i --list output as your baseline, then prune everything not exercised in production.
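The deployment-controller example above could be written as a namespaced Role and RoleBinding along these lines. This is a minimal sketch; the namespace web, the service account web-deployer, and the role name are hypothetical.

```yaml
# Sketch: least-privilege access to Deployments in a single namespace.
# "web", "web-deployer", and "deployment-updater" are illustrative names.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-updater
  namespace: web
rules:
  - apiGroups: ["apps"]          # Deployments live in the apps API group
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding                # namespaced, unlike a ClusterRoleBinding
metadata:
  name: deployment-updater
  namespace: web
subjects:
  - kind: ServiceAccount
    name: web-deployer           # a dedicated account, not "default"
    namespace: web
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-updater
```

One way to check the result is kubectl auth can-i patch deployments -n web --as=system:serviceaccount:web:web-deployer, which should answer yes, while the same query against another namespace or with the delete verb should answer no.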
Tools such as rbac-tool (kubectl rbac-tool lookup <subject>) and rakkess produce access matrices that show exactly what each subject can do to each resource. Run these in your audit pipeline and diff the output across releases to catch privilege creep before it reaches production.
Kubernetes Audit Logging
Audit logs are the answer to: "Who changed that ClusterRoleBinding at 2 AM, and what else did they touch?" Without audit logs you are flying blind during an incident: you cannot determine what credentials were used, which resources were accessed, or whether persistence was established. Missing audit logs are among the most common forensic gaps that security teams encounter in clusters after a breach.
The API server supports four audit levels — None, Metadata, Request, and RequestResponse — applied via an audit policy file. The production pattern is to log Metadata for high-volume read paths (reduces storage cost) and RequestResponse for all mutation operations (writes, exec, port-forward).
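A policy following that pattern might look like the sketch below. Rules are evaluated top to bottom and the first match wins. One deliberate deviation, flagged here as an assumption rather than part of the pattern above: Secrets and ConfigMaps are kept at Metadata even for writes, so their contents are not copied into the audit log.

```yaml
# /etc/kubernetes/audit-policy.yaml (sketch; tune the rules to your own workloads)
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the RequestReceived stage to avoid an extra event per request.
omitStages:
  - "RequestReceived"
rules:
  # Secrets and ConfigMaps: metadata only, so sensitive payloads
  # never land in the audit log itself (assumption, see lead-in).
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Interactive access into Pods.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/portforward", "pods/attach"]
  # All other mutations: full request and response bodies.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete", "deletecollection"]
  # Everything else (high-volume reads): metadata only.
  - level: Metadata
```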
Apply the policy by adding flags to the API server manifest and shipping logs to a centralized SIEM (Splunk, Elastic, or a cloud-native service like AWS CloudWatch or GCP Cloud Audit Logs). The two flags are --audit-log-path=/var/log/kubernetes/audit.log and --audit-policy-file=/etc/kubernetes/audit-policy.yaml. For managed clusters, audit logging is enabled per-provider: on EKS, enable the audit log type in the cluster logging configuration; on GKE, Admin Activity and Data Access audit logs are configurable under Cloud Audit Logs.
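On a self-managed control plane the API server runs as a static Pod, so the policy file and log directory also have to be mounted from the host. The fragment below is a sketch of the additions only; the rotation values are illustrative assumptions, not recommendations.

```yaml
# Additions to /etc/kubernetes/manifests/kube-apiserver.yaml (fragment only)
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit.log
        # Log rotation; values below are illustrative.
        - --audit-log-maxage=30      # days to retain old log files
        - --audit-log-maxbackup=10   # number of rotated files to keep
        - --audit-log-maxsize=100    # megabytes before rotation
      volumeMounts:
        - name: audit-policy
          mountPath: /etc/kubernetes/audit-policy.yaml
          readOnly: true
        - name: audit-log
          mountPath: /var/log/kubernetes
  volumes:
    - name: audit-policy
      hostPath:
        path: /etc/kubernetes/audit-policy.yaml
        type: File
    - name: audit-log
      hostPath:
        path: /var/log/kubernetes
        type: DirectoryOrCreate
```

A log shipper on the control-plane node can then tail /var/log/kubernetes/audit.log and forward it to the SIEM.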
etcd Encryption at Rest
etcd is the Kubernetes data store: it contains every Secret, every ConfigMap, every object in the cluster. By default, Kubernetes Secrets are stored in etcd unencrypted; the base64 encoding seen in Secret manifests is an encoding, not encryption, and is reversed with a single base64 -d call. Anyone with read access to the etcd data volume (a compromised etcd node, a snapshot restored to a dev machine, or a misconfigured backup bucket) can recover every Secret in the cluster without needing any key.
etcd encryption at rest is configured via an EncryptionConfiguration file, referenced in the API server with --encryption-provider-config. Among the providers that use a locally stored key, prefer secretbox (XSalsa20-Poly1305, authenticated and fast) over aescbc (AES in CBC mode with PKCS#7 padding), which has no built-in integrity check and which the Kubernetes documentation advises against because of CBC's padding-oracle weakness. For production at scale, use the KMS provider (AWS KMS, Google Cloud KMS, Azure Key Vault) so the data-encryption key is itself envelope-encrypted by a key held in the external KMS; the key-encryption key never has to be stored on the control-plane nodes.
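A minimal EncryptionConfiguration using the secretbox provider could look like the sketch below. The file path and key name are hypothetical, and the key placeholder must be replaced with a base64-encoded 32-byte random value (for example, the output of head -c 32 /dev/urandom | base64).

```yaml
# /etc/kubernetes/encryption-config.yaml (hypothetical path)
# Referenced by --encryption-provider-config on the API server.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider is used for writes.
      - secretbox:
          keys:
            - name: key1
              secret: <BASE64_ENCODED_32_BYTE_KEY>
      # Fallback so Secrets written before encryption was enabled stay readable.
      - identity: {}
```

Enabling the configuration only affects new writes. Existing Secrets stay unencrypted until they are rewritten, for example with kubectl get secrets --all-namespaces -o json | kubectl replace -f -. Because this file holds the key in plaintext, restrict it to root on the control-plane nodes; the KMS provider avoids that exposure.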
Putting It Together: The Cluster Hardening Checklist
Each of these controls addresses a distinct failure mode. They compound: RBAC without audit logs means you cannot detect when a misconfigured binding is exploited. Audit logs without etcd encryption mean log entries reference secrets that are stored in plaintext on disk. Run all four together as a cohesive posture, and validate them continuously with tools like kube-bench (CIS benchmark scanner), Trivy (misconfiguration scanner with Kubernetes support), and Falco (runtime rule engine that can alert on suspicious API server activity in real time).
Integrate kube-bench into your CI/CD pipeline as a post-deploy step. Set it to fail the pipeline if any FAIL-level CIS checks appear for your cluster profile (EKS, GKE, or generic). This prevents hardening regressions from shipping silently: a misconfigured admission plugin or a new ClusterRoleBinding with wildcard verbs gets caught before it reaches production.