Gatekeeper & Kyverno
Every Kubernetes cluster is a policy enforcement surface. Without guardrails, developers can deploy containers running as root, images pulled from untrusted registries, or workloads with no resource limits that starve other pods. The Kubernetes admission control mechanism was designed exactly for this: every request that creates, updates, or deletes an object passes through a chain of admission controllers, including admission webhooks, before the change is persisted to etcd. Read requests are not subject to admission. Two projects dominate production policy enforcement at this layer: OPA Gatekeeper (backed by the Open Policy Agent engine) and Kyverno (Kubernetes-native, no separate policy language). Knowing when to reach for each, and how both fail, is essential when operating clusters at large scale.
How Kubernetes Admission Works
When kubectl apply hits the API server, the request flows through authentication, authorization (RBAC), and then two classes of admission webhooks: Mutating (can rewrite the object), which run first, and Validating (can only allow or deny), which run against the final mutated object. Both Gatekeeper and Kyverno register themselves as these webhooks. If a webhook is unavailable and its failurePolicy is set to Fail, the request is rejected; that is the safe default for production. If it is set to Ignore, the policy is silently bypassed while the webhook is down. Choose carefully.
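To show where failurePolicy lives, here is a minimal sketch of a ValidatingWebhookConfiguration. Gatekeeper and Kyverno create and manage their own webhook configurations, so you would not normally write this object by hand. The webhook name, Service name, namespace, and path are hypothetical, and the caBundle needed for TLS is left out.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook          # hypothetical name
webhooks:
  - name: validate.policy.example.com   # hypothetical; must be a fully qualified domain name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail                 # reject requests when the webhook cannot be reached
    timeoutSeconds: 5                   # allowed range is 1-30 seconds; the default is 10
    clientConfig:
      service:
        namespace: policy-system        # hypothetical namespace
        name: policy-webhook            # hypothetical Service
        path: /validate                 # hypothetical path
      # caBundle omitted: a real configuration needs the CA that signed the webhook certificate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
```

Switching failurePolicy to Ignore in this sketch would let pod creation proceed unchecked whenever the Service is unreachable or the call times out.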
OPA Gatekeeper — ConstraintTemplates and Constraints
Gatekeeper extends Kubernetes with two custom resource types. A ConstraintTemplate defines a reusable policy schema backed by a Rego rule. A Constraint is an instance of that template applied to specific resource types and namespaces. The separation means platform teams own templates; product teams can instantiate constraints with different parameters.
Gatekeeper is installed with Helm, after which a constraint can be applied to block containers that run as root.
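The original manifests for this step are not preserved here, so the following is a sketch of what such a setup might look like, not the author's exact policy. The template kind K8sDenyRootContainers and the Rego rule are assumptions. The rule only inspects spec.containers on Pods and ignores init containers and pod-level securityContext. The Helm commands appear as comments to keep the example in one file.

```yaml
# Install Gatekeeper first (shell commands, shown here as comments):
#   helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
#   helm install gatekeeper gatekeeper/gatekeeper \
#     --namespace gatekeeper-system --create-namespace
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdenyrootcontainers         # must be the lowercase form of the kind below
spec:
  crd:
    spec:
      names:
        kind: K8sDenyRootContainers   # hypothetical kind name
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdenyrootcontainers

        # Report every container that does not set runAsNonRoot: true.
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.securityContext.runAsNonRoot
          msg := sprintf("container %v must set securityContext.runAsNonRoot to true", [container.name])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyRootContainers
metadata:
  name: deny-root-containers
spec:
  enforcementAction: dryrun           # record violations only; change to deny to block
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
```

Apply the ConstraintTemplate first and wait for Gatekeeper to create the constraint CRD. Applying both documents in a single kubectl apply can fail because the K8sDenyRootContainers kind does not exist yet.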
A constraint's enforcementAction field accepts three values: deny blocks the request; warn allows it but returns a warning in the API response (useful during rollout); dryrun records violations without blocking (visible via kubectl get constraint deny-root-containers -o yaml under status.violations). Always start with dryrun in production, confirm that the violation count drops to zero, then switch to deny.
Kyverno — Kubernetes-Native Policy
Kyverno treats Kubernetes resources as the policy language itself. Policies are YAML documents that pattern-match on incoming resource manifests using match/exclude blocks, then apply validate, mutate, or generate rules. Engineers who already know Kubernetes manifests can read and write Kyverno policies without learning Rego. This dramatically lowers the barrier for platform teams.
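As an illustration of the match/exclude/validate structure, here is a minimal sketch of a ClusterPolicy that requires every container to set runAsNonRoot. The policy and rule names are hypothetical. The spec-level validationFailureAction field is an assumption about the installed Kyverno version, since newer releases move this setting into each rule.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-non-root       # hypothetical policy name
spec:
  validationFailureAction: Audit      # report only; Enforce blocks the request
  background: true                    # also scan resources that already exist
  rules:
    - name: check-run-as-non-root     # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
      validate:
        message: "Containers must set securityContext.runAsNonRoot to true."
        pattern:
          spec:
            containers:
              # A single list entry in a pattern is checked against every container.
              - securityContext:
                  runAsNonRoot: true
```

The pattern mirrors the shape of a Pod manifest, which is what makes the policy readable without any knowledge of Rego.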
The +(field) syntax in Kyverno mutation means "add this field only if it does not already exist" — it will not override explicit values set by the developer. This makes mutations safe to apply fleet-wide without breaking apps that already configure security contexts correctly.
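A minimal sketch of a mutation rule using this anchor is shown below. The policy and rule names are hypothetical, and the rule sets the field at the pod level only.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: default-run-as-non-root       # hypothetical policy name
spec:
  rules:
    - name: add-run-as-non-root       # hypothetical rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            securityContext:
              # Added only when the pod does not already set runAsNonRoot.
              +(runAsNonRoot): true
```

A pod that explicitly sets runAsNonRoot: false keeps that value after mutation, so a separate validate rule is still needed if the setting must be enforced.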
Gatekeeper vs Kyverno — When to Choose Each
Both are CNCF projects and both are production-grade. The right choice depends on your team and use case:
- Choose Gatekeeper if you already use OPA in your stack (Terraform, application authorization), want a single policy language across all enforcement points, or need highly complex logic that benefits from Rego's datalog-style evaluation. The constraint template / constraint split also maps well to a platform-team-owns-policy, product-team-instantiates model.
- Choose Kyverno if your platform team wants policies that any Kubernetes-literate engineer can read and modify, or if you need built-in mutation and generate capabilities without writing a separate mutating webhook. Kyverno also ships a policy library of 200+ ready-to-use policies at kyverno.io/policies.
The policy webhook is itself a single point of failure. If its pods are unavailable while failurePolicy: Fail is set, every new pod in the cluster is blocked. Run at least 3 replicas spread across nodes with podAntiAffinity, and use a PodDisruptionBudget with minAvailable: 2 to protect against node drain.
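The availability settings above can be sketched as plain manifests. In practice both projects are usually installed from Helm charts that expose equivalent settings, so treat this as an illustration of the target state, not something to apply as-is. All names, labels, the namespace, the image, and the resource figures are hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: policy-webhook                # hypothetical name
  namespace: policy-system            # hypothetical namespace
spec:
  minAvailable: 2                     # with 3 replicas, at most 1 pod can be evicted at a time
  selector:
    matchLabels:
      app: policy-webhook             # hypothetical label
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: policy-webhook
  namespace: policy-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: policy-webhook
  template:
    metadata:
      labels:
        app: policy-webhook
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: policy-webhook
              topologyKey: kubernetes.io/hostname   # never place two replicas on one node
      containers:
        - name: webhook
          image: registry.example.com/policy-webhook:1.0.0   # placeholder image
          resources:
            requests:
              cpu: 500m               # illustrative values; size from observed usage
              memory: 512Mi
```

The required anti-affinity rule needs at least three schedulable nodes. On smaller clusters the third replica stays Pending, and preferredDuringSchedulingIgnoredDuringExecution is the softer alternative.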
Audit Mode and Policy Exceptions
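Both tools support scanning existing resources (not just new ones) against policies. Gatekeeper's audit controller periodically evaluates existing resources against every constraint and writes the results to the constraint's status. Set enforcementAction: dryrun to collect those results without blocking anything, and query them with kubectl get constraint -o jsonpath='{.items[*].status.violations}'. In Kyverno, set validationFailureAction: Audit, keep background scanning enabled, and check PolicyReport resources (a format defined by the Kubernetes Policy Working Group): kubectl get polr -A.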
When a specific workload legitimately needs an exception, such as a legacy app that truly must run as root, both tools provide scoped exclusion mechanisms. In Gatekeeper, add the namespace to the constraint's spec.match.excludedNamespaces or narrow the match with spec.match.labelSelector. In Kyverno, use the exclude block with a label selector. Document every exception in a comment in the policy YAML, commit it to git, and review exceptions quarterly, because unchecked exceptions accumulate into a compliance liability.
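The two exclusion styles might look like the following sketch. The constraint kind K8sDenyRootContainers, the namespace legacy-billing, the label key policy.example.com/allow-root, the ticket reference, and the Kyverno policy name are all hypothetical. The Gatekeeper constraint assumes a matching ConstraintTemplate already exists.

```yaml
# Gatekeeper: skip one namespace, plus any pod carrying the exemption label.
# Exception: legacy-billing must run as root (hypothetical ticket OPS-0000, review quarterly).
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyRootContainers
metadata:
  name: deny-root-containers
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["legacy-billing"]
    labelSelector:
      matchExpressions:
        - key: policy.example.com/allow-root
          operator: DoesNotExist      # constraint applies only to pods without the label
---
# Kyverno: the same exemption expressed with an exclude block.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-non-root
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-run-as-non-root
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              selector:
                matchLabels:
                  policy.example.com/allow-root: "true"
      validate:
        message: "Containers must set securityContext.runAsNonRoot to true."
        pattern:
          spec:
            containers:
              - securityContext:
                  runAsNonRoot: true
```

Label-based exemptions are only as strong as the controls on who may set the label, so restricting that label through RBAC or a further policy is worth considering.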
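A slow webhook is as dangerous as a missing one: when a webhook call exceeds its timeout, the API server falls back to the failurePolicy. In a cluster under load during a deploy, this can cascade: new pods cannot start, which increases load on existing pods, which increases webhook latency further. Monitor webhook p99 latency (the gatekeeper_validation_request_duration_seconds histogram for Gatekeeper, kyverno_admission_review_duration_seconds for Kyverno) and set resource requests high enough that webhook pods are never throttled.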