Admission Control & Webhooks
Every time you run kubectl apply, your request travels through several layers before a single Pod is created. The final gatekeeping layer — the one that lets Google, Stripe, and other large-scale operators enforce security policies, inject sidecars, and validate resource quotas in real time — is the admission control subsystem. Understanding it deeply means you can both rely on it for cluster safety and debug it when it silently rejects your workloads.
The API Request Lifecycle
Before diving into webhooks, trace the full path a request takes through the API server. The sequence is fixed: authentication, authorization, mutating admission, object schema validation, validating admission, and finally persistence to etcd. Every write request passes through each stage in order.
Key insight: mutating webhooks run before validating webhooks. This ordering is intentional: mutators modify the object first (injecting sidecars, adding labels, defaulting fields), and validators then inspect the final shape. If validating webhooks fired before mutation, policies would reject objects that a mutator would have fixed.
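To make the ordering concrete, here is a toy model of the two admission phases. It is a sketch, not how the API server is implemented: the functions admit, add_team_label, and require_team_label are hypothetical names chosen for illustration.

```python
import copy


def admit(obj, mutators, validators):
    """Run every mutator in order, then every validator on the mutated object."""
    current = copy.deepcopy(obj)  # never touch the caller's copy
    for mutate in mutators:
        current = mutate(current)
    for validate in validators:
        ok, reason = validate(current)
        if not ok:
            return False, reason, current
    return True, "", current


def add_team_label(pod):
    """Hypothetical mutator: default a 'team' label when it is missing."""
    labels = pod.setdefault("metadata", {}).setdefault("labels", {})
    labels.setdefault("team", "unassigned")
    return pod


def require_team_label(pod):
    """Hypothetical validator: reject objects without a 'team' label."""
    if "team" in pod.get("metadata", {}).get("labels", {}):
        return True, ""
    return False, "label 'team' is required"


pod = {"kind": "Pod", "metadata": {"name": "web"}}
allowed, reason, final = admit(pod, [add_team_label], [require_team_label])
print(allowed, final["metadata"]["labels"])
```

With the mutator in place the validator passes. Calling admit with an empty mutator list rejects the same Pod, which is the failure the ordering is designed to avoid.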
Built-in Admission Controllers vs. Webhooks
Kubernetes ships with compiled-in admission controllers, many of them enabled by default (e.g., NamespaceLifecycle, LimitRanger, ResourceQuota, PodSecurity). They run in the same mutating and validating phases as webhooks and handle the most critical invariants. You extend the system with dynamic admission, which comes in two webhook types:
- MutatingAdmissionWebhook — can modify the object via a JSON Patch response. Used for sidecar injection (Istio, Linkerd), secret encryption defaulting, label stamping.
- ValidatingAdmissionWebhook — can only allow or deny. Used for policy enforcement (OPA/Gatekeeper, Kyverno), image registry allowlisting, required annotation checks.
If your webhook server goes down while failurePolicy: Fail is set, every matching API request to your cluster will be rejected until the webhook recovers. This has caused major incidents at large companies. Always deploy webhook servers with at least 2 replicas and a PodDisruptionBudget.
Registering a Validating Webhook
Webhooks are registered with ValidatingWebhookConfiguration or MutatingWebhookConfiguration objects. The API server uses these to know which HTTPS endpoint to call, and which resource/operation combinations trigger it.
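As an illustration, the sketch below builds a ValidatingWebhookConfiguration as a Python dict and prints it as JSON, which kubectl apply -f also accepts. The field names come from the admissionregistration.k8s.io/v1 API. The policy name, Service name, namespace, path, the example.com suffix, and the placeholder CA bytes are all assumptions made up for this example.

```python
import base64
import json


def validating_webhook_config(name, service_name, service_namespace, path, ca_pem):
    """Build a manifest that sends Pod CREATE/UPDATE requests to an in-cluster Service."""
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": name},
        "webhooks": [{
            # Webhook names are domain-style; example.com is a placeholder.
            "name": f"{name}.example.com",
            "admissionReviewVersions": ["v1"],
            "sideEffects": "None",
            "failurePolicy": "Fail",
            "timeoutSeconds": 5,
            "clientConfig": {
                "service": {
                    "name": service_name,
                    "namespace": service_namespace,
                    "path": path,
                    "port": 443,
                },
                # caBundle is the PEM CA certificate, base64-encoded.
                "caBundle": base64.b64encode(ca_pem).decode("ascii"),
            },
            "rules": [{
                "apiGroups": [""],
                "apiVersions": ["v1"],
                "operations": ["CREATE", "UPDATE"],
                "resources": ["pods"],
                "scope": "Namespaced",
            }],
        }],
    }


# Placeholder bytes, not a real certificate.
fake_ca = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
config = validating_webhook_config(
    "pod-policy", "pod-policy", "policy-system", "/validate", fake_ca
)
print(json.dumps(config, indent=2))
```

When cert-manager's cainjector manages the CA, the caBundle value would normally be left for it to fill in rather than set by hand as shown here.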
The caBundle field is critical — the API server uses it to verify the webhook's TLS certificate. In production, use cert-manager with a Certificate resource and the cainjector to automatically populate caBundle. Rotating this certificate manually is an operational trap.
What the Webhook Server Returns
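Your webhook is a plain HTTPS server. The API server sends an AdmissionReview JSON body and expects one back. For a validating webhook, the response is simple: it echoes the request's uid, sets allowed to true or false, and can carry a status.message explaining a denial.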
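A minimal sketch of building that response in Python follows. Only the AdmissionReview field names are taken from the admission.k8s.io/v1 API; the helper names, the registry prefix registry.example.com/, and the sample request are hypothetical, and the HTTPS serving layer is left out.

```python
import json


def review_response(request_review, allowed, message=""):
    """Wrap a verdict in an AdmissionReview, echoing the request uid."""
    response = {"uid": request_review["request"]["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {"code": 403, "message": message}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


def validate_pod_images(request_review, allowed_registry="registry.example.com/"):
    """Hypothetical policy: every container image must come from one registry."""
    pod = request_review["request"]["object"]
    for container in pod.get("spec", {}).get("containers", []):
        image = container.get("image", "")
        if not image.startswith(allowed_registry):
            return review_response(
                request_review, False,
                f"image {image!r} is not from {allowed_registry}",
            )
    return review_response(request_review, True)


# Hand-written sample request; a real one carries many more fields.
sample = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
        "object": {"spec": {"containers": [{"name": "app", "image": "docker.io/nginx"}]}},
    },
}
print(json.dumps(validate_pod_images(sample), indent=2))
```

The message placed in status.message is what kubectl shows to the user when the request is denied.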
A mutating webhook returns the same structure but also includes a patch field (base64-encoded JSON Patch) and "patchType": "JSONPatch". The patch can add, remove, or replace fields on the object.
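The sketch below shows one way to assemble such a mutating response. The patch and patchType fields are from the AdmissionReview API and the operations follow JSON Patch (RFC 6902); the function names and the sample label are hypothetical.

```python
import base64
import json


def label_patch(obj, key, value):
    """Build JSON Patch operations that add one label to an object."""
    labels = obj.get("metadata", {}).get("labels")
    if labels is None:
        # The labels map does not exist yet, so create it in one operation.
        return [{"op": "add", "path": "/metadata/labels", "value": {key: value}}]
    # JSON Pointer escaping: '~' becomes '~0' and '/' becomes '~1'.
    escaped = key.replace("~", "~0").replace("/", "~1")
    return [{"op": "add", "path": f"/metadata/labels/{escaped}", "value": value}]


def mutate_response(request_review, patch_ops):
    """Wrap JSON Patch operations in an AdmissionReview response."""
    patch_bytes = json.dumps(patch_ops).encode("utf-8")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request_review["request"]["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(patch_bytes).decode("ascii"),
        },
    }


# Hand-written sample request with an existing labels map.
review = {
    "request": {
        "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
        "object": {"metadata": {"name": "web", "labels": {"app": "web"}}},
    }
}
ops = label_patch(review["request"]["object"], "example.com/injected", "true")
print(json.dumps(mutate_response(review, ops), indent=2))
```

Escaping the label key matters because many Kubernetes label keys contain a slash, which JSON Pointer would otherwise treat as a path separator.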
Kyverno and OPA Gatekeeper — Production Policy Engines
Writing raw webhook servers is error-prone at scale. Production clusters at big-tech companies use policy engines that implement webhooks internally and let you write declarative policies instead of Go/Python servers.
- Kyverno: Kubernetes-native. Policies are CRDs (ClusterPolicy). Simpler YAML syntax, built-in auto-mutation support, audit mode.
- OPA Gatekeeper: Uses ConstraintTemplate (Rego language). More expressive, better for complex cross-field validation. Standard at Google and large enterprises.
Start every new policy with validationFailureAction: Audit. This logs violations without blocking workloads, letting you discover how many existing resources already violate the policy before enforcing it. Enforce blindly in a large cluster and you will block your own CD pipelines.
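As a hedged illustration, the sketch below builds a Kyverno ClusterPolicy in audit mode as a Python dict and prints it as JSON. The structure follows Kyverno's kyverno.io/v1 require-labels pattern; the policy name, rule name, and label are made up, and where the failure action field lives can differ between Kyverno versions, so check the schema of the version you run.

```python
import json


def require_label_policy(label, action="Audit"):
    """Build a ClusterPolicy that checks Pods for a non-empty label."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": "require-label"},  # hypothetical policy name
        "spec": {
            # Audit records violations; Enforce blocks the request.
            "validationFailureAction": action,
            "rules": [{
                "name": "check-label",  # hypothetical rule name
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                "validate": {
                    "message": f"label '{label}' is required",
                    # '?*' is Kyverno's pattern for "at least one character".
                    "pattern": {"metadata": {"labels": {label: "?*"}}},
                },
            }],
        },
    }


policy = require_label_policy("app.kubernetes.io/name")
print(json.dumps(policy, indent=2))
```

Switching to enforcement later is then a one-field change, require_label_policy(label, action="Enforce"), once the audit results show the cluster is clean.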
Failure Modes and Debugging
When a webhook rejects your request, kubectl prints the status.message directly. But webhooks can also fail in subtler ways:
- Timeout: If the webhook server takes longer than timeoutSeconds, the API server treats it as a failure. With failurePolicy: Fail this blocks the request; with Ignore it silently skips. Either is dangerous.
- TLS mismatch: An expired or rotated cert that was not propagated to caBundle causes all requests to fail with a TLS verification error, not a helpful policy message.
- Webhook loop: A mutating webhook that watches its own objects and mutates them again. Always add a namespaceSelector or objectSelector to exclude the webhook server's own namespace.
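The timeout case can be summarized as a small decision function. This is a simplified model of the behaviour described above, not API server code; admission_outcome and its return strings are hypothetical.

```python
def admission_outcome(verdict_allowed, call_seconds, timeout_seconds, failure_policy):
    """Model what happens to a request given the webhook call and failurePolicy."""
    if failure_policy not in ("Fail", "Ignore"):
        raise ValueError(f"unknown failurePolicy: {failure_policy!r}")
    call_failed = call_seconds > timeout_seconds
    if not call_failed:
        # The webhook answered in time, so its verdict decides.
        return "allowed" if verdict_allowed else "denied by policy"
    if failure_policy == "Fail":
        return "rejected: webhook call failed"
    # Ignore: the request proceeds even though no policy was evaluated.
    return "allowed without being checked"


for policy in ("Fail", "Ignore"):
    print(policy, "->", admission_outcome(True, 12.0, 10, policy))
```

Both timeout branches are risky in different ways: Fail trades availability for safety, and Ignore lets unchecked objects through without any signal to the user.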
Avoid failurePolicy: Fail on a webhook that calls an external service (e.g., a remote OPA server, an external secret vault). External dependencies introduce latency and availability risk. If that external service has a 5-minute outage, your cluster cannot create any Pods that match the webhook. Use Ignore for externally-dependent webhooks, and compensate with post-admission audit tooling.
Webhook Best Practices at Scale
- Scope narrowly: Use namespaceSelector, objectSelector, and specific rules to match only what you need. A webhook that fires on every object in the cluster multiplies API server latency.
- Use cert-manager: Never manage webhook TLS certificates by hand. The cainjector keeps caBundle in sync automatically.
- Set sideEffects: None: Required for dry-run support (kubectl apply --dry-run=server). If your webhook has side effects (writes to a DB, calls an API), you must declare NoneOnDryRun and implement dry-run detection.
- Exclude system namespaces: Always exclude kube-system and your webhook's own namespace from all webhook rules to avoid control-plane disruption.
- Monitor webhook latency: Expose a /metrics endpoint on your webhook server and alert on p99 latency > 2 seconds. The API server's own metrics expose apiserver_admission_webhook_admission_duration_seconds.
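To illustrate the namespace exclusion, the sketch below builds a namespaceSelector that skips named namespaces and includes a small matcher to reason about which namespaces it selects. It relies on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace; the function names and the webhook-system namespace are hypothetical, and the matcher is a simplified stand-in for the API server's own label selector evaluation.

```python
def exclude_namespaces(names):
    """Build a namespaceSelector that matches every namespace except those named."""
    return {
        "matchExpressions": [{
            "key": "kubernetes.io/metadata.name",
            "operator": "NotIn",
            "values": sorted(names),
        }]
    }


def selector_matches(selector, labels):
    """Simplified label selector evaluation for reasoning about scope."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op, values = expr["key"], expr["operator"], expr.get("values", [])
        if op == "In" and labels.get(key) not in values:
            return False
        if op == "NotIn" and labels.get(key) in values:
            return False
        if op == "Exists" and key not in labels:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    return True


selector = exclude_namespaces(["kube-system", "webhook-system"])
for ns in ("default", "kube-system", "webhook-system"):
    matched = selector_matches(selector, {"kubernetes.io/metadata.name": ns})
    print(ns, "-> webhook called" if matched else "-> skipped")
```

The resulting dict goes under namespaceSelector in each entry of the webhooks list, so requests in the excluded namespaces never reach the webhook server.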