Compliance & Policy as Code

Open Policy Agent & Rego

Open Policy Agent (OPA) is the de facto standard for policy enforcement across the cloud-native stack. Created at Styra and donated to the CNCF, it reached the CNCF's Graduated maturity level in 2021 and is now used behind Kubernetes admission control (Gatekeeper), service meshes (Istio external authorization), API gateways (Kong, Envoy ext_authz), Terraform plan validation (Conftest), and dozens of commercial products. The central idea is decoupled policy: instead of baking access logic into every service, you delegate decisions to an independently deployable engine that speaks one language, Rego.

OPA is a general-purpose policy engine, not a Kubernetes tool. It evaluates any JSON input against any Rego policy and returns any JSON output. Kubernetes admission control is the most visible use case, but OPA also governs API authorization, data masking, CI pipeline gating, SSH access, and cloud account provisioning — anywhere you need programmable, auditable rules that live outside your application code.

OPA Architecture

OPA runs as a sidecar process or standalone service. Consumers send a JSON query (what do you want to know?) together with a JSON input (the thing being evaluated). OPA evaluates the query against its loaded policy bundle and the optional data document (context such as user roles or allowlists), then returns a JSON decision. Nothing about that exchange is Kubernetes-specific.

[Figure: OPA Architecture: Decoupled Policy Evaluation. A caller (service or webhook) sends input JSON to the OPA engine and receives decision JSON (allow / deny + reason). Inside the engine, the Rego evaluator combines the policy bundle (git-versioned Rego rules) with the data document (roles, allowlists, context). Bundles are pulled from a bundle store over OCI / HTTPS and hot-reloaded without a restart.]
OPA decouples policy decisions from application code. The caller sends an input document; OPA evaluates it against Rego rules and external data, then returns a structured decision. Bundles are hot-reloaded — policy updates require no OPA restart.
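
The exchange described above can be sketched as a small policy that reads both documents: `input` (what the caller sends) and `data` (context loaded into OPA). The package name, the `input.user` field, and the `data.example.roles` layout are assumptions made up for this illustration.

```rego
# Hypothetical package and field names, for illustration only.
package example.authz

import future.keywords.if
import future.keywords.in

# Safe fallback: deny unless the rule below matches.
default allow := false

# input: the request under evaluation, supplied by the caller.
# data.example.roles: context loaded into OPA
# (assumed shape: user name -> list of role names).
allow if {
    some role in data.example.roles[input.user]
    role == "admin"
}

# Structured decision the caller can fetch with a single query.
decision := {"allow": allow, "reason": reason}

reason := "user has admin role" if allow

reason := "user lacks admin role" if not allow
```

Querying `data.example.authz.decision` with the input `{"user": "alice"}` and a data document that maps `alice` to `["admin"]` should return `allow: true` together with the matching reason; an unknown user falls through to the default and is denied.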

Three deployment patterns cover essentially all production use cases:

  • Sidecar: OPA runs as a container alongside every application pod. Low latency (loopback), isolated blast radius, slightly higher resource overhead. This is the usual pattern for Envoy ext_authz and Envoy-based service meshes such as Istio.
  • Admission webhook (Gatekeeper): OPA runs in-cluster behind a validating admission webhook registered with the API server. Every create, update, or delete request that matches the webhook's rules is evaluated synchronously before the resource is persisted. This is the primary Kubernetes enforcement point.
  • Daemon / standalone: A central OPA service evaluates queries from many callers. Appropriate when you want a single auditable decision log. Must be treated as a critical-path dependency: if it goes down, every caller either fails open or closed depending on configuration.

Rego: The Policy Language

Rego is a declarative, logic-based language purpose-built for policy. It is not imperative: you do not write if/else trees and mutate state. Instead, you write logical relationships that the Rego evaluator resolves to a value. The model is closest to Datalog (a subset of Prolog), which makes Rego feel unfamiliar to engineers who have only written procedural code. The payoff is that evaluation is deterministic: given the same input, data, and policy, OPA produces the same output (built-ins that read the clock or the network, such as time.now_ns and http.send, are the exception), and the evaluator can explain exactly why a decision was reached.

Rego Fundamentals: Rules, References, Comprehensions

A Rego file is a module with a package declaration. Rules are equations. The body of a rule is a set of expressions that must all be true (conjunctive logic — AND). Multiple rules with the same head are alternatives (disjunctive logic — OR). You query a value via dot-notation on the document tree (input.spec.containers[_]).

    # Rego fundamentals: package, rules, built-ins
    # File: policy/require-labels.rego
    package kubernetes.admission

    import future.keywords.contains
    import future.keywords.if
    import future.keywords.in

    # Default decision: deny is false unless a rule below matches
    default deny = false

    # Rule: every Deployment must have the "team" label
    deny if {
        input.request.kind.kind == "Deployment"
        not input.request.object.metadata.labels.team
    }

    # Rule: every container image in a Pod must come from an approved registry
    deny if {
        some container in input.request.object.spec.containers
        not startswith(container.image, "registry.corp.internal/")
    }

    # Composite rule: produce a set of violation messages (Gatekeeper pattern).
    # When a set rule is written with `if`, it is declared with `contains`.
    violation contains {"msg": msg} if {
        some container in input.request.object.spec.containers
        not startswith(container.image, "registry.corp.internal/")
        msg := sprintf("Container %v uses unapproved registry: %v", [container.name, container.image])
    }

Key Rego idioms that every engineer working with OPA must know:

  • The wildcard iterator [_]: iterates over every element of a collection. A rule body containing input.spec.containers[_].image is evaluated once per container, and the rule holds if any container satisfies the rest of the body.
  • some x in collection: the modern idiomatic alternative (requires import future.keywords.in). Prefer this over [_] in new policies for readability.
  • Negation not: true when the negated expression is undefined or false. not input.metadata.labels.team is true when the team label is absent, and also when it is set to the boolean false. Any other value counts as present, including null and the empty string. Understand this before writing security rules.
  • Default rules: default allow = false gives a complete rule a value when none of its bodies match, so the result is false rather than undefined. Always declare defaults on security-relevant rules.
  • Partial set rules: violation[msg] { ... }, or violation contains msg if { ... } with the newer keywords, builds a set of all matching messages. The Gatekeeper pattern relies on this: it collects all violations for a resource in one query rather than stopping at the first.
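
The idioms in the list above can be tried side by side in one module. This is a minimal sketch: the package name and rule names are invented for the example, and the input shape (`spec.containers`, `metadata.labels`) follows the snippets used in the list.

```rego
# Hypothetical module for experimenting with the idioms above.
package example.idioms

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Wildcard form: collect every image under input.spec.containers.
images_wildcard contains image if {
    image := input.spec.containers[_].image
}

# Keyword form: the same set, with the iteration variable named explicitly.
images_some contains c.image if {
    some c in input.spec.containers
}

# Negation: holds when the label is undefined (absent) or the boolean false.
# Any other value, including "" and null, makes the label count as present.
missing_team if not input.metadata.labels.team

# Stricter presence check: require a non-empty string.
default team_ok := false

team_ok if {
    team := input.metadata.labels.team
    is_string(team)
    team != ""
}
```

Because `missing_team` has no default, it is undefined (not false) when the label is present, while `team_ok` always yields a boolean. For security rules the stricter `team_ok` style is usually the safer starting point.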

Writing a Production-Quality Policy

The following policy enforces three common Kubernetes guardrails in a single module: required labels, approved image registries, and no containers running as root (UID 0). This is representative of what a real platform team ships.

    # File: policy/platform-baseline.rego
    # Enforces: (1) required labels, (2) approved registries, (3) no root UID
    package platform.admission.baseline

    import future.keywords.contains
    import future.keywords.if
    import future.keywords.in

    required_labels := {"team", "env", "version"}

    approved_registry := "registry.corp.internal/"

    # 1. Required labels
    violation contains {"msg": msg, "field": "metadata.labels"} if {
        some lbl in required_labels
        not input.review.object.metadata.labels[lbl]
        msg := sprintf("Missing required label: %v", [lbl])
    }

    # 2. Approved image registry
    violation contains {"msg": msg, "field": "spec.containers"} if {
        some c in input.review.object.spec.containers
        not startswith(c.image, approved_registry)
        msg := sprintf("Image %v not from approved registry %v", [c.image, approved_registry])
    }

    # 3. No container may run as UID 0.
    # Only an explicit runAsUser: 0 is flagged; an unset value is not.
    violation contains {"msg": msg, "field": "spec.securityContext"} if {
        some c in input.review.object.spec.containers
        c.securityContext.runAsUser == 0
        msg := sprintf("Container %v must not run as UID 0 (root)", [c.name])
    }

    # Also check the pod-level security context
    violation contains {"msg": msg, "field": "spec.securityContext"} if {
        input.review.object.spec.securityContext.runAsUser == 0
        msg := "Pod-level securityContext.runAsUser must not be 0"
    }

Testing Rego Policies with opa test

A Rego policy without tests is a liability. OPA ships a built-in test runner. Test files live alongside policy files and follow the naming convention *_test.rego. Tests are regular Rego rules whose names start with test_. Any test that evaluates to false or is undefined is a failure.

    # File: policy/platform-baseline_test.rego
    package platform.admission.baseline

    import future.keywords.if

    # Helper: build a minimal Pod review document
    make_pod(containers) := {"review": {"object": {
        "metadata": {"labels": {"team": "platform", "env": "prod", "version": "v1.2"}},
        "spec": {"containers": containers, "securityContext": {}}
    }}}

    # Compliant pod: zero violations expected
    test_compliant_pod_passes if {
        containers := [{
            "name": "app",
            "image": "registry.corp.internal/myapp:1.0",
            "securityContext": {"runAsUser": 1000}
        }]
        count(violation) == 0 with input as make_pod(containers)
    }

    # Unapproved registry: one violation expected
    test_unapproved_registry_fails if {
        containers := [{
            "name": "app",
            "image": "docker.io/nginx:latest",
            "securityContext": {"runAsUser": 1000}
        }]
        count(violation) == 1 with input as make_pod(containers)
    }

    # Root UID: one violation expected
    test_root_uid_fails if {
        containers := [{
            "name": "app",
            "image": "registry.corp.internal/myapp:1.0",
            "securityContext": {"runAsUser": 0}
        }]
        count(violation) == 1 with input as make_pod(containers)
    }

    # Run: opa test ./policy/ -v
    # Expected output, one line per test (timings will vary), similar to:
    #   data.platform.admission.baseline.test_compliant_pod_passes: PASS (1.2ms)
    #   data.platform.admission.baseline.test_unapproved_registry_fails: PASS (0.8ms)
    #   data.platform.admission.baseline.test_root_uid_fails: PASS (0.9ms)

Run opa eval interactively during policy development. Use opa eval -d policy/ -i input.json 'data.platform.admission.baseline.violation' to see exactly what your policy returns for a given input before deploying it. The --explain full flag prints the full evaluation trace, which is invaluable when debugging a rule that fires unexpectedly or not at all. For coverage, opa test --coverage reports which policy lines your tests exercised; at scale, tools such as Styra DAS and the OPA VS Code extension add further authoring and coverage support.

OPA Bundle Distribution

In production, policies are not baked into the OPA binary. They are distributed as bundles: tar.gz archives of .rego files and data.json documents, published to an OCI registry or HTTPS endpoint. OPA polls the bundle endpoint at a configured interval and hot-reloads policies without a restart. This is what makes a policy update a CI operation: merge to main, CI builds and pushes the new bundle, and every OPA instance picks it up at its next poll (seconds to minutes, depending on the polling settings).

    # opa.yaml: OPA configuration file (sidecar or standalone)
    services:
      bundle-server:
        url: https://bundles.corp.internal
        credentials:
          bearer:
            token_path: /var/run/secrets/bundle-token

    bundles:
      platform-baseline:
        service: bundle-server
        resource: /bundles/platform-baseline.tar.gz
        polling:
          min_delay_seconds: 30
          max_delay_seconds: 120

    decision_logs:
      console: true            # emit structured decision logs to stdout for Fluentd
      reporting:
        min_delay_seconds: 5   # batch and send decisions to a central log sink
        max_delay_seconds: 15

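
Because a bundle carries data.json alongside the .rego files, values such as the registry allowlist can live in data and change without touching the policy. The sketch below assumes a hypothetical data path, `data.platform.approved_registries`, holding a list of registry prefixes; that layout is an illustration, not something the bundle format prescribes.

```rego
# Hypothetical module; the data path is an assumption about how the
# bundle's data.json is laid out.
package example.bundle

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# True when the image starts with at least one prefix from the data document.
image_approved(image) if {
    some prefix in data.platform.approved_registries
    startswith(image, prefix)
}

# Flag every container whose image matches no approved prefix.
violation contains {"msg": msg} if {
    some c in input.review.object.spec.containers
    not image_approved(c.image)
    msg := sprintf("Image %v is not from an approved registry", [c.image])
}
```

If the allowlist is missing from the loaded data, `image_approved` is undefined for every image, so every container is flagged. That is a fail-closed outcome, which is usually what you want while a bundle is still loading.
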
The "undefined = allow" footgun. In Rego, if a rule is never matched, the result is undefined, which is different from false. OPA callers that treat undefined as allow (common in hand-rolled integrations) create a bypass: a malformed input that fails to match any rule will be silently permitted. Always use default allow = false or configure your integration to treat undefined decisions as deny. Gatekeeper handles this correctly; double-check any custom OPA REST API caller you write.
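
The difference between undefined and false is easy to see with two versions of the same rule. A minimal sketch, with an invented package name and input shape:

```rego
# Hypothetical module contrasting a rule with and without a default.
package example.fallback

import future.keywords.if

# No default: undefined whenever the body does not match, including when
# input.user is missing entirely (for example, a misspelled field name).
allow_unsafe if input.user.role == "admin"

# With a default: always a boolean, so callers never see an undefined result.
default allow := false

allow if input.user.role == "admin"
```

For an input such as `{"usr": {"role": "admin"}}` (note the misspelled key), `allow` evaluates to false, while `allow_unsafe` has no value at all. Over OPA's REST Data API an undefined document is returned as a response with no result field, so a caller has to check for a missing result explicitly and treat it as a deny.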

OPA in the Kubernetes Admission Path

When you deploy Gatekeeper (covered in the next lesson), OPA is the evaluation engine behind the scenes. Understanding the raw OPA query/response cycle is essential for debugging Gatekeeper failures, writing custom constraint templates, and integrating OPA into non-Kubernetes systems. The Kubernetes API server sends an AdmissionReview JSON object to the webhook; Gatekeeper hands the request to the policy under input.review (a raw OPA webhook sees it under input.request); the policy must produce a violation set; Gatekeeper maps that set to an admission response. Running opa eval locally against a real AdmissionReview fixture, wrapped in the same input shape, reproduces the policy evaluation that Gatekeeper performs in production.
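
Without Gatekeeper, a policy that serves the webhook directly has to build the admission response itself. The sketch below shows one common way to map a set of denial messages to an AdmissionReview response on the raw `input.request` shape; the package name and the `deny`, `uid`, `response`, and `main` rule names are choices made for this example.

```rego
# Hypothetical module for a raw OPA admission webhook (no Gatekeeper).
package example.admission

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Collect denial reasons; here, the approved-registry check from earlier.
deny contains msg if {
    some c in input.request.object.spec.containers
    not startswith(c.image, "registry.corp.internal/")
    msg := sprintf("Container %v uses unapproved registry: %v", [c.name, c.image])
}

# The response must echo the UID of the request it answers.
default uid := ""

uid := input.request.uid

# No reasons collected: admit the object.
response := {"allowed": true, "uid": uid} if count(deny) == 0

# One or more reasons: reject and join them into a single message.
response := {
    "allowed": false,
    "uid": uid,
    "status": {"message": concat("; ", deny)}
} if count(deny) > 0

# Document the webhook queries; this is what goes back to the API server.
main := {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "response": response
}
```

Objects without `spec.containers` produce an empty `deny` set and are admitted, so a real policy would add rules for each workload kind it needs to cover.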