Agents & Distributed Builds
The Jenkins controller (formerly "master") is the brain of your CI platform: it schedules jobs, manages state, and serves the UI. It should never execute build workloads. Every real build runs on an agent (formerly "slave"), a separate process that talks to the controller over the Remoting protocol (carried over SSH, TCP, or WebSocket). Agents give you safe isolation, horizontal scalability, and the ability to match build environments to job requirements.
Static Agents
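A static agent is a persistent machine (VM, bare-metal server, or long-running container) registered in Jenkins under Manage Jenkins → Nodes. Either the controller SSHes into it and starts the agent process, or the agent dials back to the controller with agent.jar (an inbound agent, historically called JNLP). In both cases the Remoting connection stays open for as long as the node is online.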
Each static node is configured with:
- Labels: space-separated tags (linux, docker, gpu-builder, windows). A pipeline's agent { label 'linux && docker' } expression selects matching nodes.
- Executors: how many concurrent builds the node accepts (usually 1–2× CPU count).
- Root directory: workspace root; use a fast local SSD, not NFS.
- Availability: keep-online vs. on-demand (wake on job, disconnect after idle).
Launch via SSH (the recommended method): Jenkins opens an SSH connection to the agent host and runs java -jar agent.jar. Ensure the controller's SSH private key is stored in Jenkins Credentials and the agent host is reachable on port 22.
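The same node settings can also be applied from the Jenkins script console instead of the UI. The following is a minimal sketch, assuming the SSH Build Agents plugin is installed; the host name, node name, credentials ID, and root directory are hypothetical placeholders, and host key verification is left at the plugin's default.

```groovy
import hudson.model.Node
import hudson.plugins.sshslaves.SSHLauncher
import hudson.slaves.DumbSlave
import hudson.slaves.RetentionStrategy
import jenkins.model.Jenkins

// Hypothetical values: replace the host, credentials ID, and paths with your own.
def launcher = new SSHLauncher('build-01.example.internal', 22, 'agent-ssh-key')

// Node name, root directory (workspace root on the agent), and launch method.
def node = new DumbSlave('linux-build-01', '/var/lib/jenkins-agent', launcher)
node.numExecutors = 2                       // concurrent builds on this node
node.labelString = 'linux docker'           // space-separated labels
node.mode = Node.Mode.EXCLUSIVE             // only run jobs that ask for these labels
node.retentionStrategy = new RetentionStrategy.Always()   // keep-online availability

Jenkins.get().addNode(node)
```

For on-demand availability, RetentionStrategy.Demand takes an in-demand delay and an idle delay (both in minutes) in place of RetentionStrategy.Always.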
Dynamic Agents
Dynamic agents are provisioned on demand and destroyed after the build completes. The most common back-ends are:
- Kubernetes Plugin — spins up a Pod per build, runs the build inside a container, tears it down. This is the standard model for cloud-native Jenkins.
- EC2 Plugin / Azure VM Agents / Google Compute Plugin — provisions a cloud VM, runs the build, terminates the instance. Useful when you need full OS-level isolation or heavyweight tools.
- Docker Plugin — starts a container on a Docker daemon host per build. Simpler than Kubernetes but ties you to a single Docker host.
Container Agents: The Kubernetes Plugin
At scale, many large engineering organizations run Jenkins agents as Kubernetes Pods. With the Kubernetes plugin you define a PodTemplate (a Pod spec fragment) for each type of build environment. When a pipeline requests a matching label, the plugin calls the Kubernetes API, the Pod starts, the jnlp container dials back to the controller, and build steps execute inside the specified containers.
A minimal Kubernetes PodTemplate in a declarative pipeline looks like this:
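One possible minimal form is sketched below. It assumes a Kubernetes cloud is already configured in Jenkins; the maven container, its image tag, and the Maven command are illustrative choices rather than requirements. The plugin adds the jnlp container automatically, so only the build container is declared.

```groovy
pipeline {
    agent {
        kubernetes {
            // Illustrative Pod spec: one build container kept alive with sleep
            // so that pipeline steps can be executed inside it.
            yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:3.9
    command: ["sleep"]
    args: ["infinity"]
'''
            // Run steps in the maven container instead of the jnlp container.
            defaultContainer 'maven'
        }
    }
    stages {
        stage('Build') {
            steps {
                sh 'mvn -B clean verify'
            }
        }
    }
}
```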
Labeling Strategy at Scale
Labels are how the controller matches jobs to capacity. A coherent labeling taxonomy prevents the "works on my build node" class of failures:
- OS / platform: linux, windows, macos-arm64
- Runtime: java17, node20, python311
- Capability: docker, gpu, large-mem (for linking or ML workloads)
- Environment: prod-deploy (restricted nodes with cloud credentials)
In the pipeline you combine labels with boolean operators:
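A short sketch using the labels from the taxonomy above; the particular combination is only an example. Label expressions support && (and), || (or), ! (not), and parentheses for grouping.

```groovy
pipeline {
    // Any Linux node that has Docker and either runtime installed,
    // but never one of the restricted deployment nodes.
    agent { label 'linux && docker && (java17 || node20) && !prod-deploy' }
    stages {
        stage('Build') {
            steps {
                // NODE_NAME is set by Jenkins to the agent that was selected.
                sh 'echo "Running on ${NODE_NAME}"'
            }
        }
    }
}
```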
PodTemplate Reuse via the Kubernetes Plugin UI or Shared Libraries
Defining yaml: """...""" inline in every Jenkinsfile leads to configuration drift. The production pattern is to define canonical PodTemplate objects either in the Kubernetes plugin's global configuration (Manage Jenkins → Clouds → Kubernetes → Pod Templates) or, preferably, in a Shared Library as a helper function. Individual pipelines then request the centrally managed spec, for example with agent { label 'java-build' } for a globally configured template, instead of carrying their own copy.
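The Shared Library variant might look like the following sketch. The step name javaBuildPod, the resource path, and the library name ci-shared-lib are all hypothetical; podTemplate, POD_LABEL, container, and libraryResource are provided by the Kubernetes plugin and the Pipeline shared-library support.

```groovy
// vars/javaBuildPod.groovy in the shared library (hypothetical step name)
def call(Closure body) {
    // Load the canonical Pod spec from resources/podtemplates/java-build.yaml
    def podYaml = libraryResource('podtemplates/java-build.yaml')
    podTemplate(yaml: podYaml) {
        // POD_LABEL is generated by the plugin for this template instance.
        node(POD_LABEL) {
            body()
        }
    }
}
```

A scripted Jenkinsfile in an application repository could then use it like this, assuming the template defines a container named maven:

```groovy
@Library('ci-shared-lib') _   // hypothetical library name

javaBuildPod {
    checkout scm
    container('maven') {
        sh 'mvn -B clean verify'
    }
}
```

Changing the Pod spec then means editing one YAML file in the library rather than every Jenkinsfile.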
Controller Isolation: Always Use agent none
Declare agent none at the top level of the pipeline and assign a specific agent to each stage. If a build step runs on the controller and crashes or leaks files there, it can corrupt build metadata, exhaust disk space, and take down the entire CI platform.
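A minimal sketch of that layout follows; the labels come from the taxonomy earlier in this section, and deploy.sh is a hypothetical script.

```groovy
pipeline {
    agent none   // no executor is allocated for the pipeline as a whole
    stages {
        stage('Build') {
            agent { label 'linux && java17' }
            steps {
                sh 'mvn -B clean verify'
            }
        }
        stage('Deploy') {
            // Only the restricted nodes hold the cloud credentials.
            agent { label 'prod-deploy' }
            steps {
                sh './deploy.sh'   // hypothetical deployment script
            }
        }
    }
}
```

Each stage gets its own workspace on whichever agent it lands on, so files produced in one stage have to be passed along explicitly (for example with stash and unstash).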
Production Failure Modes
- Agent offline during build: the job hangs waiting for a reconnect; set JNLP_TIMEOUT and configure retry limits in the cloud plugin.
- Workspace accumulation on static agents: old workspaces fill the disk; use the Workspace Cleanup plugin (cleanWs() at the end of each build) and a nightly cleanup job on each node.
- Pod eviction mid-build: Kubernetes may evict a Pod under resource pressure; set a priorityClassName on the Pod for critical pipelines (prefer a dedicated PriorityClass, since system-cluster-critical is intended for cluster components) and configure PodDisruptionBudgets on the cluster to limit voluntary disruptions such as node drains.
- Image pull latency: large agent images (maven:3.9 is 500 MB+) cause cold-start delays; pre-pull images onto nodes with a DaemonSet or use a pull-through registry cache.
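Two of these mitigations can be expressed directly in a Jenkinsfile. This is a minimal sketch, assuming the Workspace Cleanup plugin is installed (it provides cleanWs()); the one-hour limit and the build command are placeholders.

```groovy
pipeline {
    agent { label 'linux' }
    options {
        // Abort the build instead of hanging indefinitely if the agent never comes back.
        timeout(time: 1, unit: 'HOURS')
    }
    stages {
        stage('Build') {
            steps {
                sh 'make build'   // placeholder build command
            }
        }
    }
    post {
        always {
            // Delete the workspace whether the build passed or failed.
            cleanWs()
        }
    }
}
```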
Sizing Agents Correctly
Request only what your build actually needs, but set limits to prevent noisy-neighbor issues. Profile your builds by sampling kubectl top pod throughout a representative run, then set requests at the measured p50 and limits at the p99. This guards against both under-provisioning (OOMKilled builds) and over-provisioning (wasted cluster capacity).
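As an illustration, suppose profiling showed a median of about 1 CPU and 2 GiB of memory, with a p99 of about 2 CPUs and 3 GiB. These figures are made up for the example, not measurements. The corresponding agent definition could be sketched as:

```groovy
pipeline {
    agent {
        kubernetes {
            yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:3.9
    command: ["sleep"]
    args: ["infinity"]
    resources:
      requests:          # roughly the p50 of observed usage (hypothetical)
        cpu: "1"
        memory: "2Gi"
      limits:            # roughly the p99 of observed usage (hypothetical)
        cpu: "2"
        memory: "3Gi"
'''
            defaultContainer 'maven'
        }
    }
    stages {
        stage('Build') {
            steps {
                sh 'mvn -B clean verify'
            }
        }
    }
}
```

The scheduler places the Pod based on the requests, while the limits cap what a single build can consume on the node.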