DaemonSets & Node-Level Workloads
Most Kubernetes workloads are fungible: the scheduler picks whichever nodes have spare capacity, and you do not care which node a given replica lands on. But some workloads must run on every node, exactly once — a log forwarder that ships container logs to a central aggregator, a monitoring agent that exposes per-node CPU/memory/disk metrics, a network plugin (CNI) that wires up Pod networking, or a security scanner that watches every container on the host. These are node-level infrastructure agents, and Kubernetes provides the DaemonSet controller to manage them.
What Is a DaemonSet?
A DaemonSet ensures that one Pod runs on every node in the cluster, or on a selected subset of nodes. When a node joins the cluster, the DaemonSet controller automatically creates a Pod for it. When a node is removed from the cluster, its Pod is garbage-collected; a plain kubectl drain, by contrast, leaves DaemonSet Pods in place and requires --ignore-daemonsets to proceed past them. You never set a replicas field on a DaemonSet: the Pod count is determined entirely by the number of nodes that are eligible under the Pod template's node selector, affinity rules, and tolerations.
Canonical Use Cases at Big-Tech Scale
- Log shipping: Fluentd, Fluent Bit, or Vector reading /var/log/containers/*.log from the host filesystem and forwarding to Elasticsearch, Loki, or Splunk.
- Node monitoring: Prometheus node_exporter exposing CPU, memory, disk, and network metrics for each node; Datadog Agent, New Relic Infrastructure.
- Network plugins (CNI): Calico, Cilium, Weave. These are DaemonSets that configure iptables or eBPF rules on every node so Pods can communicate across the cluster.
- Storage plugins (CSI node drivers): Agents that attach and mount volumes on the local node.
- Security agents: Falco, Sysdig, or Aqua runtime security scanning every syscall on every node.
Writing a Real DaemonSet Manifest
Below is a production-grade Fluent Bit DaemonSet that ships container logs to an Elasticsearch cluster. Key details: it mounts /var/log and /var/lib/docker/containers from the host (read-only), runs as a privileged container so it can read kernel-level log metadata, and sets conservative resource limits so it cannot starve application Pods.
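The manifest below is a minimal sketch of the DaemonSet described above, not a definitive production configuration. The namespace, label values, service account name, image tag, Elasticsearch hostname, and resource figures are all assumptions chosen for illustration. The Fluent Bit configuration itself is omitted, and the environment variables only take effect if that configuration references them.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging                  # assumed namespace
  labels:
    app: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit                 # must match the Pod template labels
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit  # assumed to exist already
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2.0   # assumed tag; pin your own
          securityContext:
            privileged: true          # as described above; many setups
                                      # manage with read-only host mounts
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: elasticsearch.logging.svc   # hypothetical Service
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          resources:
            requests:                 # illustrative figures only
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 128Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: dockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: dockercontainers
          hostPath:
            path: /var/lib/docker/containers
```

Note that there is no replicas field anywhere in the spec, and that the memory limit caps how far the agent can grow before the kernel kills it rather than an application Pod.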
Tolerations: Scheduling Onto Tainted Nodes
Kubernetes uses taints on nodes to repel Pods. A taint says "do not schedule here unless you explicitly tolerate this." Control-plane nodes typically carry node-role.kubernetes.io/control-plane:NoSchedule by default; GPU nodes often carry a taint such as nvidia.com/gpu=present:NoSchedule; cordoned nodes, including nodes in the middle of a drain, carry node.kubernetes.io/unschedulable:NoSchedule.
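Infrastructure DaemonSets almost always need to run on every node, including tainted ones, so they must declare tolerations that match those taints. A toleration is built from key, operator (Equal or Exists), value (used only with Equal), and effect (NoSchedule, PreferNoSchedule, or NoExecute), plus an optional tolerationSeconds for NoExecute. Using operator: Exists without a value matches any taint with that key regardless of value; omitting the key as well tolerates every taint, which is the usual way to express a blanket toleration. The DaemonSet controller automatically adds tolerations for built-in taints such as node.kubernetes.io/unschedulable and node.kubernetes.io/not-ready, so the tolerations you write are mainly for role, hardware, and cloud-provider taints.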
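As a rough illustration of the two operators, the fragment below would sit under spec.template.spec of a DaemonSet. The taint keys are the ones mentioned above; whether your nodes actually carry them is an assumption to verify against your own cluster.

```yaml
tolerations:
  # Exists: tolerate the control-plane taint whatever its value.
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  # Equal: tolerate only the taint whose value is exactly "present".
  - key: nvidia.com/gpu
    operator: Equal
    value: present
    effect: NoSchedule
  # Blanket alternative: no key and no effect tolerates every taint.
  # Uncomment only if the agent really must run everywhere.
  # - operator: Exists
```

Listing taints explicitly is more work than the blanket form, but it keeps the agent off nodes that were tainted for a reason you have not reviewed.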
Targeting a Node Subset with nodeSelector and nodeAffinity
Sometimes you want a DaemonSet to run only on nodes with specific hardware or roles — GPU nodes for a CUDA metrics exporter, or SSD-backed nodes for a high-throughput log forwarder. Use nodeSelector (simple label match) or nodeAffinity (richer expressions) in the Pod template:
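The two fragments below sketch both approaches for the GPU case. The label key hardware and its values are hypothetical; substitute whatever labels your nodes actually carry. Each fragment belongs under spec.template.spec of the DaemonSet.

```yaml
# Option 1: nodeSelector, an exact match on one or more labels.
nodeSelector:
  hardware: gpu            # hypothetical node label
---
# Option 2: nodeAffinity, which allows set-based operators.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: hardware          # same hypothetical label
              operator: In
              values:
                - gpu
                - gpu-large          # hypothetical second node class
```

If both are set on the same Pod template, a node must satisfy both to be eligible, so in practice you pick one.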
Operational Commands
Production Failure Modes
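The most common DaemonSet incident at scale: a new node joins the cluster but no DaemonSet Pod ever runs on it. Root cause is almost always a missing toleration. The new node has a custom taint (e.g. a cloud provider spot-instance taint like kubernetes.azure.com/scalesetpriority=spot:NoSchedule) that the DaemonSet manifest does not tolerate. In current Kubernetes versions the controller does not create a Pod for such a node at all, so the symptom is a node missing from the DaemonSet's Pod list rather than a Pod stuck in Pending; a Pending DaemonSet Pod usually means the node lacks the CPU or memory the Pod requests. Always audit the taints on every node class in your cluster and ensure your infrastructure DaemonSets tolerate all of them.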
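For the spot-instance example above, the fix would look roughly like the following entry in the DaemonSet's Pod template, assuming the taint on your nodes is exactly the one quoted:

```yaml
tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot
    effect: NoSchedule
```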
A second common failure: a DaemonSet log agent consumes unbounded memory during a log burst, triggers an OOMKill, restarts, and enters CrashLoopBackOff on every node simultaneously, breaking observability right when you need it most. Always set memory limits and configure the agent's internal buffer and backpressure settings so it degrades gracefully under load instead of crashing.
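As one possible shape for the buffer settings, the ConfigMap below sketches a Fluent Bit tail input with a memory cap. The ConfigMap name, namespace, and the 16MB figure are assumptions, the output section is omitted, and the ConfigMap still has to be mounted into the agent container for it to take effect.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config      # hypothetical name
  namespace: logging           # assumed namespace
data:
  fluent-bit.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        # Pause reading when this much data is buffered in memory,
        # instead of growing until the container is OOM-killed.
        Mem_Buf_Limit     16MB
        # Skip lines longer than the buffer rather than stalling on them.
        Skip_Long_Lines   On
```

The cap should sit comfortably below the container's memory limit so that the agent applies backpressure before the kernel intervenes.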
Update Strategy Considerations
DaemonSets support two update strategies. RollingUpdate (available since Kubernetes 1.6 and the default in the apps/v1 API) replaces Pods a few nodes at a time, bounded by maxUnavailable. Its default of 1 is a sensible production setting for a log agent, because you never lose log coverage on more than one node simultaneously. OnDelete only replaces a Pod when you manually delete it, which is useful for critical CNI plugins where an in-place restart would break Pod networking on that node and you prefer to drain the node first.
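The strategy is set on the DaemonSet spec itself, not on the Pod template. A minimal sketch of the two variants:

```yaml
# Variant 1: rolling update, at most one node without the agent at a time.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
---
# Variant 2: Pods are replaced only when an operator deletes them.
spec:
  updateStrategy:
    type: OnDelete
```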
Always test DaemonSet updates on a staging cluster with an identical node configuration. A bad Fluent Bit config that crashes the agent will propagate to every node in the cluster within minutes of a rolling update — there is no concept of a "canary DaemonSet Pod" out of the box.