Configuration Management with Ansible

Variables, Facts & Templates

Ad-hoc commands and static playbooks will carry you only so far. The moment you need the same playbook to configure a development box differently from a production box (different memory limits, different upstream endpoints, different TLS certificates), you need variables. The moment you need to write a config file whose content depends on the actual RAM or IP address of the target host, you need Jinja2 templates. And when you want Ansible to discover what a host looks like on its own, so that you never have to hard-code those details, you need facts.

These three mechanisms are the engine of production Ansible at any serious engineering organization. Mastering their interaction, and especially their precedence order, prevents entire classes of bugs.

Variable Precedence — The Full Stack

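Ansible merges variables from many sources. When the same variable name appears in more than one place, Ansible applies a strict precedence chain and the higher-priority source wins. The list below is a condensed form of the official order, from lowest to highest priority, so an entry later in the list overrides every entry before it: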

1. Role defaults (roles/myrole/defaults/main.yml)
2. Inventory file or script group variables
3. Inventory group_vars/all files
4. Inventory group_vars/<groupname> files
5. Inventory host_vars/<hostname> files
6. Host facts (gathered automatically)
7. Play variables (vars: key in a playbook)
8. Play vars_files: and vars_prompt:
9. Role variables (roles/myrole/vars/main.yml)
10. Block variables and task variables (vars: on a task)
11. Include variables (include_vars module)
12. set_fact / register results
13. Extra variables (-e / --extra-vars on the CLI): highest priority, always wins

Production pitfall: role vars vs. role defaults. roles/myrole/vars/main.yml (priority 9) silently overrides every lower-priority source, including your group_vars and host_vars. If a role sets nginx_worker_processes in its vars/main.yml, your inventory-level override is quietly ignored. Always put operator-tunable values in defaults/main.yml; use vars/main.yml only for role-internal constants that operators should never need to touch.

The practical mental model at big-tech scale: defaults are the "safe baseline", group_vars and host_vars are the "environment overlay", and -e on the CLI is the "emergency override". CI pipelines should never use -e for normal promotion; that is a human escape hatch.
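
To make the defaults-versus-vars split concrete, here is a minimal sketch of how a role might divide its variables. The role name nginx and the variable nginx_conf_path are assumptions made for illustration; only nginx_worker_processes and nginx_keepalive_timeout appear elsewhere in this lesson.

```yaml
# roles/nginx/defaults/main.yml  (hypothetical role; lowest priority)
---
nginx_worker_processes: auto      # tunable: group_vars or host_vars can override it
nginx_keepalive_timeout: 75       # tunable

# roles/nginx/vars/main.yml  (outranks every inventory variable)
---
nginx_conf_path: /etc/nginx/nginx.conf   # hypothetical internal constant, not meant to be tuned
```

With this split, a value set in inventory/host_vars/ replaces the default, while nginx_conf_path stays fixed unless someone deliberately passes -e on the command line.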

Defining Variables — Inventory, group_vars, host_vars

The preferred layout for any production inventory is a directory, not a flat file. This scales cleanly and allows per-group and per-host overrides to be separate, reviewable files.

# inventory/
#   hosts.ini         — static host list
#   group_vars/
#     all.yml         — applies to every host
#     web.yml         — applies to hosts in [web] group
#     db.yml          — applies to hosts in [db] group
#   host_vars/
#     prod-web-01.yml — applies only to this host

# inventory/group_vars/all.yml
---
app_name: myapp
app_port: 8080
log_level: info
deploy_user: deploy

# inventory/group_vars/web.yml
---
nginx_worker_processes: auto
nginx_keepalive_timeout: 75

# inventory/host_vars/prod-web-01.yml
---
nginx_worker_processes: 16    # This host has 16 cores — override the group default

Inside a playbook you reference any variable with double curly braces: {{ app_port }}. Ansible resolves the value at runtime after applying the full precedence chain.
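
As a quick way to see the resolved values, a small playbook along these lines could print them per host. The file name site.yml is an assumption, and the sketch presumes prod-web-01 is a member of the web group.

```yaml
# site.yml (hypothetical playbook name)
---
- name: Show resolved variables
  hosts: web
  gather_facts: false
  tasks:
    - name: Print the values each host ends up with
      ansible.builtin.debug:
        # Quote any YAML value that contains {{ }}, otherwise the parser
        # treats the opening brace as the start of a mapping.
        msg: "{{ app_name }} listens on port {{ app_port }} with {{ nginx_worker_processes }} workers"
```

Run with ansible-playbook -i inventory/ site.yml, this should report 16 workers for prod-web-01 and auto for the other web hosts, because host_vars outranks group_vars.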

Keep secrets out of group_vars: Never put passwords, API keys, or private keys in plain-text variable files. Use Ansible Vault (covered in Lesson 8) to encrypt those values — but the variable structure and key names live in group_vars exactly as shown, so the pattern stays consistent.

Gathered Facts — Free Host Intelligence

When a play runs with gather_facts: true (the default), Ansible first runs the setup module on each host. The module collects hundreds of facts about the target: OS family, kernel version, total RAM, CPU count, all IP addresses, mounted filesystems, virtualization type, and more. All of these are available as variables prefixed with ansible_.

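# Run ad-hoc against the web group and pick out a few facts
ansible web -i inventory/ -m setup | grep -E 'ansible_(os_family|memtotal_mb|processor_vcpus|default_ipv4)'

# Typical output for one host (grep prints only the matching lines;
# the nested address is shown here for context):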
#   "ansible_os_family": "RedHat",
#   "ansible_memtotal_mb": 15954,
#   "ansible_processor_vcpus": 8,
#   "ansible_default_ipv4": {
#       "address": "10.0.1.42",
#       ...
#   }

Key facts you will use constantly in production playbooks:

ansible_os_family — "Debian", "RedHat", "Archlinux" — use this for conditional package installation
ansible_memtotal_mb — total RAM; drive JVM heap or Nginx worker limits from this
ansible_processor_vcpus — CPU count; set nginx_worker_processes automatically
ansible_default_ipv4.address — the primary IP; use in config files and firewall rules
ansible_hostname / ansible_fqdn — host identity
ansible_distribution / ansible_distribution_version — precise OS and version
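
The conditional-installation use of ansible_os_family can be sketched as two tasks guarded by when:. The package name nginx and the choice of ansible.builtin.dnf for RedHat-family hosts are assumptions; older releases may need a different package module.

```yaml
- name: Install nginx on Debian-family hosts
  ansible.builtin.apt:
    name: nginx
    state: present
    update_cache: true
  become: true
  when: ansible_os_family == "Debian"

- name: Install nginx on RedHat-family hosts
  ansible.builtin.dnf:
    name: nginx
    state: present
  become: true
  when: ansible_os_family == "RedHat"
```

When the package name is identical across distributions, ansible.builtin.package is a simpler alternative that picks the package manager for you.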

Custom facts: You can push your own facts onto hosts by dropping JSON or INI files into /etc/ansible/facts.d/ on the managed node (file name ends in .fact). They surface under ansible_local. Teams use this to record deployment timestamps, application versions, and environment identifiers — all queryable from subsequent playbook runs.
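
A minimal sketch of the custom-facts workflow follows. The file name deploy.fact, the section name release, and the values are all hypothetical; the sketch uses the INI format, which surfaces as ansible_local.<file name>.<section>.<key>.

```yaml
- name: Ensure the custom facts directory exists
  ansible.builtin.file:
    path: /etc/ansible/facts.d
    state: directory
    mode: '0755'
  become: true

- name: Record the deployed release as a custom fact (INI format)
  ansible.builtin.copy:
    dest: /etc/ansible/facts.d/deploy.fact
    mode: '0644'
    content: |
      [release]
      version=1.4.2
      env_name=staging
  become: true

- name: Re-read local facts so the new file is visible in this run
  ansible.builtin.setup:
    filter: ansible_local

- name: Use the custom fact
  ansible.builtin.debug:
    msg: "Deployed version: {{ ansible_local.deploy.release.version }}"
```

The explicit setup call matters only within the run that writes the file; later runs pick the fact up during normal gathering.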

Fact gathering adds roughly 0.5–2 seconds per host. At 1,000-host scale this is measurable. You can disable gathering entirely with gather_facts: false in the play header for ultra-fast playbooks that do not need system info, or cache facts in Redis or a JSON file with fact_caching in ansible.cfg.
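
The two cheaper options can look roughly like this in a playbook: one play skips gathering entirely, the other collects only a reduced subset. The service name taken from app_name is an assumption for illustration.

```yaml
---
- name: Restart the app without collecting any facts
  hosts: web
  gather_facts: false
  tasks:
    - name: Restart the service (assumes a unit named after app_name)
      ansible.builtin.service:
        name: "{{ app_name }}"
        state: restarted
      become: true

- name: Collect only the network facts a template needs
  hosts: web
  gather_facts: false
  tasks:
    - name: Gather a reduced fact set
      ansible.builtin.setup:
        gather_subset:
          - '!all'      # drop the default full collection
          - network     # keep network facts such as ansible_default_ipv4

    - name: Show the primary address
      ansible.builtin.debug:
        var: ansible_default_ipv4.address
```

Any task that references a fact outside the gathered subset will fail with an undefined-variable error, so trim the subset only after checking what your templates actually use.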

Jinja2 Templates — Config Files That Know Their Host

The template module copies a file from the controller to a managed node, but before copying it runs the content through Jinja2. Every {{ variable }} expression is expanded, every {% if %} or {% for %} block is evaluated. The result is a fully rendered text file — an Nginx config, a JVM options file, a systemd unit, a Prometheus scrape config — that is exactly right for that specific host.

Template source files live in a role's templates/ directory and conventionally carry a .j2 extension. The playbook task looks like this:

# roles/nginx/tasks/main.yml
---
- name: Deploy nginx configuration
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
  notify: Reload nginx

# roles/nginx/templates/nginx.conf.j2
worker_processes {{ ansible_processor_vcpus }};
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
}

http {
    keepalive_timeout {{ nginx_keepalive_timeout | default(75) }};
    server_tokens off;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent"';

    access_log /var/log/nginx/access.log main;

    upstream {{ app_name }}_backend {
{% for host in groups['app'] %}
        server {{ hostvars[host]['ansible_default_ipv4']['address'] }}:{{ app_port }};
{% endfor %}
    }

    server {
        listen 80;
        server_name {{ ansible_fqdn }};

        location / {
            proxy_pass http://{{ app_name }}_backend;
        }
    }
}

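This single template produces a host-specific Nginx config on every web host: the worker count matches the actual CPU count, the upstream block is built from the current list of app-group hosts, and the server name is the real FQDN, all without a human ever looking up those values. One caveat: the upstream loop reads hostvars for the app hosts, so their facts must already be available, either gathered earlier in the same run or served from a fact cache.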
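
Because the upstream loop reads hostvars[host]['ansible_default_ipv4'] for every host in the app group, those hosts need facts before the web play renders the template. One way to arrange that, sketched here under the assumption that the role is named nginx and the groups are app and web, is a fact-only play placed first:

```yaml
---
- name: Collect facts from the app tier so hostvars is populated
  hosts: app
  gather_facts: true
  tasks: []

- name: Configure the web tier
  hosts: web
  become: true
  roles:
    - nginx
```

Running this playbook with --limit web skips the first play, so in that case the template would fail on the missing fact unless fact caching is enabled.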

Figure: the variable precedence stack feeds the Jinja2 engine, which renders a unique config file for every target host.

Jinja2 Filters — Transforming Values

Jinja2 ships with a rich filter library that lets you transform variable values inline. Ansible adds dozens of extra filters on top. The most useful in production:

{{ nginx_worker_processes | default(4) }} — safe default when variable might be undefined
{{ app_name | upper }} — string manipulation
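{{ (ansible_memtotal_mb * 0.75) | int }} — arithmetic; drive JVM -Xmx from real RAM (the parentheses are required, because a filter binds tighter than *, so 0.75 | int alone would evaluate to 0)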
{{ groups['app'] | length }} — count hosts in a group
{{ my_list | join(',') }} — join a list for a comma-separated config value
{{ secret_value | b64encode }} — base64-encode for a Kubernetes secret manifest
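
To experiment with these filters safely, a throwaway play against localhost is enough. The variable sample_memtotal_mb is a hypothetical stand-in for ansible_memtotal_mb, and the literal 'hunter2' stands in for a real secret.

```yaml
---
- name: Try out common filters
  hosts: localhost
  gather_facts: false
  vars:
    app_name: myapp
    sample_memtotal_mb: 16000      # hypothetical stand-in for ansible_memtotal_mb
    my_list: [alpha, beta, gamma]
  tasks:
    - name: Show filter results
      ansible.builtin.debug:
        msg:
          - "{{ undefined_example | default(4) }}"        # 4
          - "{{ app_name | upper }}"                       # MYAPP
          - "{{ (sample_memtotal_mb * 0.75) | int }}"      # 12000
          - "{{ my_list | length }}"                       # 3
          - "{{ my_list | join(',') }}"                    # alpha,beta,gamma
          - "{{ 'hunter2' | b64encode }}"                  # aHVudGVyMg==
```

Note the parentheses in the arithmetic line: filters bind more tightly than multiplication, so without them the int filter would apply to 0.75 alone.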

set_fact — Dynamic Variables at Runtime

Sometimes you need to compute a variable from a combination of facts and then reuse it across multiple tasks. ansible.builtin.set_fact assigns a variable at runtime with the same high priority as a register result:

- name: Calculate JVM heap size (75 percent of RAM)
  ansible.builtin.set_fact:
    jvm_heap_mb: "{{ (ansible_memtotal_mb * 0.75) | int }}"

- name: Deploy JVM options
  ansible.builtin.template:
    src: jvm.options.j2
    dest: /etc/elasticsearch/jvm.options

# jvm.options.j2
-Xms{{ jvm_heap_mb }}m
-Xmx{{ jvm_heap_mb }}m
-XX:+UseG1GC
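
A computed value is worth checking before it reaches a config file. As a sketch, a task like the following could sit between the set_fact and template tasks; the thresholds are illustrative assumptions, not recommendations.

```yaml
- name: Sanity-check the computed heap before rendering the template
  ansible.builtin.assert:
    that:
      # set_fact may store the value as a string, so cast before comparing
      - jvm_heap_mb | int > 0
      - jvm_heap_mb | int < ansible_memtotal_mb
    fail_msg: "Computed heap {{ jvm_heap_mb }} MB looks wrong for {{ ansible_memtotal_mb }} MB of RAM"
    success_msg: "Heap set to {{ jvm_heap_mb }} MB"
```

If the assertion fails, the play stops for that host before the template task runs, so a bad value never lands on disk.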

Template the entire config, not just the moving parts. A common anti-pattern is to use lineinfile to patch one value in a config file that a package installed. This is fragile — a package upgrade can restore the original file, silently reverting your change. The production pattern is: deploy the entire config file from a Jinja2 template so every value under your control is version-tracked and idempotently applied on every run.

Production Summary

At any serious engineering organization running Ansible at scale, these disciplines are non-negotiable: put tunable values in role defaults/ so operators can override them; put environment overlays in group_vars/ and host_vars/; drive config values from gathered facts so your playbooks self-adapt to the actual hardware; write Jinja2 templates that own the entire config file rather than patching individual lines; and always validate the rendered template against a known-good reference before deploying to production.
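
One way to approximate the validation step is the template module's validate parameter, which runs a command against the rendered temporary file (passed as %s) and copies the file into place only if the command succeeds. The sketch below reuses the nginx task from this lesson; it checks syntax rather than comparing against a reference file, and nginx -t -c %s is assumed to work because this config has no relative includes.

```yaml
- name: Deploy nginx configuration only if nginx accepts it
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: nginx -t -c %s    # %s is replaced with the temporary rendered file
    backup: true                # keep a timestamped copy of the previous config
  notify: Reload nginx
```

For a human review of the change before it is applied, running the playbook with --check --diff prints the difference between the current and rendered file without modifying the host.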