Configuration Management with Ansible

Ansible Architecture & Inventory

Before you can manage a single server with Ansible, you need to understand how Ansible reaches that server, what it knows about your fleet, and how you organize hundreds or thousands of hosts into logical groups. This lesson covers all three: the agentless SSH architecture, static and dynamic inventories, and the host and group variable system. Get these right and your playbooks start from a reliable foundation. Get them wrong and you will spend your time chasing connection errors and variable conflicts.

The Agentless Architecture: Why It Matters

Almost every other configuration management system — Chef, Puppet, SaltStack in agent mode — requires you to install and maintain a daemon on every managed host. That daemon polls a central server, applies configuration, and reports back. The operational burden is real: the agent itself must be patched, must stay running, and can fail in ways that block configuration from being applied at all.

Ansible is fundamentally different. There is no agent. The control node (your laptop, a CI runner, or a bastion host) connects to each managed node over SSH (or WinRM for Windows), pushes a small Python script called a module, executes it, collects the result, and removes the script. The managed node needs only:

An SSH daemon (sshd) listening on a reachable port
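Python 3.x in the standard PATH (older Ansible releases could manage Python 2.7 targets, but 2.7 is EOL and current ansible-core releases no longer support it on managed nodes)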
A user account the control node can authenticate as

That is it. No firewall exceptions for outbound agent traffic, no certificate rotation, no "my agent is stuck" incidents at 2 AM.

Key idea: Ansible's push model means the control node initiates every connection. The managed node is always passive. This also means Ansible has no persistent knowledge of host state between runs — every play gathers fresh facts. If you need continuous drift detection, pair Ansible with a tool like AWS Config or Open Policy Agent, rather than trying to poll with cron.

Ansible pushes Python modules over SSH, executes them, collects the JSON result, and cleans up — no daemon required on managed nodes.
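
The push cycle described above (copy a module, run it, read the JSON result, clean up) can be sketched without any network at all. This is a minimal local simulation, not Ansible's implementation: a subprocess stands in for the SSH session, and MODULE_SOURCE and run_module_locally are hypothetical names invented for illustration.

```python
import json
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a module: a script that prints one JSON result.
MODULE_SOURCE = '''
import json, platform
print(json.dumps({"changed": False, "ping": "pong",
                  "python": platform.python_version()}))
'''


def run_module_locally(source: str) -> dict:
    """Copy the module to a temp file, execute it, parse its JSON, clean up.

    Assumption: a local subprocess replaces the SSH transport Ansible uses.
    """
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(source)  # "push" step
        completed = subprocess.run(
            [sys.executable, path],  # "execute" step
            capture_output=True,
            text=True,
            check=True,
        )
        return json.loads(completed.stdout)  # "collect" step
    finally:
        os.remove(path)  # "clean up" step, runs even if execution fails


result = run_module_locally(MODULE_SOURCE)
print(result["ping"])
```

Nothing persists on the "managed" side after the call returns, which is the property that lets Ansible work without a resident agent.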

SSH Connection Tuning for Production Scale

Every SSH handshake costs time, and without tuning Ansible performs several SSH operations per task on every host. At ten hosts that is fine. At five hundred it becomes a bottleneck. Two settings matter most:

SSH multiplexing (ControlMaster): Ansible reuses an existing SSH connection for multiple tasks rather than re-authenticating each time. Ansible's default ssh_args already enable ControlMaster=auto with ControlPersist=60s. If you override ssh_args, as the config below does, you must include these flags yourself or you lose multiplexing.
Pipelining: Eliminates the write-module-to-disk, execute, delete cycle. Instead, Ansible pipes the module to the remote Python interpreter over the existing SSH connection. It is off by default and requires requiretty to be disabled in /etc/sudoers on the target (it is not set by default on most modern Linux distributions).

# ansible.cfg: project-level configuration file
# Place this in the same directory as your playbooks
# Inline comments must start with ";" because a "#" after a value is read as part of the value

[defaults]
inventory          = ./inventory          ; default inventory path
remote_user        = ansible              ; OS user on managed nodes
private_key_file   = ~/.ssh/id_ed25519    ; SSH key for auth
host_key_checking  = False                ; throwaway hosts only, turns off host key verification
forks              = 20                   ; parallel connections (default 5)
gathering          = smart                ; skip gathering when cached facts are still valid
fact_caching       = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 3600               ; seconds

[ssh_connection]
pipelining = True                         ; biggest single performance win
ssh_args   = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no
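
To get a feel for what the forks setting buys you, here is a back-of-envelope sketch. It assumes the default linear strategy, where a task runs on at most forks hosts at once, and it assumes every host takes about the same time, which real fleets rarely do. The function name waves_per_task is illustrative.

```python
import math


def waves_per_task(host_count: int, forks: int) -> int:
    """Rough number of sequential batches needed to run one task everywhere."""
    if host_count < 0 or forks < 1:
        raise ValueError("host_count must be >= 0 and forks must be >= 1")
    return math.ceil(host_count / forks)


# A 500-host fleet at a few fork settings.
for forks in (5, 20, 50):
    print(f"forks={forks:>2}: {waves_per_task(500, forks)} waves per task")
```

Raising forks trades control-node CPU, memory, and open connections for fewer waves, so the right value depends on how large the control node is.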

Pro practice: Always commit an ansible.cfg at the root of your Ansible repository. This makes the project self-contained: a new engineer clones the repo and runs ansible-playbook site.yml without needing to know any global configuration. Ansible uses the first config file it finds, searching in this order: ANSIBLE_CONFIG env var → ./ansible.cfg → ~/.ansible.cfg → /etc/ansible/ansible.cfg. Files are not merged, so the project-local file beats the user and system files unless ANSIBLE_CONFIG is set.
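
The "first file found wins" search can be sketched in a few lines. This is a simplified illustration, not Ansible's own lookup code: it ignores details such as Ansible refusing to load ansible.cfg from a world-writable directory, and the function name find_ansible_cfg is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Mapping, Optional


def find_ansible_cfg(
    env: Mapping[str, str],
    cwd: Path,
    home: Path,
    system: Path = Path("/etc/ansible/ansible.cfg"),
) -> Optional[Path]:
    """Return the first existing config file in search order, or None."""
    candidates: List[Path] = []
    if env.get("ANSIBLE_CONFIG"):
        candidates.append(Path(env["ANSIBLE_CONFIG"]))
    candidates.extend([cwd / "ansible.cfg", home / ".ansible.cfg", system])
    for candidate in candidates:
        if candidate.is_file():
            return candidate  # first hit wins; later files are never merged in
    return None


# Demo with throwaway directories standing in for a project and a home dir.
with tempfile.TemporaryDirectory() as tmp:
    project, home = Path(tmp, "project"), Path(tmp, "home")
    project.mkdir()
    home.mkdir()
    (project / "ansible.cfg").write_text("[defaults]\n")
    (home / ".ansible.cfg").write_text("[defaults]\n")
    override = Path(tmp, "ci.cfg")
    override.write_text("[defaults]\n")

    project_wins = find_ansible_cfg({}, project, home) == project / "ansible.cfg"
    env_wins = find_ansible_cfg({"ANSIBLE_CONFIG": str(override)}, project, home) == override
    print(project_wins, env_wins)
```

On a real control node, ansible --version prints the config file that was actually selected, which is the quickest way to confirm the result.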

Inventories: Teaching Ansible About Your Fleet

The inventory is Ansible's source of truth for what hosts exist and how to reach them. It is the most important file you will write. Every ad-hoc command and every playbook starts by resolving the inventory.

Static Inventory (INI and YAML)

The simplest inventory is a plain INI file. Hosts are listed by IP or hostname, grouped under [group_name] headers (stick to letters, digits, and underscores, since names containing hyphens trigger warnings). Groups can contain other groups with the :children suffix.

# inventory/hosts.ini — a production-realistic static inventory

[webservers]
web-01.prod.example.com  ansible_user=ec2-user  ansible_port=22
web-02.prod.example.com  ansible_user=ec2-user  ansible_port=22
web-03.prod.example.com  ansible_user=ec2-user  ansible_port=22

[databases]
db-primary.prod.example.com  ansible_user=ec2-user  ansible_port=2222
db-replica-1.prod.example.com
db-replica-2.prod.example.com

[cache]
redis-01.prod.example.com
redis-02.prod.example.com

# Group of groups — matches all hosts above
[prod:children]
webservers
databases
cache

# Staging environment
[webservers_staging]
web-01.staging.example.com
web-02.staging.example.com

[staging:children]
webservers_staging

# Variables that apply to ALL hosts in a group (prefer group_vars/ files instead)
[webservers:vars]
http_port=80
max_clients=200

The INI format works well for small fleets. For larger, more complex inventories the YAML format is preferred because it supports nested structures without ambiguity:

# inventory/hosts.yml: abridged YAML version of the same layout (some hosts and groups omitted)
all:
  children:
    prod:
      children:
        webservers:
          hosts:
            web-01.prod.example.com:
              ansible_user: ec2-user
            web-02.prod.example.com:
              ansible_user: ec2-user
        databases:
          hosts:
            db-primary.prod.example.com:
              ansible_port: 2222
            db-replica-1.prod.example.com: {}
    staging:
      children:
        webservers_staging:
          hosts:
            web-01.staging.example.com: {}

Dynamic Inventory: The Production Standard

Static inventories fail the moment your infrastructure becomes dynamic — auto-scaling groups that launch and terminate instances, ECS tasks, Kubernetes nodes. A static file you wrote on Monday is wrong by Tuesday.

Ansible solves this with dynamic inventory plugins. Instead of a hand-maintained host list, you point Ansible at a plugin configuration (or a script) that queries your infrastructure provider and returns the current hosts and groups. Most major platforms have a maintained plugin, distributed in a collection:

amazon.aws.aws_ec2 — queries EC2, returns instances grouped by tags, region, VPC, ASG, etc.
google.cloud.gcp_compute — GCP Compute Engine instances
azure.azcollection.azure_rm — Azure VMs and VMSS
community.vmware.vmware_vm_inventory — VMware vSphere

# inventory/aws_ec2.yml — dynamic inventory for AWS EC2
# Requires: pip install boto3 botocore
# Collection:  ansible-galaxy collection install amazon.aws

plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - eu-west-1
filters:
  instance-state-name: running
  tag:Environment: production          # only prod instances

# Automatically group hosts by these EC2 attributes
keyed_groups:
  - prefix: env
    key: tags.Environment              # group: env_production (the filter above excludes other environments)
  - prefix: role
    key: tags.Role                     # groups: role_web, role_db, role_cache
  - prefix: az
    key: placement.availability_zone   # groups: az_us_east_1a, az_eu_west_1b

# What to use as the host's address
hostnames:
  - private-ip-address                 # use private IP inside VPC
  # - public-ip-address                # use public IP when connecting from outside VPC

compose:
  ansible_host: private_ip_address
  ansible_user: "'ec2-user'"
  ansible_ssh_private_key_file: "'~/.ssh/prod-key.pem'"

With this file saved as inventory/aws_ec2.yml, running ansible-inventory -i inventory/aws_ec2.yml --list returns all running EC2 instances in us-east-1 and eu-west-1 tagged Environment=production, pre-grouped by their tags. (Pointing -i at the whole inventory/ directory also works, but it merges in any static inventory files stored there.) No manual updates are required when Auto Scaling adds or removes a node.
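
The way keyed_groups turns instance attributes into group names can be approximated in plain Python. This is a simplified sketch under stated assumptions: the real plugin evaluates each key as a Jinja2 expression and has more options, whereas this version only follows dotted keys and replaces characters that are not letters, digits, or underscores. The instance dict and helper names are hypothetical.

```python
import re
from typing import Any, Dict, List

# Same three rules as the keyed_groups section of the YAML above.
KEYED_GROUPS = [
    {"prefix": "env", "key": "tags.Environment"},
    {"prefix": "role", "key": "tags.Role"},
    {"prefix": "az", "key": "placement.availability_zone"},
]


def lookup(data: Dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as 'tags.Role' through nested dicts."""
    current: Any = data
    for part in dotted_key.split("."):
        current = current[part]
    return current


def keyed_group_names(
    instance: Dict[str, Any],
    rules: List[Dict[str, str]] = KEYED_GROUPS,
) -> List[str]:
    """Build '<prefix>_<value>' group names, skipping attributes that are absent."""
    names = []
    for rule in rules:
        try:
            value = lookup(instance, rule["key"])
        except (KeyError, TypeError):
            continue  # instance lacks this tag or attribute
        raw = f"{rule['prefix']}_{value}"
        names.append(re.sub(r"[^A-Za-z0-9_]", "_", raw))
    return names


# Hypothetical instance record, shaped loosely like EC2 describe output.
instance = {
    "tags": {"Environment": "production", "Role": "web"},
    "placement": {"availability_zone": "us-east-1a"},
}
print(keyed_group_names(instance))
```

An instance without a Role tag simply lands in fewer groups, so consistent tagging is what makes the generated groups reliable targets.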

Production pitfall: Setting host_key_checking = False (or StrictHostKeyChecking=no) disables SSH host key verification entirely, which exposes every connection to man-in-the-middle attacks. Treat it as a convenience for throwaway environments, not as a default for CI pipelines that reach production. The right production pattern is to use the AWS EC2 plugin with private-ip-address hostnames and run your Ansible control node inside the same VPC (on a bastion host or a CI runner in a private subnet). You then enable host key checking and pre-populate ~/.ssh/known_hosts during instance bootstrap via user-data, or accept keys on first connect over a trusted private network.

Groups, Host Vars, and Group Vars

Putting connection parameters inline in the inventory file does not scale. Ansible's variable directory structure solves this cleanly. When Ansible finds a directory called group_vars/ or host_vars/ next to the inventory, it automatically loads variable files from them.

# Recommended directory layout: variable files live next to the inventory,
# one file per group or host

inventory/
  hosts.yml               # or aws_ec2.yml for dynamic
  group_vars/
    all.yml               # vars that apply to EVERY host
    webservers.yml        # vars only for hosts in [webservers]
    databases.yml
    prod.yml              # vars only when targeting the [prod] group-of-groups
    staging.yml
  host_vars/
    web-01.prod.example.com.yml   # vars only for this specific host
    db-primary.prod.example.com.yml

# group_vars/all.yml
# Caution: the next line disables host key verification for every host (lab use only)
ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
ntp_servers:
  - 169.254.169.123    # AWS Time Sync Service (preferred inside AWS)
  - time.cloudflare.com

# group_vars/webservers.yml
nginx_worker_processes: auto
nginx_worker_connections: 4096
app_port: 8080
deploy_user: deploy

# group_vars/databases.yml
pg_max_connections: 500
pg_shared_buffers: "4GB"
pg_work_mem: "64MB"
backup_s3_bucket: "mycompany-db-backups-prod"

# host_vars/db-primary.prod.example.com.yml
pg_replication_role: primary
pg_wal_level: replica
pg_max_wal_senders: 10

Variable Precedence: The Order That Saves You at 3 AM

Ansible has 22 levels of variable precedence. You do not need to memorize all of them, but you do need to know the most common collision points:

Lowest: role defaults (roles/<name>/defaults/main.yml), intended to be overridden
Inventory and playbook group vars (group_vars/all first, then more specific groups)
Inventory and playbook host vars (host_vars/<hostname>.yml)
Host facts and cached set_fact values
Play vars (vars:, vars_prompt:, and vars_files: in a play)
Role vars (roles/<name>/vars/main.yml), hard to override
Block and task vars (vars: on a block or task)
include_vars, set_fact, and registered variables
Highest: extra vars passed with -e on the command line, which always win

The most common production bug is a role's vars/main.yml silently overriding your group_vars/ because role vars sit above inventory vars in the precedence chain. Put configuration that operators should be able to customize in defaults/main.yml, never in vars/main.yml.
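
The role-vars collision is easier to see with the layers written out. The sketch below is a deliberately reduced model: it covers only six of the precedence levels, lists them lowest first, and lets later layers replace earlier values. The resolve helper and the specific values in role defaults, role vars, host vars, and extra vars are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Layer = Tuple[str, Dict[str, Any]]


def resolve(layers: List[Layer]) -> Dict[str, Tuple[Any, str]]:
    """Apply layers lowest-precedence first; record which layer set each value."""
    resolved: Dict[str, Tuple[Any, str]] = {}
    for source, variables in layers:
        for name, value in variables.items():
            resolved[name] = (value, source)  # a later layer replaces earlier ones
    return resolved


layers: List[Layer] = [
    ("role defaults", {"app_port": 8000, "nginx_worker_connections": 1024}),
    ("group_vars/all", {"ntp_servers": ["169.254.169.123"]}),
    ("group_vars/webservers", {"app_port": 8080,
                               "nginx_worker_connections": 4096,
                               "deploy_user": "deploy"}),
    ("host_vars/web-01", {"nginx_worker_connections": 8192}),
    ("role vars", {"app_port": 9000}),  # silently beats group_vars
    ("extra vars (-e)", {"deploy_user": "release"}),
]

resolved = resolve(layers)
for name in sorted(resolved):
    value, source = resolved[name]
    print(f"{name} = {value!r}  (from {source})")
```

Here app_port ends up as 9000 from the role's vars file even though group_vars/webservers sets 8080, which is exactly the override that moving the value to defaults/main.yml would avoid.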

# Verify your inventory and variable resolution before running any playbook

# List all hosts in the inventory
ansible-inventory -i inventory/ --list

# Show what Ansible sees for a specific host (all vars merged and resolved)
ansible-inventory -i inventory/ --host web-01.prod.example.com

# Test connectivity to a group (ad-hoc ping)
ansible webservers -i inventory/ -m ping

# Test connectivity to a dynamic EC2 group (by tag)
ansible role_web -i inventory/aws_ec2.yml -m ping

# See which groups a host belongs to
ansible web-01.prod.example.com -i inventory/ -m debug -a "var=group_names"

Pro practice: In large organizations, inventory is itself a service. Tools like NetBox (a network source of truth) integrate with Ansible as dynamic inventory sources, so the network team maintains inventory in NetBox and Ansible reads it live. The Ansible custom inventory script protocol is simple: implement --list (returns all groups and hosts as JSON) and --host <hostname> (returns that host's vars as JSON). Any executable that speaks this interface works as an Ansible inventory.
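
A minimal sketch of the --list / --host interface follows. The host data is hard-coded and hypothetical; a real script would query a source of truth instead. Including the _meta.hostvars key in the --list output lets Ansible read all host variables in one call rather than invoking --host once per host.

```python
import argparse
import json
from typing import List, Optional

# Hypothetical per-host variables; a real script would fetch these from an API.
HOSTVARS = {
    "web-01.prod.example.com": {"ansible_user": "ec2-user"},
    "db-primary.prod.example.com": {"ansible_port": 2222},
}


def build_inventory() -> dict:
    """Return groups, their hosts and children, plus _meta.hostvars."""
    return {
        "webservers": {"hosts": ["web-01.prod.example.com"], "vars": {"http_port": 80}},
        "databases": {"hosts": ["db-primary.prod.example.com"]},
        "prod": {"children": ["webservers", "databases"]},
        "_meta": {"hostvars": HOSTVARS},
    }


def render(argv: Optional[List[str]] = None) -> str:
    """Parse --list or --host <name> and return the matching JSON document."""
    parser = argparse.ArgumentParser(description="Example inventory script")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--list", action="store_true", help="print all groups and hosts")
    mode.add_argument("--host", help="print variables for a single host")
    args = parser.parse_args(argv)
    if args.list:
        return json.dumps(build_inventory(), indent=2)
    return json.dumps(HOSTVARS.get(args.host, {}), indent=2)  # unknown host: {}


# Demo calls with explicit arguments. A script meant for Ansible would
# instead end with: print(render(sys.argv[1:]))
print(render(["--list"]))
print(render(["--host", "db-primary.prod.example.com"]))
```

Saved as an executable file and passed with -i, a script of this shape can be checked with ansible-inventory --list before any playbook depends on it.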

Summary

Ansible's agentless SSH design removes an entire category of operational complexity — no agent lifecycle to manage, no separate control plane to secure. The inventory is Ansible's single source of truth for your fleet: static INI/YAML files for fixed infrastructure, dynamic plugins for cloud environments where hosts come and go. Group and host variable files keep configuration out of the inventory and in version-controlled, human-readable YAML. And understanding variable precedence prevents the silent overrides that cause production incidents. With these foundations in place, you are ready to write your first ad-hoc commands and playbooks in the lessons that follow.