Configuration Management with Ansible

Ansible Architecture & Inventory

18 min Lesson 2 of 30

Before you can manage a single server with Ansible, you need to understand how Ansible reaches that server, what it knows about your fleet, and how you organize hundreds or thousands of hosts into logical groups. This lesson covers all three: the agentless SSH architecture, static and dynamic inventories, and the host and group variable system. Get these right and every playbook you write starts from a reliable foundation. Get them wrong and you will spend your time chasing connection errors and variable conflicts.

The Agentless Architecture: Why It Matters

Almost every other configuration management system — Chef, Puppet, SaltStack in agent mode — requires you to install and maintain a daemon on every managed host. That daemon polls a central server, applies configuration, and reports back. The operational burden is real: the agent itself must be patched, must stay running, and can fail in ways that block configuration from being applied at all.

Ansible is fundamentally different. There is no agent. The control node (your laptop, a CI runner, or a bastion host) connects to each managed node over SSH (or WinRM for Windows), pushes a small Python script called a module, executes it, collects the result, and removes the script. The managed node needs only:

  • An SSH daemon (sshd) listening on a reachable port
  • Python 3.x in the standard PATH (Python 2.7 is EOL, and recent ansible-core releases no longer support it on managed nodes)
  • A user account the control node can authenticate as

That is it. No firewall exceptions for outbound agent traffic, no certificate rotation, no "my agent is stuck" incidents at 2 AM.

Key idea: Ansible's push model means the control node initiates every connection. The managed node is always passive. This also means Ansible has no persistent knowledge of host state between runs: by default, every play gathers fresh facts unless you turn gathering off or enable fact caching. If you need continuous drift detection, pair Ansible with a tool like AWS Config or Open Policy Agent, rather than trying to poll with cron.
[Figure: Ansible agentless SSH architecture. The control node (ansible / ansible-playbook, inventory + playbooks, SSH private key) connects over SSH on port 22 to web-01, web-02, and db-01, each running sshd + Python 3 with no agent. Module execution: 1. copy the Python module to a temporary directory on the host (by default under ~/.ansible/tmp); 2. execute it and collect the JSON result; 3. delete the module from the host.]
Ansible pushes Python modules over SSH, executes them, collects the JSON result, and cleans up — no daemon required on managed nodes.
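
The copy, execute, collect, clean-up cycle can be imitated locally in a few lines of Python. This is a toy model under stated assumptions, not Ansible's actual executor: a local temporary directory stands in for the remote host, subprocess stands in for SSH, and MODULE_SOURCE and run_module are hypothetical names invented for the illustration.

```python
import json
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a module: reads JSON arguments on stdin
# and prints a JSON result on stdout.
MODULE_SOURCE = '''
import json, sys
args = json.load(sys.stdin)
print(json.dumps({"changed": False, "ping": args.get("data", "pong")}))
'''


def run_module(args):
    """Copy the module to a temp dir, run it, parse the JSON result, clean up."""
    workdir = tempfile.mkdtemp(prefix="toy-ansible-")
    path = os.path.join(workdir, "module.py")
    try:
        # 1. "Copy" the module to the (simulated) managed node.
        with open(path, "w") as handle:
            handle.write(MODULE_SOURCE)
        # 2. Execute it and capture stdout.
        proc = subprocess.run(
            [sys.executable, path],
            input=json.dumps(args),
            capture_output=True,
            text=True,
            check=True,
        )
        # Collect the JSON result.
        return json.loads(proc.stdout)
    finally:
        # 3. Delete the module and its directory afterwards.
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(workdir)


result = run_module({"data": "hello"})
print(result)
```

The point of the sketch is the shape of the exchange: arguments go in as JSON, a JSON document comes back, and nothing is left behind on the target.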

SSH Connection Tuning for Production Scale

Without connection reuse, Ansible would open a new SSH connection for every task on every host. At ten hosts that is fine. At five hundred it becomes a bottleneck. Two settings address this:

  • SSH multiplexing (ControlMaster): Ansible re-uses an existing SSH connection for multiple tasks rather than re-authenticating each time. With OpenSSH, Ansible's default ssh_args already include ControlMaster=auto and ControlPersist=60s; if you override ssh_args, keep those flags or you lose multiplexing.
  • Pipelining: Shortens the write-module-to-disk, execute, delete cycle by piping the module over the existing SSH connection instead of copying it first. It is off by default and requires requiretty to be disabled in /etc/sudoers on the target (which is the default on most modern Linux distributions).
# ansible.cfg: project-level configuration file
# Place this in the same directory as your playbooks.
# Comments sit on their own lines: ansible.cfg treats an inline "#" as part of the value.

[defaults]
# default inventory path
inventory = ./inventory
# OS user on managed nodes
remote_user = ansible
# SSH key for auth
private_key_file = ~/.ssh/id_ed25519
# disabled here for ephemeral cloud hosts only
host_key_checking = False
# parallel connections (default 5)
forks = 20
# cache facts; only gather when needed
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
# cache lifetime in seconds
fact_caching_timeout = 3600

[ssh_connection]
# biggest single performance win
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no
Pro practice: Always commit an ansible.cfg at the root of your Ansible repository. This makes the project self-contained: a new engineer clones the repo and runs ansible-playbook site.yml without needing to know any global configuration. Ansible searches for its config file in this order: ANSIBLE_CONFIG env var → ./ansible.cfg → ~/.ansible.cfg → /etc/ansible/ansible.cfg. It uses the first file it finds and does not merge the others, so the project-local file wins unless ANSIBLE_CONFIG is set.
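
The config lookup order described above can be modelled as a "first existing candidate wins" search. The sketch below is a simplified illustration, not Ansible's own code: pick_config is a hypothetical helper, the set of existing files is passed in rather than read from disk, and it ignores details such as Ansible refusing to load ansible.cfg from a world-writable directory.

```python
import posixpath


def pick_config(env, existing_files, cwd, home):
    """Return the first config candidate that exists, or None if none do."""
    candidates = []
    # An explicit ANSIBLE_CONFIG is checked before anything else.
    if env.get("ANSIBLE_CONFIG"):
        candidates.append(env["ANSIBLE_CONFIG"])
    candidates += [
        posixpath.join(cwd, "ansible.cfg"),    # project-local file
        posixpath.join(home, ".ansible.cfg"),  # per-user file
        "/etc/ansible/ansible.cfg",            # system-wide file
    ]
    for path in candidates:
        if path in existing_files:
            return path  # first match wins; later files are not merged in
    return None


files = {"/repo/ansible.cfg", "/etc/ansible/ansible.cfg", "/ci/ansible.cfg"}
print(pick_config({}, files, cwd="/repo", home="/home/dev"))
print(pick_config({"ANSIBLE_CONFIG": "/ci/ansible.cfg"}, files, "/repo", "/home/dev"))
```

On a real control node, running ansible --version prints the config file that was actually selected, which is the quickest way to confirm the result.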

Inventories: Teaching Ansible About Your Fleet

The inventory is Ansible's source of truth for what hosts exist and how to reach them. It is the most important file you will write. Every ad-hoc command and every playbook starts by resolving the inventory.

Static Inventory (INI and YAML)

The simplest inventory is a plain INI file. Hosts are listed by IP or hostname, grouped under [group_name] headers (prefer underscores: Ansible warns about hyphens in group names). Groups can contain other groups with the :children suffix.

# inventory/hosts.ini: a production-realistic static inventory

[webservers]
web-01.prod.example.com ansible_user=ec2-user ansible_port=22
web-02.prod.example.com ansible_user=ec2-user ansible_port=22
web-03.prod.example.com ansible_user=ec2-user ansible_port=22

[databases]
db-primary.prod.example.com ansible_user=ec2-user ansible_port=2222
db-replica-1.prod.example.com
db-replica-2.prod.example.com

[cache]
redis-01.prod.example.com
redis-02.prod.example.com

# Group of groups: matches all hosts above
[prod:children]
webservers
databases
cache

# Staging environment
[webservers_staging]
web-01.staging.example.com
web-02.staging.example.com

[staging:children]
webservers_staging

# Variables that apply to ALL hosts in a group (prefer group_vars/ files instead)
[webservers:vars]
http_port=80
max_clients=200

The INI format works well for small fleets. For larger, more complex inventories the YAML format is preferred because it supports nested structures without ambiguity:

# inventory/hosts.yml: the same structure in YAML (abbreviated)
all:
  children:
    prod:
      children:
        webservers:
          hosts:
            web-01.prod.example.com:
              ansible_user: ec2-user
            web-02.prod.example.com:
              ansible_user: ec2-user
        databases:
          hosts:
            db-primary.prod.example.com:
              ansible_port: 2222
            db-replica-1.prod.example.com: {}
    staging:
      children:
        webservers_staging:
          hosts:
            web-01.staging.example.com: {}
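
To see what a group of groups means in practice, the sketch below resolves a parent group to its full host list by walking its children. It is an illustrative model only: the INVENTORY dictionary is a hand-written, shortened copy of the example inventory, and hosts_in is a hypothetical helper, not part of Ansible.

```python
# Hand-written, shortened stand-in for the example inventory.
INVENTORY = {
    "webservers": {"hosts": ["web-01.prod.example.com", "web-02.prod.example.com"]},
    "databases": {"hosts": ["db-primary.prod.example.com"]},
    "prod": {"children": ["webservers", "databases"]},
    "webservers_staging": {"hosts": ["web-01.staging.example.com"]},
    "staging": {"children": ["webservers_staging"]},
}


def hosts_in(group, inventory):
    """Return the sorted hosts of a group, including hosts of all child groups."""
    hosts = set()
    seen = set()
    stack = [group]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # guard against groups that reference each other
        seen.add(name)
        entry = inventory.get(name, {})
        hosts.update(entry.get("hosts", []))
        stack.extend(entry.get("children", []))
    return sorted(hosts)


print(hosts_in("prod", INVENTORY))
print(hosts_in("staging", INVENTORY))
```

Targeting prod in a playbook therefore reaches every host in webservers and databases without listing any of them twice.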

Dynamic Inventory: The Production Standard

Static inventories fail the moment your infrastructure becomes dynamic — auto-scaling groups that launch and terminate instances, ECS tasks, Kubernetes nodes. A static file you wrote on Monday is wrong by Tuesday.

Ansible solves this with dynamic inventory plugins. Instead of a file, you point Ansible at a script or plugin that queries your infrastructure provider and returns the current host list as JSON. Every major cloud has a first-party plugin:

  • amazon.aws.aws_ec2 — queries EC2, returns instances grouped by tags, region, VPC, ASG, etc.
  • google.cloud.gcp_compute — GCP Compute Engine instances
  • azure.azcollection.azure_rm — Azure VMs and VMSS
  • community.vmware.vmware_vm_inventory — VMware vSphere
# inventory/aws_ec2.yml: dynamic inventory for AWS EC2
# Requires: pip install boto3 botocore
# Collection: ansible-galaxy collection install amazon.aws
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - eu-west-1
filters:
  instance-state-name: running
  tag:Environment: production          # only prod instances

# Automatically group hosts by these EC2 attributes
keyed_groups:
  - prefix: env
    key: tags.Environment              # groups such as env_production
  - prefix: role
    key: tags.Role                     # groups: role_web, role_db, role_cache
  - prefix: az
    key: placement.availability_zone   # groups: az_us_east_1a, az_eu_west_1b

# What to use as the host's address
hostnames:
  - private-ip-address                 # use private IP inside VPC
  # - ip-address                       # use public IP when connecting from outside VPC

compose:
  ansible_host: private_ip_address
  ansible_user: "'ec2-user'"
  ansible_ssh_private_key_file: "'~/.ssh/prod-key.pem'"

With this file saved as inventory/aws_ec2.yml, running ansible-inventory -i inventory/aws_ec2.yml --list returns all running EC2 instances in us-east-1 and eu-west-1 tagged Environment=production, pre-grouped by their tags. (Pointing -i at the whole inventory/ directory also merges in any static files stored there.) No manual updates are required when Auto Scaling adds or removes a node.
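
The grouping that keyed_groups performs can be approximated in plain Python. This is a simplified model under stated assumptions, not the plugin's implementation: the instance dictionaries are invented sample data, lookup and keyed_groups are hypothetical helpers, and group names are built as prefix + "_" + value with any character outside letters, digits, and underscore replaced by an underscore.

```python
import re


def lookup(data, dotted_key):
    """Follow a dotted key such as 'tags.Role' through nested dictionaries."""
    for part in dotted_key.split("."):
        if not isinstance(data, dict) or part not in data:
            return None
        data = data[part]
    return data


def keyed_groups(instances, rules):
    """Build {group_name: [addresses]} from instance attributes."""
    groups = {}
    for instance in instances:
        for rule in rules:
            value = lookup(instance, rule["key"])
            if value is None:
                continue  # instance lacks this attribute: no group created
            raw_name = f"{rule['prefix']}_{value}"
            name = re.sub(r"[^A-Za-z0-9_]", "_", raw_name)
            groups.setdefault(name, []).append(instance["private_ip_address"])
    return groups


# Invented sample data shaped like the attributes used in the example config.
INSTANCES = [
    {"private_ip_address": "10.0.1.10",
     "tags": {"Environment": "production", "Role": "web"},
     "placement": {"availability_zone": "us-east-1a"}},
    {"private_ip_address": "10.0.2.20",
     "tags": {"Environment": "production", "Role": "db"},
     "placement": {"availability_zone": "eu-west-1b"}},
]
RULES = [
    {"prefix": "env", "key": "tags.Environment"},
    {"prefix": "role", "key": "tags.Role"},
    {"prefix": "az", "key": "placement.availability_zone"},
]

for group, members in sorted(keyed_groups(INSTANCES, RULES).items()):
    print(group, members)
```

Because groups are derived from tags at query time, consistent tagging of instances matters as much as the inventory file itself.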

Production pitfall: Do not carry host_key_checking = False (or StrictHostKeyChecking=no) into production or CI pipelines. It disables SSH host verification and exposes you to man-in-the-middle attacks. The safer production pattern is to use the AWS EC2 plugin with private-ip-address hostnames and run your Ansible control node inside the same VPC (on a bastion host or a CI runner in a private subnet). You then enable host key checking and either pre-populate ~/.ssh/known_hosts on the control node with host keys collected during instance bootstrap (for example via user-data), or accept keys on first connect over a trusted private network.

Groups, Host Vars, and Group Vars

Putting connection parameters inline in the inventory file does not scale. Ansible's variable directory structure solves this cleanly. When Ansible finds a directory called group_vars/ or host_vars/ next to the inventory, it automatically loads variable files from them.

# Recommended directory layout: a commonly used structure for Ansible inventories at scale
inventory/
  hosts.yml                            # or aws_ec2.yml for dynamic
  group_vars/
    all.yml                            # vars that apply to EVERY host
    webservers.yml                     # vars only for hosts in [webservers]
    databases.yml
    prod.yml                           # vars for every host in the [prod] group-of-groups
    staging.yml
  host_vars/
    web-01.prod.example.com.yml        # vars only for this specific host
    db-primary.prod.example.com.yml

# group_vars/all.yml
# Disabling strict host key checking is acceptable for disposable lab hosts only.
ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
ntp_servers:
  - 169.254.169.123                    # AWS Time Sync Service (preferred inside AWS)
  - time.cloudflare.com

# group_vars/webservers.yml
nginx_worker_processes: auto
nginx_worker_connections: 4096
app_port: 8080
deploy_user: deploy

# group_vars/databases.yml
pg_max_connections: 500
pg_shared_buffers: "4GB"
pg_work_mem: "64MB"
backup_s3_bucket: "mycompany-db-backups-prod"

# host_vars/db-primary.prod.example.com.yml
pg_replication_role: primary
pg_wal_level: replica
pg_max_wal_senders: 10
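
As a rough aid for reasoning about this layout, the sketch below lists the variable files that would apply to one host, from least to most specific. It is a simplification with several assumptions: var_files_for is a hypothetical helper, the caller passes the host's groups ordered parent-first (child group variables override parent group variables), and it only builds paths without checking that the files exist. Ansible itself also accepts other file extensions and directories of files.

```python
import posixpath


def var_files_for(host, groups, root="inventory"):
    """List variable files that apply to a host, least specific first."""
    files = [posixpath.join(root, "group_vars", "all.yml")]
    # Groups are expected parent-first, so later entries override earlier ones.
    for group in groups:
        files.append(posixpath.join(root, "group_vars", f"{group}.yml"))
    # Host-specific variables are applied last and override all group files.
    files.append(posixpath.join(root, "host_vars", f"{host}.yml"))
    return files


for path in var_files_for("db-primary.prod.example.com", ["prod", "databases"]):
    print(path)
```

Reading the list top to bottom gives the order in which a value such as pg_max_connections could be set and then overridden for a single host.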

Variable Precedence: The Order That Saves You at 3 AM

Ansible has 22 levels of variable precedence. You do not need to memorize all of them, but you do need to know the most common collision points:

  1. Lowest: role defaults (roles/<name>/defaults/main.yml), intended to be overridden
  2. Group vars (group_vars/all.yml, then more specific groups; at each level, group_vars/ next to the playbook overrides group_vars/ next to the inventory)
  3. Host vars (host_vars/<hostname>.yml; again inventory first, then playbook)
  4. Play vars (vars: key in a play)
  5. Role vars (roles/<name>/vars/main.yml), hard to override
  6. Task vars (vars: key on a task)
  7. Highest: extra vars passed with -e on the command line, which always win

The most common production bug is a role's vars/main.yml silently overriding your group_vars/ because role vars sit above inventory vars in the precedence chain. Put configuration that operators should be able to customize in defaults/main.yml, never in vars/main.yml.
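
The role vars pitfall is easy to reproduce with a toy resolver. The sketch below is a deliberately reduced model, not Ansible's variable manager: it covers only six of the 22 levels, and PRECEDENCE, resolve, and the sample values are hypothetical names and data chosen for the illustration.

```python
# A reduced ladder, lowest precedence first (Ansible has many more levels).
PRECEDENCE = [
    "role_defaults",
    "group_vars",
    "host_vars",
    "role_vars",
    "task_vars",
    "extra_vars",
]


def resolve(sources):
    """Merge variable sources in precedence order and record who set each key."""
    merged = {}
    origin = {}
    for level in PRECEDENCE:
        for key, value in sources.get(level, {}).items():
            merged[key] = value  # a higher level overwrites a lower one
            origin[key] = level
    return merged, origin


sources = {
    "role_defaults": {"app_port": 8000},
    "group_vars": {"app_port": 8080},  # what the operator intended
    "role_vars": {"app_port": 9000},   # silently wins over group_vars
}
merged, origin = resolve(sources)
print(merged["app_port"], "set by", origin["app_port"])

# Only a higher level, such as -e on the command line, can override role vars.
merged, origin = resolve({**sources, "extra_vars": {"app_port": 8443}})
print(merged["app_port"], "set by", origin["app_port"])
```

Moving app_port from the role's vars/main.yml to defaults/main.yml in this model lets the group_vars value win, which is the behaviour operators usually expect.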

# Verify your inventory and variable resolution before running any playbook

# List all hosts in the inventory
ansible-inventory -i inventory/ --list

# Show what Ansible sees for a specific host (all vars merged and resolved)
ansible-inventory -i inventory/ --host web-01.prod.example.com

# Test connectivity to a group (ad-hoc ping)
ansible webservers -i inventory/ -m ping

# Test connectivity to a dynamic EC2 group (by tag)
ansible role_web -i inventory/aws_ec2.yml -m ping

# See which groups each host belongs to (group tree with member hosts)
ansible-inventory -i inventory/ --graph
Pro practice: In large organizations, inventory is itself a service. Source-of-truth tools such as NetBox integrate with Ansible as dynamic inventory sources, so inventory is maintained by the network team in NetBox and Ansible reads it live. The Ansible custom inventory script protocol is simple: implement --list (returns all groups and hosts as JSON) and --host <hostname> (returns that host's vars as JSON). Any executable script that speaks this interface works as an Ansible inventory.
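
A minimal inventory script that speaks the --list / --host interface might look like the sketch below. The HOSTS table is hypothetical static data standing in for a real query against NetBox, a CMDB, or a cloud API, and the function names are illustrative. The _meta.hostvars key in the --list output lets Ansible read every host's variables in one call instead of invoking --host once per host.

```python
#!/usr/bin/env python3
import argparse
import json
import sys

# Hypothetical static data; a real script would query an external system.
HOSTS = {
    "web-01.prod.example.com": {"group": "webservers", "vars": {"app_port": 8080}},
    "db-primary.prod.example.com": {"group": "databases", "vars": {"ansible_port": 2222}},
}


def build_list():
    """Return all groups and hosts, with host vars under _meta.hostvars."""
    inventory = {"_meta": {"hostvars": {}}}
    for name, info in HOSTS.items():
        group = inventory.setdefault(info["group"], {"hosts": []})
        group["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = info["vars"]
    return inventory


def build_host(name):
    """Return the variables for one host (empty dict if unknown)."""
    return HOSTS.get(name, {}).get("vars", {})


def main(argv):
    parser = argparse.ArgumentParser(description="Toy Ansible inventory script")
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args, _unknown = parser.parse_known_args(argv)
    data = build_host(args.host) if args.host else build_list()
    return json.dumps(data, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

Saved as an executable file (for example a hypothetical inventory/toy_inventory.py), the script can be passed to ansible-inventory with -i like any other inventory source.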

Summary

Ansible's agentless SSH design removes an entire category of operational complexity — no agent lifecycle to manage, no separate control plane to secure. The inventory is Ansible's single source of truth for your fleet: static INI/YAML files for fixed infrastructure, dynamic plugins for cloud environments where hosts come and go. Group and host variable files keep configuration out of the inventory and in version-controlled, human-readable YAML. And understanding variable precedence prevents the silent overrides that cause production incidents. With these foundations in place, you are ready to write your first ad-hoc commands and playbooks in the lessons that follow.
