Testing Automation Code
A deployment script that runs successfully in staging and silently corrupts production is worse than a script that fails loudly on the first run. Ops automation code interacts with real infrastructure — cloud APIs, databases, file systems, network services — making it uniquely dangerous to ship untested. At companies like Google and Netflix, infrastructure code undergoes the same code-review and test-coverage gates as product code. This lesson covers how to apply professional testing discipline to Python automation: pytest fundamentals, mocking external dependencies, and strategies for scripts that touch real infrastructure.
Why Ops Code Is Hard to Test
The core challenge is side effects. Every interesting line in an ops script does something to the world outside the process: it calls the AWS API, writes a file, runs a subprocess, or sends a Slack message. A naive test that actually calls those APIs would cost money, require credentials, leave behind cloud resources, and run for minutes. The solution is to isolate side effects with mocks — objects that pretend to be the real dependency and record what was called on them. The test then asserts on the mock's call history rather than on the cloud's state.
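As a minimal sketch of that idea, the snippet below injects a MagicMock in place of a chat client and then inspects what was called on it. The notify_oncall helper and its post_message method are hypothetical names invented for illustration, not part of any real Slack SDK.

```python
from unittest.mock import MagicMock


def notify_oncall(slack_client, channel: str, message: str) -> None:
    """Hypothetical ops helper: posts an alert through an injected client."""
    slack_client.post_message(channel=channel, text=f"[ALERT] {message}")


# The mock stands in for the real client; nothing leaves the process.
fake_slack = MagicMock()
notify_oncall(fake_slack, "#ops", "disk usage at 95%")

# Assert on the recorded call history rather than on the outside world.
fake_slack.post_message.assert_called_once_with(
    channel="#ops", text="[ALERT] disk usage at 95%"
)
```

Passing the dependency in as an argument keeps the example free of patching; the later sections cover the case where the dependency is created inside the function.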
Setting Up pytest
pytest is the de-facto standard test runner for Python automation code. It discovers tests automatically, produces clear failure output, and has a rich plugin ecosystem. Install it alongside your project dependencies:
# Install pytest and supporting libraries into your venv
pip install pytest pytest-mock "moto[s3,ec2]" responses  # quotes stop zsh from globbing the brackets
# Recommended project layout
my-ops-tool/
├── src/
│   └── ops/
│       ├── __init__.py
│       ├── s3.py
│       ├── ec2.py
│       └── deploy.py
├── tests/
│   ├── conftest.py          # shared fixtures
│   ├── unit/
│   │   ├── test_s3.py
│   │   └── test_deploy.py
│   └── integration/
│       └── test_ec2_integration.py
├── pyproject.toml
└── pytest.ini
The pytest.ini (or [tool.pytest.ini_options] in pyproject.toml) controls discovery and output:
# pytest.ini
[pytest]
testpaths = tests
# Make the src/ layout importable without installing the package (pytest >= 7)
pythonpath = src
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = -v --tb=short --strict-markers
markers =
    unit: fast, no external calls
    integration: may call real cloud APIs (slow, needs creds)
    smoke: a minimal end-to-end sanity check
Key idea: Mark every test with @pytest.mark.unit or @pytest.mark.integration. In CI, run only pytest -m unit on every pull request (fast, no secrets needed). Reserve -m integration for a nightly pipeline that has cloud credentials injected from a secrets manager. Because addopts includes --strict-markers, a misspelled marker fails at collection time instead of producing only a warning. Splitting per-PR unit runs from scheduled integration runs is a common CI pattern.
Mocking External Calls with pytest-mock and unittest.mock
The pytest-mock plugin exposes a mocker fixture that wraps Python's standard unittest.mock library. The two most important objects are MagicMock (a general-purpose mock) and patch (a context manager / decorator that replaces a name in a module's namespace for the duration of a test).
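To make the scoping behaviour of patch concrete, here is a small stdlib-only sketch that replaces time.sleep in both the context-manager and the decorator form. The wait_for_service function is a hypothetical poller written for this example; pytest-mock's mocker.patch wraps the same mechanism and undoes the patch when the test ends.

```python
import time
from unittest.mock import patch


def wait_for_service(checks: int, delay_s: float = 5.0) -> int:
    """Hypothetical poller: sleeps between checks and returns how many it made."""
    for _ in range(checks):
        time.sleep(delay_s)
    return checks


real_sleep = time.sleep

# Context-manager form: time.sleep is a MagicMock only inside the block.
with patch("time.sleep") as fake_sleep:
    made = wait_for_service(3)
    sleep_calls = fake_sleep.call_count

# Outside the block the original function is back in place.
restored = time.sleep is real_sleep


# Decorator form: the mock is passed to the function as an extra argument.
@patch("time.sleep")
def run_patched(fake_sleep):
    wait_for_service(2, delay_s=60.0)
    return fake_sleep.call_args_list


decorated_calls = run_patched()
```

Neither call actually sleeps, which is why a test for a five-minute retry loop can still finish in milliseconds.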
# src/ops/s3.py: the module under test
import boto3


def list_large_objects(bucket: str, threshold_mb: int = 100) -> list[dict]:
    """Return objects larger than threshold_mb from an S3 bucket."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    large = []
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from the response when a page has no objects
        for obj in page.get("Contents", []):
            if obj["Size"] > threshold_mb * 1024 * 1024:
                large.append({"key": obj["Key"], "size_mb": obj["Size"] // (1024 * 1024)})
    return large
# tests/unit/test_s3.py
import pytest

from ops.s3 import list_large_objects


@pytest.mark.unit
def test_list_large_objects_filters_correctly(mocker):
    # Build a fake paginator that returns a single page
    fake_page = {
        "Contents": [
            {"Key": "logs/small.log", "Size": 5 * 1024 * 1024},     # 5 MB, below threshold
            {"Key": "dumps/large.sql", "Size": 500 * 1024 * 1024},  # 500 MB, above threshold
        ]
    }
    mock_paginator = mocker.MagicMock()
    mock_paginator.paginate.return_value = [fake_page]
    mock_s3 = mocker.MagicMock()
    mock_s3.get_paginator.return_value = mock_paginator

    # Patch boto3.client so the function never talks to AWS
    mocker.patch("ops.s3.boto3.client", return_value=mock_s3)

    result = list_large_objects("my-bucket", threshold_mb=100)

    assert len(result) == 1
    assert result[0]["key"] == "dumps/large.sql"
    assert result[0]["size_mb"] == 500

    # Verify we asked for the right paginator
    mock_s3.get_paginator.assert_called_once_with("list_objects_v2")
    mock_paginator.paginate.assert_called_once_with(Bucket="my-bucket")
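Pro practice: patch the name the code under test actually looks up. ops/s3.py does import boto3 and calls boto3.client(...), so the lookup goes through the boto3 module at call time; patching ops.s3.boto3.client and patching boto3.client replace the same attribute, and spelling the target through ops.s3 mainly documents which module the test is about. The distinction becomes critical when a module does from boto3 import client instead: that binds a separate name inside ops.s3 at import time, so you must patch ops.s3.client, and patching boto3.client would silently have no effect. Patching the wrong namespace is one of the most common mock bugs in ops testing.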
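The namespace rule is easiest to see with a from-import, where the using module holds its own reference to the function. The sketch below builds a throwaway module in memory (fake_ops, a hypothetical name used only here) so the whole demonstration fits in one file; in a real project the module would simply live on disk.

```python
import os
import sys
import types
from unittest.mock import patch

# Build a tiny stand-in module so the example is self-contained. It binds
# getcwd with a from-import, as a real module under test might.
fake_ops = types.ModuleType("fake_ops")
exec(
    "from os import getcwd\n"
    "def working_dir():\n"
    "    return getcwd()\n",
    fake_ops.__dict__,
)
sys.modules["fake_ops"] = fake_ops

# Patching where the function is DEFINED misses: fake_ops still holds the
# original reference it copied at import time.
with patch("os.getcwd", return_value="/patched"):
    via_definition = fake_ops.working_dir()

# Patching the name where it is USED takes effect.
with patch("fake_ops.getcwd", return_value="/patched"):
    via_usage = fake_ops.working_dir()
```

Here via_definition still holds the real working directory while via_usage holds the patched value, which is exactly the silent failure mode a wrong patch target produces.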
Mocking HTTP with responses
When a script uses the requests library to call REST APIs (Datadog, PagerDuty, GitHub, etc.), use the responses library to intercept those calls at the transport layer without actually touching the network:
# src/ops/pagerduty.py
import requests

PAGERDUTY_URL = "https://api.pagerduty.com"


def get_oncall_user(schedule_id: str, token: str) -> str:
    """Return the email of the current on-call engineer."""
    resp = requests.get(
        f"{PAGERDUTY_URL}/oncalls",
        headers={"Authorization": f"Token token={token}"},
        params={"schedule_ids[]": schedule_id, "limit": 1},
        timeout=10,
    )
    resp.raise_for_status()  # turn 4xx/5xx responses into HTTPError
    data = resp.json()
    return data["oncalls"][0]["user"]["email"]
# tests/unit/test_pagerduty.py
import pytest
import requests
import responses as responses_lib

from ops.pagerduty import get_oncall_user


@pytest.mark.unit
@responses_lib.activate
def test_get_oncall_user_returns_email():
    # Register a canned reply; the request never leaves the process
    responses_lib.add(
        responses_lib.GET,
        "https://api.pagerduty.com/oncalls",
        json={"oncalls": [{"user": {"email": "sre@example.com"}}]},
        status=200,
    )
    email = get_oncall_user("SCH123", "fake-token")
    assert email == "sre@example.com"


@pytest.mark.unit
@responses_lib.activate
def test_get_oncall_user_raises_on_403():
    responses_lib.add(
        responses_lib.GET,
        "https://api.pagerduty.com/oncalls",
        status=403,
    )
    # raise_for_status() should surface the 403 as an HTTPError
    with pytest.raises(requests.exceptions.HTTPError):
        get_oncall_user("SCH123", "bad-token")
Infrastructure-Aware Testing with moto
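For AWS specifically, moto is an invaluable library that emulates AWS services in-process by intercepting the HTTP requests boto3 would otherwise send. Unlike hand-rolled mocks, moto keeps state and models much of the real service behaviour: an object you put into a bucket can be listed back, a call against a missing bucket raises a ClientError, and EC2 instances move through state transitions. Fidelity varies by service (IAM policies, for example, are not enforced by default), so treat moto as a realistic approximation rather than a replacement for integration tests: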
# tests/unit/test_s3_moto.py
import boto3
import pytest
from moto import mock_aws  # moto >= 5; earlier releases used per-service decorators such as mock_s3

from ops.s3 import list_large_objects


@pytest.mark.unit
@mock_aws
def test_list_large_objects_with_moto():
    # moto intercepts all boto3 calls inside this scope
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="test-bucket")

    # Upload a small and a large object
    s3.put_object(Bucket="test-bucket", Key="small.txt", Body=b"x" * (10 * 1024 * 1024))   # 10 MB
    s3.put_object(Bucket="test-bucket", Key="large.bin", Body=b"x" * (200 * 1024 * 1024))  # 200 MB

    result = list_large_objects("test-bucket", threshold_mb=100)

    assert len(result) == 1
    assert result[0]["key"] == "large.bin"
Production pitfall — never hard-code real credentials in tests. Every test file that imports boto3 should either use moto (which ignores credentials) or rely on environment variables. A common CI failure is a test that passes locally (because the developer has real AWS credentials in ~/.aws/credentials) but fails in CI because those credentials are absent. Always verify your unit test suite passes with AWS_ACCESS_KEY_ID=fake AWS_SECRET_ACCESS_KEY=fake pytest -m unit before merging.
Fixtures: Shared Setup and Teardown
pytest fixtures are functions decorated with @pytest.fixture that provide reusable context to tests. Put shared fixtures in tests/conftest.py: pytest loads that file automatically and makes its fixtures available, without any import, to every test in tests/ and its subdirectories:
# tests/conftest.py
import boto3
import pytest
from moto import mock_aws


@pytest.fixture
def aws_credentials(monkeypatch):
    """Point boto3 at dummy credentials so no real AWS account can be touched."""
    # monkeypatch restores the previous values after each test; assigning to
    # os.environ directly would leak into every later test in the session.
    monkeypatch.setenv("AWS_ACCESS_KEY_ID", "testing")
    monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "testing")
    monkeypatch.setenv("AWS_DEFAULT_REGION", "us-east-1")


@pytest.fixture
def s3_bucket(aws_credentials):
    """Provide an empty moto-backed S3 bucket; tests add the objects they need."""
    with mock_aws():
        s3 = boto3.client("s3", region_name="us-east-1")
        s3.create_bucket(Bucket="ci-test-bucket")
        yield "ci-test-bucket"  # the mock stays active until the test finishes


# Any test can now request 's3_bucket' as an argument:
# def test_something(s3_bucket):
#     result = list_large_objects(s3_bucket, threshold_mb=50)
#     assert ...
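Environment variables written straight into os.environ stay set for the rest of the process unless something restores them. The following stdlib-only sketch shows a scoped change using unittest.mock.patch.dict; inside pytest fixtures, monkeypatch.setenv gives the same restore-on-exit behaviour. The FAKE_AWS_ENV name and the current_region helper are illustrative assumptions.

```python
import os
from typing import Optional
from unittest.mock import patch

# Dummy values only; these must never be real credentials.
FAKE_AWS_ENV = {
    "AWS_ACCESS_KEY_ID": "testing",
    "AWS_SECRET_ACCESS_KEY": "testing",
    "AWS_DEFAULT_REGION": "us-east-1",
}


def current_region() -> Optional[str]:
    """Read the region the way a boto3-based script would see it."""
    return os.environ.get("AWS_DEFAULT_REGION")


region_before = current_region()

# Inside the block the fake values are visible to any code that reads os.environ.
with patch.dict(os.environ, FAKE_AWS_ENV):
    region_inside = current_region()

# On exit patch.dict puts the environment back exactly as it was.
region_after = current_region()
```

Scoping the change this way means a developer's real region or profile settings are neither used by the test nor clobbered by it.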
Testing subprocess-Heavy Scripts
Many ops scripts shell out via subprocess.run. Mock at the subprocess.run level, not at the shell level, to keep tests fast and portable:
# src/ops/deploy.py
import subprocess


def restart_service(host: str, service: str) -> bool:
    """Restart a systemd service over SSH; return True if the command exited 0."""
    result = subprocess.run(
        ["ssh", host, f"sudo systemctl restart {service}"],
        capture_output=True, text=True, timeout=30,
    )
    return result.returncode == 0
# tests/unit/test_deploy.py
import pytest
from unittest.mock import MagicMock

from ops.deploy import restart_service


@pytest.mark.unit
def test_restart_service_success(mocker):
    mock_run = mocker.patch("ops.deploy.subprocess.run")
    mock_run.return_value = MagicMock(returncode=0)

    assert restart_service("web-01.prod", "nginx") is True

    # The exact argv matters: a typo here would restart the wrong thing in prod
    mock_run.assert_called_once_with(
        ["ssh", "web-01.prod", "sudo systemctl restart nginx"],
        capture_output=True, text=True, timeout=30,
    )


@pytest.mark.unit
def test_restart_service_failure(mocker):
    mock_run = mocker.patch("ops.deploy.subprocess.run")
    mock_run.return_value = MagicMock(returncode=1)

    assert restart_service("web-01.prod", "nginx") is False
Measuring Coverage
Line coverage is a floor, not a ceiling. A 95% coverage figure is little comfort if the uncovered 5% is the error-handling code that runs only when the cloud API returns a 500. Use pytest-cov to measure coverage and enforce a minimum gate in CI:
pip install pytest-cov
# Run with coverage report
pytest -m unit --cov=src/ops --cov-report=term-missing --cov-fail-under=85
# In pyproject.toml — enforce the gate declaratively
[tool.coverage.run]
source = ["src/ops"]
omit = ["*/__init__.py"]
[tool.coverage.report]
fail_under = 85
show_missing = true
Pro practice — test error paths first. When prioritising coverage in ops code, write tests for the except blocks, the if resp.status_code != 200 branches, and the timeout handlers before you test the happy path. In production, the happy path runs 99% of the time; the error path is the 1% where the script being wrong causes an incident.
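One way to exercise a timeout handler without waiting for a real timeout is to give the mock a side_effect that raises. The sketch below assumes a hypothetical variant of restart_service, here called restart_service_safe, that treats a timeout as a failed restart; the original function in this lesson lets the exception propagate.

```python
import subprocess
from unittest.mock import patch


def restart_service_safe(host: str, service: str) -> bool:
    """Hypothetical variant of restart_service that treats a timeout as failure."""
    try:
        result = subprocess.run(
            ["ssh", host, f"sudo systemctl restart {service}"],
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# side_effect makes the mock raise instead of returning, which drives
# execution into the except branch without any real SSH connection.
timeout_error = subprocess.TimeoutExpired(cmd="ssh", timeout=30)
with patch("subprocess.run", side_effect=timeout_error) as fake_run:
    outcome = restart_service_safe("web-01.prod", "nginx")
    attempts = fake_run.call_count
```

The same side_effect technique works for ClientError from boto3 or ConnectionError from requests, so each except block can get its own fast unit test.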
By applying these patterns (pytest fixtures, mock isolation, moto for AWS semantics, responses for HTTP, and coverage gates), your automation scripts become auditable, refactorable, and safe to hand off to the next engineer on the team. That is the standard product code is held to, and infrastructure code deserves no less.