Modern infrastructure is defined and controlled through APIs. You provision EC2 instances via the AWS EC2 API, open pull requests via the GitHub REST API, page an on-call engineer via PagerDuty, and query metrics via the Datadog API. Nearly every ops automation script you write either calls an API or is triggered by one. This lesson makes you fluent in that work: how to call APIs reliably, how to authenticate properly, how to survive transient failures with retries, and how to consume paginated responses without exhausting memory.
The requests Library and First Principles
The requests library is the de facto standard for HTTP in Python. Install it in your active venv, then make a minimal GET call:

```bash
pip install requests
```

```python
import requests

# verify=True (TLS certificate validation) is the default; never disable it
response = requests.get(
    "https://api.github.com/repos/torvalds/linux",
    timeout=(5, 30),  # (connect, read) seconds; requests has no default timeout
)
response.raise_for_status()  # raises HTTPError for 4xx / 5xx responses
data = response.json()       # deserialises the JSON body
print(data["stargazers_count"])
```
raise_for_status() is the most important habit in this lesson. Without it, requests hands back an ordinary Response object for a 404 or 500, and your script proceeds as if nothing went wrong, only to fail later with a confusing KeyError. Call it immediately after every request.
Production pitfall — verify=False: You will find examples online that pass verify=False to skip TLS validation. In production this is a critical security vulnerability that opens your script to man-in-the-middle attacks. If you must talk to an internal API with a self-signed certificate, pass verify="/path/to/internal-ca.crt" instead. Never disable certificate validation entirely.
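When every call in a script goes to the same internal API, you can set the CA bundle once on the Session instead of passing verify= to each request. A minimal sketch; the INTERNAL_CA_BUNDLE environment variable and the internal_session name are hypothetical choices for this example:

```python
import os
from typing import Optional

import requests


def internal_session(ca_bundle: Optional[str] = None) -> requests.Session:
    """Return a Session that validates TLS against a private CA bundle if one is given."""
    session = requests.Session()
    # INTERNAL_CA_BUNDLE is a hypothetical variable name used only for this example.
    bundle = ca_bundle or os.environ.get("INTERNAL_CA_BUNDLE")
    if bundle:
        # Path to a PEM file containing the internal CA certificate(s).
        session.verify = bundle
    # Otherwise session.verify keeps its default of True (the public CA store).
    return session
```

Validation stays on either way; only the set of trusted roots changes. requests also reads the REQUESTS_CA_BUNDLE environment variable, which can achieve a similar effect without code changes.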
Authentication Patterns
The three authentication patterns you encounter most often in ops work are Bearer tokens, API keys in custom headers, and Basic Auth. Credentials must always come from the environment — never from source code.
Use a Session for repeated calls to the same host: A requests.Session() reuses the underlying TCP and TLS connection (HTTP keep-alive), persists cookies across requests, and lets you set default headers and auth once. Skipping a fresh TCP and TLS handshake on every call can noticeably cut the wall-clock time of workflows that make many requests to the same host. Set your auth headers on the Session, not on every individual call.
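Here is how each of the three patterns attaches to a Session. A minimal sketch: the helper names are illustrative, and X-API-Key is only a placeholder header name, since each vendor picks its own (Datadog, for example, uses DD-API-KEY):

```python
import requests
from requests.auth import HTTPBasicAuth


def bearer_session(token: str) -> requests.Session:
    """Bearer token, as used by GitHub personal access tokens, Slack, and OAuth APIs."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"
    return session


def api_key_session(key: str, header: str = "X-API-Key") -> requests.Session:
    """API key in a vendor-specific header; 'X-API-Key' is a placeholder name."""
    session = requests.Session()
    session.headers[header] = key
    return session


def basic_auth_session(user: str, password: str) -> requests.Session:
    """HTTP Basic Auth: requests base64-encodes 'user:password' on every request."""
    session = requests.Session()
    session.auth = HTTPBasicAuth(user, password)
    return session
```

In a script, feed these from the environment, e.g. bearer_session(os.environ["API_TOKEN"]), so the secret never lands in source control.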
Retries with Exponential Backoff
Networks fail. Rate limiters fire. Upstream APIs return transient 503s. A script that crashes on the first failure is not production-ready. The professional pattern is to retry with exponential backoff: wait 1 s, then 2 s, then 4 s, adding a small random component (jitter) so that a fleet of parallel scripts does not retry in lockstep and amplify the pressure on the server. Retry only failures that are plausibly transient (timeouts, connection resets, 429, 5xx); a 400 or 404 will fail the same way every time.
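To make the schedule concrete, here is the same idea written out by hand: delays of 1 s, 2 s, 4 s and so on, capped at cap seconds, plus up to jitter seconds of random spread. This is an illustrative sketch; call_with_backoff and its parameters are hypothetical names, and it retries only the exception types you pass in (for requests, something like retry_on=(requests.ConnectionError, requests.Timeout)):

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 4,
    base: float = 1.0,
    cap: float = 30.0,
    jitter: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rand: Callable[[], float] = random.random,
) -> T:
    """Call fn(), retrying transient failures with capped exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            # base * 2**attempt gives 1 s, 2 s, 4 s, ...; cap bounds the growth,
            # and rand() * jitter spreads parallel clients apart in time.
            delay = min(cap, base * 2 ** attempt) + rand() * jitter
            sleep(delay)
    raise ValueError("retries must be non-negative")  # only reached if retries < 0
```

In practice you would let the urllib3 Retry adapter do this. The injectable sleep and rand parameters exist so the schedule can be checked without actually waiting.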
Wire urllib3.util.Retry into an HTTPAdapter and mount it on the Session. This handles retries at the transport layer, including TCP-level failures that never produce a Python response object at all. Retry adds no jitter by default; urllib3 2.x accepts a backoff_jitter argument if you need one.
```python
import os

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# requests has no session-wide timeout setting (assigning session.timeout is
# silently ignored), so pass this tuple explicitly on every call.
DEFAULT_TIMEOUT = (5, 30)  # (connect_timeout_s, read_timeout_s)


def build_session(token: str) -> requests.Session:
    """Return a Session with auth headers and transport-level retries."""
    retry_strategy = Retry(
        total=5,           # up to 5 retries, i.e. at most 6 attempts in total
        backoff_factor=1,  # exponential sleeps between retries: 0s, 2s, 4s, 8s, 16s
        status_forcelist=[429, 500, 502, 503, 504],
        # POST and PATCH are not idempotent: retrying them is only safe when
        # the API deduplicates requests (e.g. via an idempotency key).
        allowed_methods=["GET", "POST", "PUT", "DELETE", "PATCH"],
        respect_retry_after_header=True,  # honour Retry-After on 429 / 503
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers.update({
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    })
    return session


session = build_session(os.environ["API_TOKEN"])
resp = session.get("https://api.example.com/resources", timeout=DEFAULT_TIMEOUT)
resp.raise_for_status()
print(resp.json())
```
Respect Retry-After on 429: When a server returns 429 Too Many Requests, it often includes a Retry-After header. With respect_retry_after_header=True (urllib3's default), the adapter sleeps for as long as the server asks before retrying. That is the correct behaviour: retrying immediately only adds load and can get your client temporarily blocked.
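Retry-After comes in two forms: a number of seconds or an HTTP date. The adapter handles both for you; if you ever need to honour the header in a hand-written loop, a stdlib-only sketch looks like this (retry_after_seconds is an illustrative name, and the 300-second cap is an assumption you should tune):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    value: Optional[str], now: Optional[datetime] = None, cap: float = 300.0
) -> Optional[float]:
    """Convert a Retry-After header value to a wait in seconds, or None if unusable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "120"
        return min(float(value), cap)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return None  # malformed header: fall back to normal backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP dates are expressed in GMT
    now = now or datetime.now(timezone.utc)
    return min(max(0.0, (when - now).total_seconds()), cap)
```

Capping the wait protects a script from a misconfigured server that asks it to sleep for hours.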
Pagination: Following All Pages Without Blowing Memory
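Production APIs rarely return tens of thousands of records in one response; they paginate. The two most common styles are cursor-based pagination (modern; Stripe and Slack hand back an opaque cursor or object ID that you send with the next request) and page-number pagination (older, still ubiquitous). GitHub's REST API exposes its pages through a Link response header containing the full URL of the next page, so the client simply follows it. Every style requires looping until the API signals there are no more pages.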
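In cursor-based pagination, each response carries a cursor that you send back unchanged to get the next page. A minimal sketch, assuming a hypothetical API whose responses look like {"items": [...], "next_cursor": "..."} and which takes limit and cursor query parameters; real APIs name these differently (Slack returns response_metadata.next_cursor, Stripe uses has_more plus starting_after). The session only needs a requests-style get() method, which keeps the sketch testable offline:

```python
from typing import Any, Dict, Iterator, Optional


def iter_cursor_pages(session: Any, url: str, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield items from a cursor-paginated endpoint (hypothetical response shape)."""
    cursor: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"limit": page_size}
        if cursor:
            params["cursor"] = cursor  # hand the server's cursor back unchanged
        resp = session.get(url, params=params, timeout=(5, 30))
        resp.raise_for_status()
        body = resp.json()
        yield from body.get("items", [])
        cursor = body.get("next_cursor")
        if not cursor:  # a missing or empty cursor means this was the last page
            return
```

Unlike page numbers, a cursor stays correct when records are inserted or deleted between requests, which is one reason newer APIs prefer it.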
The full API client lifecycle: a Session whose HTTPAdapter handles retries and backoff transparently, while the application layer loops through the paginated responses.
```python
import os
from typing import Generator

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

TIMEOUT = (5, 30)  # (connect, read) seconds; Session has no default timeout


def build_session(token: str) -> requests.Session:
    retry = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
    s = requests.Session()
    s.mount("https://", HTTPAdapter(max_retries=retry))
    s.headers.update({"Authorization": f"Bearer {token}", "Accept": "application/json"})
    return s


# --- Link-header pagination (GitHub) ---
def iter_github_repos(session: requests.Session, org: str) -> Generator[dict, None, None]:
    """Yield every repository in an org, transparently following Link headers."""
    url = f"https://api.github.com/orgs/{org}/repos"
    params: dict = {"per_page": 100}  # request the maximum page size (100 on GitHub)
    while url:
        resp = session.get(url, params=params, timeout=TIMEOUT)
        resp.raise_for_status()
        yield from resp.json()
        # requests parses the Link header into resp.links automatically
        url = resp.links.get("next", {}).get("url")
        params = {}  # the "next" URL already carries the query parameters


# --- Page-number pagination (many older REST APIs) ---
def iter_all_pages(session: requests.Session, base_url: str) -> Generator[dict, None, None]:
    page = 1
    while True:
        resp = session.get(base_url, params={"page": page, "per_page": 100}, timeout=TIMEOUT)
        resp.raise_for_status()
        items = resp.json()
        if not items:  # an empty list signals we are past the last page
            break
        yield from items
        page += 1


session = build_session(os.environ["GITHUB_TOKEN"])
for repo in iter_github_repos(session, "myorg"):
    print(repo["full_name"], repo["stargazers_count"])
```
Generators keep memory flat: Both pagination functions use yield from rather than building a list. A generator fetches one page at a time, so as long as the caller processes items as they arrive, memory usage is bounded by the largest single page regardless of the total result count. Collecting every repository of a 10,000-repo org into a list keeps all those parsed objects (each with dozens of fields) alive at once; the generator's footprint stays roughly constant however many pages there are.
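Laziness has a second benefit: you can stop early, and the pages you never reach are never requested. A self-contained sketch that uses a stand-in page list (fake_pages and iter_items are hypothetical) in place of real HTTP calls, so the number of fetches is visible:

```python
from itertools import islice
from typing import Dict, Iterator, List


def iter_items(pages: List[List[Dict[str, int]]], fetch_log: List[int]) -> Iterator[Dict[str, int]]:
    """Yield items page by page, logging each page index when it is 'fetched'."""
    for index, page in enumerate(pages):
        fetch_log.append(index)  # stands in for one HTTP request
        yield from page


# Five fake pages of 100 items each (illustrative data, not a real API).
fake_pages = [[{"id": p * 100 + i} for i in range(100)] for p in range(5)]
fetch_log: List[int] = []
first_150 = list(islice(iter_items(fake_pages, fetch_log), 150))
print(len(first_150), fetch_log)  # 150 [0, 1]: pages 2 to 4 were never requested
```

By the same logic, islice(iter_github_repos(session, "myorg"), 150) would issue only two requests at per_page=100.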
Sending Data: POST, PUT, PATCH
Write operations (creating an incident, triggering a deployment, posting a Slack message) use POST, PUT, or PATCH with a JSON body. Pass a Python dict to the json= parameter and requests serialises it and sets Content-Type: application/json automatically:
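```python
import os

# Assumes `session` was built as above. PagerDuty's REST API expects API keys as
# "Authorization: Token token=<key>" (Bearer is for OAuth tokens), and creating an
# incident requires a From header with the email of a valid PagerDuty user.
# PD_API_KEY and PD_FROM_EMAIL are illustrative environment variable names.
pd_headers = {
    "Authorization": f"Token token={os.environ['PD_API_KEY']}",
    "From": os.environ["PD_FROM_EMAIL"],
}

# Trigger a PagerDuty incident via POST
payload = {
    "incident": {
        "type": "incident",
        "title": "Disk usage above 90% on prod-db-01",
        "service": {"id": os.environ["PD_SERVICE_ID"], "type": "service_reference"},
        "urgency": "high",
        "body": {
            "type": "incident_body",
            "details": "Automated alert from disk monitor script.",
        },
    }
}
resp = session.post(
    "https://api.pagerduty.com/incidents",
    json=payload,
    headers=pd_headers,  # merged over the session's default headers for this call
    timeout=(5, 30),
)
resp.raise_for_status()
incident_id = resp.json()["incident"]["id"]
print(f"Created incident {incident_id}")
```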
Idempotency keys on write operations: Some APIs let the server recognise a retried write. Stripe accepts an Idempotency-Key header; PagerDuty's REST API accepts an incident_key field in the incident payload and rejects a new incident while an open one with the same key exists on that service. Pass a deterministic value, such as a hash of the alert content plus the current UTC hour, so that if your script retries after a network timeout, the server treats the retry as a duplicate instead of creating a second incident. This, not disabling retries, is the correct solution to the "my retry created a duplicate" problem.
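A deterministic key of that kind needs only the standard library. A sketch, assuming the alert is identified by its host and check name (hypothetical fields chosen for illustration): the same alert within the same UTC hour always produces the same key, so a retried request carries an identical value.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def idempotency_key(host: str, check: str, now: Optional[datetime] = None) -> str:
    """Return a stable SHA-256 key for (alert content, UTC hour)."""
    now = now or datetime.now(timezone.utc)
    hour_bucket = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")  # e.g. 2024-05-01T13
    raw = f"{host}|{check}|{hour_bucket}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

Send it as headers={"Idempotency-Key": key} for Stripe-style APIs, or, for PagerDuty's REST API, as the incident's incident_key field. The hour bucket is a trade-off: a persistent problem re-alerts at most once per hour.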
Timeouts and Structured Error Handling
A very common reason ops scripts hang indefinitely in production is a missing timeout: by default, requests waits forever for a response. Set both the connect timeout and the read timeout explicitly. A good default is (5, 30): five seconds to establish the TCP connection, and thirty seconds for the server to send data. The read timeout caps the wait between bytes from the server, not the total time to download the body. Catch specific exception types so your error messages tell the operator exactly what went wrong:
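```python
import requests.exceptions as exc

try:
    # Assumes `session` from build_session above; the timeout is set per call.
    resp = session.get("https://api.example.com/endpoint", timeout=(5, 30))
    resp.raise_for_status()
except exc.ConnectTimeout:
    print("Timed out connecting to the server; check network / firewall")
    raise SystemExit(1)
except exc.ReadTimeout:
    print("Server accepted the connection but sent no data for 30 s")
    raise SystemExit(1)
except exc.HTTPError as e:
    print(f"HTTP {e.response.status_code}: {e.response.text[:200]}")
    raise SystemExit(1)
except exc.RetryError as e:
    # Raised when the Retry adapter exhausts its retries on status_forcelist codes
    print(f"Gave up after retries: {e}")
    raise SystemExit(1)
except exc.RequestException as e:
    # Catches ConnectionError, TooManyRedirects, and anything else requests raises.
    # With retries enabled, a read timeout that exhausts them can land here too.
    print(f"Network error: {e}")
    raise SystemExit(1)
```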
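Passing timeout= on every call is easy to forget, and requests.Session has no setting for a default. One common workaround is an HTTPAdapter subclass that fills in a timeout whenever the caller did not supply one. A sketch; TimeoutHTTPAdapter and build_session_with_timeouts are our own names, not part of requests:

```python
from typing import Any, Tuple, Union

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

TimeoutT = Union[float, Tuple[float, float]]


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when the caller passes none."""

    def __init__(self, *args: Any, timeout: TimeoutT = (5, 30), **kwargs: Any) -> None:
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request: requests.PreparedRequest, **kwargs: Any) -> requests.Response:
        # timeout arrives as None (or is absent) when the caller did not set one
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def build_session_with_timeouts(token: str) -> requests.Session:
    """Session with retries, auth headers, and a default (connect, read) timeout."""
    retry = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
    adapter = TimeoutHTTPAdapter(max_retries=retry, timeout=(5, 30))
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers.update({"Authorization": f"Bearer {token}", "Accept": "application/json"})
    return session
```

A per-call value still wins, so session.get(url, timeout=120) remains available for a slow report endpoint.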