Working with JSON, YAML & Config
DevOps scripts live and die by their ability to read, validate, and emit structured data. Kubernetes manifests, Terraform variables, CI pipeline definitions, feature flags, and Ansible inventories are all YAML or JSON. Knowing how to handle these formats correctly — including the production failure modes — separates a reliable ops engineer from someone whose script silently corrupts a production config.
JSON: The Universal API Language
Python's json module is part of the standard library and covers the large majority of real-world JSON work. The key operations are json.loads() (string to Python object), json.dumps() (Python object to string), json.load() (file handle to Python object), and json.dump() (Python object to file handle). A JSON object becomes a dict and a JSON array becomes a list.
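A minimal sketch of the four calls, assuming a small deployment payload invented for illustration. The helper name write_json_atomic and the file name deploy.json are hypothetical; the helper shows one common way to avoid leaving a half-written file behind, by writing to a temporary file in the same directory and then swapping it into place.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Hypothetical helper: write JSON via a temp file, then replace the target."""
    # The temp file lives in the target's directory so both paths are on the
    # same filesystem and the final replace is a plain rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)   # Python object -> file handle
            fh.write("\n")
            fh.flush()
            os.fsync(fh.fileno())           # push the bytes to disk first
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the replace.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


# String -> Python object, then Python object -> string
payload = json.loads('{"service": "api", "replicas": 3, "tags": ["blue", "canary"]}')
payload["replicas"] += 1
text = json.dumps(payload, indent=2, sort_keys=True)

# Python object -> file, then file handle -> Python object
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "deploy.json"
    write_json_atomic(target, payload)
    with target.open(encoding="utf-8") as fh:
        reloaded = json.load(fh)
```

Readers of deploy.json see either the old file or the complete new one, never a partial write.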
Write the new file to a temporary path in the same directory, then move it over the target with rename() (or Python's Path.replace()). On Linux, same-filesystem renames are atomic at the kernel level.
YAML: The Config Format of the Cloud-Native Stack
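YAML is not in the standard library. The production-grade choice is PyYAML (import name yaml). Always use yaml.safe_load() for config files. Avoid yaml.load() with yaml.Loader or yaml.UnsafeLoader on any input you do not fully control: those loaders can construct arbitrary Python objects from tags embedded in a YAML file, which is a critical remote-code-execution vector. Since PyYAML 6.0, yaml.load() requires an explicit Loader argument, so the choice is always visible in the code.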
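A short sketch of safe loading and dumping with PyYAML, assuming PyYAML 5.1 or later (where sort_keys is available). The Service manifest and the hostile document are invented for illustration.

```python
import yaml  # PyYAML, a third-party package (pip install pyyaml)

MANIFEST = """\
apiVersion: v1
kind: Service
metadata:
  name: web
  labels:
    app: web
spec:
  type: ClusterIP
  ports:
  - port: 80
"""

# safe_load builds only plain Python types (dict, list, str, int, ...).
manifest = yaml.safe_load(MANIFEST)
manifest["spec"]["ports"][0]["port"] = 8080

# sort_keys=False keeps the original key order (name before labels).
ordered = yaml.safe_dump(manifest, sort_keys=False)
# The default sorts keys alphabetically (labels before name).
alphabetical = yaml.safe_dump(manifest)

# A document that asks for an arbitrary Python call is rejected by safe_load.
HOSTILE = "!!python/object/apply:os.system ['echo pwned']"
try:
    yaml.safe_load(HOSTILE)
    rejected = False
except yaml.YAMLError:
    rejected = True
```

Comparing ordered with alphabetical shows why unsorted output keeps diffs small: only the changed port line differs from the input.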
Pass sort_keys=False to yaml.dump(). Kubernetes and Helm do not require a specific order, but humans reading diffs expect apiVersion to come before spec. Sorted keys make diffs noisy and code reviews harder.
The YAML Norway Problem and Other Gotchas
YAML has several notorious parsing surprises that have caused real production outages. The most famous is the Norway Problem: YAML 1.1 (which PyYAML still uses by default) parses bare NO as boolean False. Country codes in Ansible inventories or environment lists can silently become False. Always quote strings that look like booleans or null: "NO", "yes", "null", "true", "on".
Other common traps: a bare integer key (123: value) becomes a Python int key, not a string, which breaks downstream dict["123"] lookups. A leading zero marks an octal literal under YAML 1.1, so 0777 loads as the integer 511. And unquoted dates (2024-01-15) become Python datetime.date objects.
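The traps above can be reproduced in a few lines. This is a sketch assuming PyYAML's default YAML 1.1 resolver; the document content is invented for illustration.

```python
import datetime

import yaml  # PyYAML, a third-party package

DOC = """\
countries: [SE, NO, DK]
quoted: ["SE", "NO", "DK"]
123: bare integer key
mode: 0777
released: 2024-01-15
"""

data = yaml.safe_load(DOC)

print(data["countries"])   # ['SE', False, 'DK']  (bare NO became a boolean)
print(data["quoted"])      # ['SE', 'NO', 'DK']   (quoting keeps the string)
print(123 in data)         # True   (the key is an int)
print("123" in data)       # False  (so a string lookup misses it)
print(data["mode"])        # 511    (0777 read as octal)
print(data["released"])    # 2024-01-15, a datetime.date rather than a str
```

Quoting the value in the YAML source ("0777", "2024-01-15") keeps each of these as a string.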
Validating Config with jsonschema and Pydantic
Parsing a YAML file without validating it means your script will fail later with a confusing KeyError or TypeError deep inside business logic. Validate at the boundary: immediately after loading, before any processing. Two options are widely used:
- jsonschema — validates any dict against a JSON Schema definition; works for both JSON and YAML data; minimal dependency.
- Pydantic v2 — defines models as Python classes; gives you typed attributes, default values, and rich error messages; preferred for complex configs and when you also need IDE autocompletion.
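A minimal sketch of boundary validation with jsonschema, assuming the config has already been loaded into a dict (for example by yaml.safe_load or json.load). The schema, the field names, and the validate_config helper are hypothetical.

```python
from jsonschema import Draft7Validator  # third-party (pip install jsonschema)

# Hypothetical schema for a small service config.
SCHEMA = {
    "type": "object",
    "required": ["service", "container"],
    "properties": {
        "service": {"type": "string"},
        "container": {
            "type": "object",
            "required": ["image", "port"],
            "properties": {
                "image": {"type": "string"},
                "port": {"type": "integer", "minimum": 1, "maximum": 65535},
            },
        },
    },
}


def validate_config(config: dict) -> list[str]:
    """Return one 'path: message' string per schema violation."""
    validator = Draft7Validator(SCHEMA)
    problems = []
    for error in validator.iter_errors(config):
        # error.path holds the keys and indexes leading to the bad value.
        location = " > ".join(str(part) for part in error.path) or "<root>"
        problems.append(f"{location}: {error.message}")
    return sorted(problems)


good = {"service": "web", "container": {"image": "nginx", "port": 8080}}
bad = {"service": "web", "container": {"image": "nginx", "port": "8080"}}

good_problems = validate_config(good)
bad_problems = validate_config(bad)
```

Collecting every violation with iter_errors, rather than stopping at the first, lets a script report all config problems in one run.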
Validation errors from either library carry the path to the offending field (for example, container > port) and a human-readable message. Ship that message to your logging system before exiting.
Environment-Based Config: The Twelve-Factor Way
Production services should not read secrets from YAML files checked into git. The Twelve-Factor App pattern stores credentials, database URLs, and API keys in environment variables and reads config files only for non-secret, version-controllable settings. Python's os.environ and the python-dotenv package handle this cleanly.
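A stdlib-only sketch of reading settings from the environment. The variable names (DATABASE_URL, API_KEY, LOG_LEVEL, TIMEOUT_SECONDS) and the load_settings helper are assumptions for illustration. With python-dotenv, a call to load_dotenv() before this code would populate os.environ from a local .env file; it is left out here to avoid the extra dependency.

```python
import os
from collections.abc import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical loader: required values fail fast, optional ones default."""
    return {
        # Required: a missing variable raises KeyError at startup.
        "database_url": env["DATABASE_URL"],
        "api_key": env["API_KEY"],
        # Optional: fall back to a sensible default.
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "timeout_seconds": float(env.get("TIMEOUT_SECONDS", "5")),
    }


# A plain dict stands in for os.environ so the example is repeatable.
settings = load_settings(
    {"DATABASE_URL": "postgres://db.internal/app", "API_KEY": "example-key"}
)

try:
    load_settings({"API_KEY": "example-key"})  # DATABASE_URL is missing
    failed_fast = False
except KeyError as exc:
    failed_fast = True
    missing = exc.args[0]
```

Passing the mapping in as a parameter keeps the loader easy to test without touching the real process environment.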
Use os.environ["KEY"] (not .get()) for required secrets. If a required variable is missing, you want a loud KeyError at startup, not a silent empty string that causes a mysterious auth failure 200 requests later. Reserve .get("KEY", default) for genuinely optional settings.
TOML: The Python Ecosystem's Own Config Format
Since Python 3.11, tomllib is in the standard library (read-only). TOML is the format of pyproject.toml and is increasingly used for tool configs. If you need to write TOML, install tomli-w.
The full decision framework: use JSON when talking to APIs or machines; use YAML for Kubernetes, Ansible, and CI pipelines (humans + machines); use TOML for Python project metadata and tool configs; use env vars for secrets and runtime overrides.