Python for DevOps Automation

Files, Paths & Subprocess

Two capabilities define almost every real DevOps script: navigating the filesystem and shelling out to other programs. Whether you are parsing a service config, rotating log files, or wrapping a CLI tool in automation, you need to do these things safely, portably, and without surprises in production. This lesson covers the modern Python idioms for both.

Why pathlib Instead of os.path

Before pathlib, the standard approach was the os.path module, a collection of string functions that treat paths as plain text. pathlib (added in Python 3.4, and practical everywhere from 3.6, when open() and the os functions began accepting path objects) gives you path objects that know they are paths. Joining uses the / operator, the platform's separator (Windows backslash vs POSIX forward slash) is handled automatically, and the object carries methods for most common operations.

from pathlib import Path

# Build paths safely — no manual os.path.join()
base = Path("/etc/myapp")
config = base / "config" / "settings.yaml"

print(config)           # /etc/myapp/config/settings.yaml
print(config.name)      # settings.yaml
print(config.stem)      # settings
print(config.suffix)    # .yaml
print(config.parent)    # /etc/myapp/config

# Common predicates
print(config.exists())
print(config.is_file())
print(config.is_dir())

# Iterate a directory tree
log_dir = Path("/var/log/nginx")
for log_file in log_dir.glob("*.log"):
    print(log_file)

# Recursive glob — find every .conf under /etc
for conf in Path("/etc").rglob("*.conf"):
    print(conf)
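
The separator handling described above can be checked without touching the filesystem. The following is a minimal sketch using the "pure" path classes, which only manipulate path text; the Windows directory C:/ProgramData/myapp is a made-up example, not something this lesson's application defines.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths do no filesystem I/O, so both flavours work on any OS.
posix_cfg = PurePosixPath("/etc/myapp") / "config" / "settings.yaml"
# Hypothetical Windows location, used only for illustration.
win_cfg = PureWindowsPath("C:/ProgramData/myapp") / "config" / "settings.yaml"

print(posix_cfg)            # /etc/myapp/config/settings.yaml
print(win_cfg)              # C:\ProgramData\myapp\config\settings.yaml
print(win_cfg.as_posix())   # C:/ProgramData/myapp/config/settings.yaml

# Derive a sibling path without slicing strings.
backup = posix_cfg.with_name(posix_cfg.name + ".bak")
print(backup)               # /etc/myapp/config/settings.yaml.bak
print(posix_cfg.parts)      # ('/', 'etc', 'myapp', 'config', 'settings.yaml')
```

Path itself picks the right flavour for the running platform, so scripts normally use Path and reach for the pure classes only when handling paths that belong to a different OS.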

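In production scripts, resolve paths to their canonical absolute form with Path(...).resolve(), which follows symlinks and collapses ".." components. Note that resolve() interprets a relative path against the current working directory, so on its own it does not make a script independent of where it is launched. To get identical behavior from the project root and from a cron job with a different working directory, build paths from an absolute base such as Path(__file__).resolve().parent.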
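
One caveat is easier to see in code: resolve() turns a relative path into an absolute one by using the current working directory, so the same relative path resolves differently depending on where the script is started. The sketch below demonstrates this in a throwaway directory; the helper name anchored and the file name conf/app.yaml are illustrative assumptions. In a real script the base would typically be Path(__file__).resolve().parent.

```python
import os
import tempfile
from pathlib import Path

def anchored(base: Path, relative: str) -> Path:
    """Hypothetical helper: join onto an absolute base, then canonicalise."""
    return (base / relative).resolve()

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve()
    (project / "conf").mkdir()
    original_cwd = Path.cwd()
    try:
        os.chdir(project)
        from_project = Path("conf/app.yaml").resolve()
        os.chdir(project / "conf")
        from_subdir = Path("conf/app.yaml").resolve()
    finally:
        os.chdir(original_cwd)   # always restore the working directory

    # The same relative path resolved to two different locations.
    cwd_dependent = from_project != from_subdir
    # Anchoring on an absolute base gives one answer regardless of cwd.
    matches_project = anchored(project, "conf/app.yaml") == from_project

print(cwd_dependent)     # True
print(matches_project)   # True
```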

Reading and Writing Files Safely

Python's built-in open() paired with a with block is the standard pattern — the context manager guarantees the file handle is closed even if an exception occurs. For small config files (under a few MB) read_text() and write_text() on a Path object are even more concise.

from pathlib import Path

config_path = Path("/etc/myapp/settings.yaml")

# Read entire file as a string (pass the encoding explicitly rather than
# relying on the platform default)
raw = config_path.read_text(encoding="utf-8")

# Read line by line — preferred for large files
with config_path.open(encoding="utf-8") as fh:
    for line in fh:
        line = line.rstrip("\n")
        print(line)

# Write atomically: write to a temp file in the same directory, then rename.
# os.replace() is atomic on POSIX when source and destination are on the
# same filesystem, so readers see either the old file or the new one,
# never a half-written config.
import os
import tempfile

def write_atomic(path: Path, content: str) -> None:
    tmp_fd, tmp_path = tempfile.mkstemp(
        dir=path.parent, prefix=".tmp_"
    )
    try:
        with os.fdopen(tmp_fd, "w", encoding="utf-8") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())    # push data to disk before the rename
        os.replace(tmp_path, path)   # atomic on POSIX
    except BaseException:            # also clean up on KeyboardInterrupt
        os.unlink(tmp_path)
        raise

write_atomic(config_path, raw.replace("debug: true", "debug: false"))

Avoid calling path.write_text(content) directly on a live config file in production. write_text() truncates the file before writing, so if the process is killed mid-write the file is left empty or partial, and the service that reads it may fail to start. The atomic temp-file-then-rename pattern shown above avoids this and is a long-established idiom for updating files that other processes read.
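
One detail the temp-file pattern can trip over: tempfile.mkstemp() creates its file readable and writable only by the owner, so after the rename the config may no longer be readable by the service account. The following is a sketch of one way to carry the permission bits over, under the assumption that preserving the mode is enough; the function name write_atomic_keep_mode and the default_mode parameter are illustrative, and file ownership is not handled.

```python
import os
import stat
import tempfile
from pathlib import Path

def write_atomic_keep_mode(path: Path, content: str,
                           default_mode: int = 0o644) -> None:
    """Atomically replace `path`, carrying over its permission bits."""
    try:
        mode = stat.S_IMODE(path.stat().st_mode)
    except FileNotFoundError:
        mode = default_mode          # assumed default for a new file
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp_")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        os.chmod(tmp_name, mode)     # mkstemp files start out owner-only
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise

# Usage in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "settings.yaml"
    cfg.write_text("debug: true\n", encoding="utf-8")
    os.chmod(cfg, 0o644)
    write_atomic_keep_mode(cfg, "debug: false\n")
    new_text = cfg.read_text(encoding="utf-8")
    kept_mode = stat.S_IMODE(cfg.stat().st_mode)
    leftovers = [p.name for p in Path(tmp).iterdir()
                 if p.name.startswith(".tmp_")]

print(new_text.strip())   # debug: false
print(leftovers)          # []
```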

Running External Commands with subprocess

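DevOps scripts constantly invoke CLI tools: git, kubectl, terraform, aws, docker. Python's subprocess module is the right way to do this. os.system() still exists, but it always goes through the shell and gives you no way to capture output, and the Python 2 commands module was removed in Python 3; use subprocess instead of either.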

The two main entry points are subprocess.run() for one-shot commands and subprocess.Popen() for streaming or interactive processes. Start with run().

import subprocess

# --- Safe: pass commands as a list, never as a shell string ---
result = subprocess.run(
    ["git", "rev-parse", "--short", "HEAD"],
    capture_output=True,   # stdout and stderr captured, not printed
    text=True,             # decode bytes to str automatically
    check=True,            # raise CalledProcessError on non-zero exit
)
commit_sha = result.stdout.strip()
print(f"Current commit: {commit_sha}")

# --- Checking exit code manually instead of check=True ---
result = subprocess.run(
    ["systemctl", "is-active", "--quiet", "nginx"],
    capture_output=True, text=True
)
if result.returncode == 0:
    print("nginx is running")
else:
    print("nginx is NOT running")

# --- Passing environment variables ---
import os
env = {**os.environ, "KUBECONFIG": "/home/deploy/.kube/config"}
result = subprocess.run(
    ["kubectl", "get", "nodes", "-o", "wide"],
    capture_output=True, text=True, check=True, env=env
)
print(result.stdout)

# --- Streaming output for long-running commands ---
with subprocess.Popen(
    ["terraform", "apply", "-auto-approve"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    text=True
) as proc:
    for line in proc.stdout:
        print(line, end="")   # real-time output
    proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"terraform failed (exit {proc.returncode})")

Figure: how subprocess.run() forks a child process, captures its output, and returns a CompletedProcess object to your script.
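
To make the CompletedProcess object concrete, here is a small sketch that inspects its fields. It uses sys.executable as a stand-in child process so that it runs on any machine with Python; that substitution, and the child's messages, are assumptions made for the example rather than part of the commands above.

```python
import subprocess
import sys

# A tiny child program: writes to both streams, then exits with status 3.
child_code = (
    "import sys; print('to stdout'); "
    "print('to stderr', file=sys.stderr); sys.exit(3)"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,               # no check=True, so a non-zero exit does not raise
)

print(result.returncode)       # 3
print(result.stdout.strip())   # to stdout
print(result.stderr.strip())   # to stderr

# check_returncode() raises the same error that check=True would have raised.
failure = None
try:
    result.check_returncode()
except subprocess.CalledProcessError as exc:
    failure = exc
    print(f"command failed with exit {exc.returncode}")
```

Leaving check=True off and calling check_returncode() later is useful when you want to log stdout and stderr before deciding whether the exit status is fatal.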

Shell Injection: The Critical Security Rule

The single most important rule when using subprocess in DevOps scripts: never build a command string from untrusted input and run it with shell=True. On POSIX, shell=True passes your string to /bin/sh -c, so any shell metacharacter (;, |, $(), &&) in user-supplied or environment-derived data becomes an injection point.

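# WRONG: shell injection risk if branch_name comes from outside.
# Do not run this; the shell would treat `rm -rf /` as a second command.
branch_name = "main; rm -rf /"
result = subprocess.run(
    f"git checkout {branch_name}",
    shell=True,    # NEVER do this with external input
    capture_output=True
)

# CORRECT: list form; the shell is never involved.
# git receives the literal string "main; rm -rf /" as one argument, finds
# nothing by that name, and exits non-zero, so check=True raises
# CalledProcessError. Nothing else is executed.
# Still validate the value: a name starting with "-" would be parsed
# by git as an option.
result = subprocess.run(
    ["git", "checkout", branch_name],  # branch_name is just an argument
    capture_output=True, text=True, check=True
)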

# Acceptable use of shell=True: only for shell built-ins or pipelines
# with FULLY HARDCODED strings (no user data anywhere in the string)
result = subprocess.run(
    "df -h | grep /dev/sda",
    shell=True, capture_output=True, text=True
)

In practice, automation scripts are often triggered by CI/CD systems, webhooks, or operator input, all of which can carry attacker-controlled strings. A shell injection in a deployment script running as root is a full server compromise. Treat the list form of subprocess.run() as the default, and treat shell=True as a code-review red flag unless the string is a hardcoded literal.
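
The difference between the two forms can be observed safely. The sketch below uses sys.executable as a harmless stand-in for a real CLI and a benign payload; the guard function safe_branch_arg is a hypothetical helper, shown as one possible check rather than a complete validator.

```python
import shlex
import subprocess
import sys

hostile = "main; echo INJECTED"

# The child prints its arguments. In list form the whole string arrives
# as a single argument and the shell never sees the semicolon.
echo_argv = "import sys; print(sys.argv[1:])"
result = subprocess.run(
    [sys.executable, "-c", echo_argv, hostile],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())   # ['main; echo INJECTED']

# If a shell string is truly unavoidable, shlex.quote() escapes a value
# for POSIX shells (it is not meant for cmd.exe).
quoted = shlex.quote(hostile)
print(quoted)                  # 'main; echo INJECTED'

def safe_branch_arg(name: str) -> str:
    """Hypothetical guard: reject values a CLI could parse as an option."""
    if not name or name.startswith("-"):
        raise ValueError(f"refusing suspicious branch name: {name!r}")
    return name

print(safe_branch_arg("main"))  # main
```

List form removes shell injection, but the receiving program still interprets its arguments, which is why a leading-dash check (or the tool's own "--" separator, where it has one) remains worthwhile.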

Handling Errors and Timeouts

Production ops scripts must handle failure gracefully. Always set a timeout so a hung external command does not block your pipeline indefinitely. Catch subprocess.CalledProcessError to log diagnostics and decide whether to retry, alert, or abort.

import subprocess, logging

log = logging.getLogger(__name__)

def run_kubectl(args: list[str], timeout: int = 30) -> str:
    """Run kubectl safely; return stdout; raise on failure."""
    cmd = ["kubectl"] + args
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            check=True,
            timeout=timeout,
        )
        return result.stdout
    except subprocess.CalledProcessError as exc:
        log.error(
            "kubectl failed (exit %d): %s",
            exc.returncode,
            exc.stderr.strip(),
        )
        raise
    except subprocess.TimeoutExpired:
        log.error("kubectl timed out after %ds: %s", timeout, cmd)
        raise

# Usage
nodes = run_kubectl(["get", "nodes", "-o", "name"])
print(nodes)
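
The "retry, alert, or abort" decision mentioned above can be sketched as a small wrapper. This is one possible shape under stated assumptions, not a prescribed policy: the name run_with_retry, its parameters, and the exponential backoff schedule are all illustrative, and sys.executable stands in for a real CLI so the example is self-contained.

```python
import subprocess
import sys
import time

def run_with_retry(cmd: list[str], attempts: int = 3,
                   timeout: float = 30.0, base_delay: float = 1.0) -> str:
    """Run cmd; retry on non-zero exit or timeout with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True,
                check=True, timeout=timeout,
            )
            return result.stdout
        except (subprocess.CalledProcessError,
                subprocess.TimeoutExpired) as exc:
            last_exc = exc
            if attempt < attempts:
                # Delays of base_delay, 2*base_delay, 4*base_delay, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
    raise last_exc   # every attempt failed; surface the last error

# A command that succeeds on the first try
ok = run_with_retry([sys.executable, "-c", "print('ready')"], base_delay=0)
print(ok.strip())   # ready

# A command that always exceeds its timeout
timed_out = False
try:
    run_with_retry(
        [sys.executable, "-c", "import time; time.sleep(5)"],
        attempts=2, timeout=0.5, base_delay=0,
    )
except subprocess.TimeoutExpired:
    timed_out = True
print(timed_out)    # True
```

Retrying only makes sense for commands that are safe to repeat; a read such as kubectl get is a good candidate, while a command with side effects usually is not.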

Summary

Use pathlib.Path for all filesystem work — it is portable, readable, and avoids string-joining bugs.
Write configs atomically via temp-file-then-rename to protect running services.
Use subprocess.run() with a list of arguments, capture_output=True, text=True, check=True, and a timeout.
Never pass user-derived data through shell=True.
Catch CalledProcessError and TimeoutExpired; log stderr for every failure.