Concurrency for Ops: Threads & Async
A DevOps script that checks the health of 200 services sequentially waits for each HTTP response before firing the next request. At 100 ms average latency, that is 20 seconds of wall-clock time, almost entirely spent blocked on network I/O with your CPU idle. Run the checks concurrently with enough workers and the same work takes under a second. This lesson teaches the two concurrency models Python gives you, when each is the right choice, and the production pitfalls that trip even experienced engineers.
The Two Models: Threading vs. asyncio
Python's Global Interpreter Lock (GIL) means that at any instant only one thread executes Python bytecode. For CPU-bound work (number crunching, compression) this makes threads useless for parallelism — use multiprocessing or subprocesses instead. For I/O-bound work — the overwhelming majority of ops tasks (HTTP calls, SDK calls, DNS lookups, subprocess waits) — the GIL releases while the thread blocks on I/O, so multiple threads run concurrently in practice.
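To make the I/O-bound claim concrete, here is a minimal sketch that compares sequential and threaded execution of a blocking call. `fake_io_call` is a hypothetical stand-in: it uses `time.sleep`, which releases the GIL much as a blocking socket read does, so no real network is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(delay: float) -> float:
    # Hypothetical stand-in for an HTTP or SDK call.
    # time.sleep releases the GIL while the thread is blocked.
    time.sleep(delay)
    return delay


def run_sequential(n: int, delay: float) -> float:
    """Run n blocking calls one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io_call(delay)
    return time.perf_counter() - start


def run_threaded(n: int, delay: float) -> float:
    """Run n blocking calls in n threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        # list() drains the iterator so every call has finished.
        list(pool.map(fake_io_call, [delay] * n))
    return time.perf_counter() - start
```

Calling `run_sequential(4, 0.1)` should take roughly four times as long as `run_threaded(4, 0.1)`, because the threaded waits overlap. Replacing the sleep with a pure-Python CPU loop would remove most of that gap.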
- Threading (concurrent.futures.ThreadPoolExecutor): Each unit of work runs in its own OS thread. Simple to adopt because existing synchronous code (boto3 calls, requests.get) works with zero changes. Best for fan-out work where you call many independent I/O operations and collect the results.
- asyncio: A single-threaded event loop switches between coroutines whenever one is awaiting I/O. Lower overhead per task (no OS thread stack), but requires async-native libraries (aiohttp, aiobotocore). Best when you have thousands of concurrent connections or need fine-grained cancellation and timeouts.
ThreadPoolExecutor with a bounded pool (8–32 workers) is the right default. It is readable, debuggable, and requires no async-aware libraries. Reach for asyncio when you need to sustain thousands of simultaneous open connections: a load generator, a real-time log tailer, or a WebSocket-based monitoring agent.
Parallel API Calls with ThreadPoolExecutor
The canonical pattern: submit all tasks, collect futures, iterate over completions. The as_completed function yields futures as they finish rather than in submission order, letting you process results the moment they arrive.
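The submit-and-collect pattern described above can be sketched as follows. The names `check_all` and `check` are illustrative assumptions, not part of any library; `check` is any blocking callable you supply, for example a wrapper around an HTTP health probe.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def check_all(
    targets: Iterable[str],
    check: Callable[[str], object],
    max_workers: int = 16,
) -> Dict[str, object]:
    """Run check(target) for every target in a bounded thread pool.

    Returns a dict mapping each target to its result, or to the
    exception instance if the check raised.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first, remembering which future is which target.
        future_to_target = {pool.submit(check, t): t for t in targets}
        # as_completed yields futures in completion order, not submit order.
        for future in as_completed(future_to_target):
            target = future_to_target[future]
            try:
                results[target] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the whole run.
                results[target] = exc
    return results
```

Mapping futures back to their targets through a dict keeps the result attributable even though completion order is arbitrary.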
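Handle expected network errors (timeouts, connection resets) inside the worker function, and treat future.result() exception catching as a safety net for truly unexpected bugs, not expected network errors.
Choosing the Right Pool Size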
Setting max_workers is a tuning decision, not a "bigger is better" dial. Too small and you leave parallelism on the table. Too large and you exhaust file descriptors, hit per-host TCP connection limits, or get rate-limited by the remote API.
- For external HTTP APIs: Most SaaS APIs enforce per-client rate limits (e.g., 100 req/s). Start at 10–20 workers and instrument the 429 rate. Back off exponentially on 429s rather than adding more threads.
- For AWS SDK calls (boto3): The default connection pool size per client is 10 (max_pool_connections). With 50 threads sharing one client, the pool overflows: urllib3 logs "Connection pool is full, discarding connection" warnings and the extra connections are thrown away instead of being reused. Either set max_pool_connections in botocore.config.Config to match your worker count, or create one session per thread using threading.local().
- For internal microservices / health checks: Workers can be 50–100 if targets are on a low-latency internal network and you own the target services.
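The advice to back off exponentially on 429s rather than adding threads can be sketched like this. Everything here is an assumption for illustration: `call_with_backoff` is a hypothetical helper, `request` is any zero-argument callable returning a `(status, body)` pair, and the random jitter is an addition that the text above does not prescribe.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call request() until it returns a non-429 status or attempts run out.

    The wait ceiling doubles after each 429 (base_delay, 2x, 4x, ...),
    capped at max_delay. The actual sleep is drawn uniformly from
    [0, ceiling] so that many workers do not retry in lockstep.
    """
    status, body = request()
    for attempt in range(max_attempts - 1):
        if status != 429:
            break
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, ceiling))
        status, body = request()
    # May still be 429 if every attempt was rate-limited.
    return status, body
```

The `sleep` parameter exists only so the delay can be stubbed in tests. In a worker function you would wrap the real HTTP call in a small closure and pass it as `request`.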
asyncio for High-Fan-Out Ops Work
When your fan-out target count reaches the hundreds-to-thousands range, OS thread overhead becomes meaningful. On typical Linux systems each thread reserves up to ~8 MB of stack by default, so 1,000 threads reserve about 8 GB of virtual address space. Only the pages actually touched become resident memory, but scheduling and per-thread bookkeeping costs grow with the thread count as well. asyncio coroutines are cheap: thousands coexist in a few megabytes. The trade-off: every library in the call chain must be async-native.
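A minimal sketch of an asyncio fan-out in which a semaphore caps how many probes are in flight at once. `bounded_fan_out` and `probe` are hypothetical names; `probe` is any coroutine function you supply. The sketch uses only the standard library, so a real HTTP version would need an async-native client in place of the stand-in probe.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def bounded_fan_out(
    targets: Iterable[object],
    probe: Callable[[object], Awaitable[object]],
    limit: int = 100,
) -> List[object]:
    """Run probe(target) for every target, at most `limit` at a time.

    Results come back in the same order as targets. A probe that
    raises contributes its exception object instead of a result.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(target: object) -> object:
        # All wrapper coroutines are created up front, but only
        # `limit` of them hold the semaphore and run probe at once.
        async with semaphore:
            return await probe(target)

    return await asyncio.gather(
        *(guarded(t) for t in targets),
        return_exceptions=True,
    )
```

Run it with `asyncio.run(bounded_fan_out(urls, probe, limit=50))`. If the probe uses aiohttp, setting the connector's connection limit to the same number keeps the socket count aligned with the semaphore.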
asyncio.gather(*tasks) over a list of 10,000 coroutines schedules all 10,000 at once. Even though coroutines are lightweight, the remote services see a thundering herd of connections. Always throttle with asyncio.Semaphore and set a matching limit on the aiohttp TCPConnector. Without this, you will hit kernel-level EMFILE (too many open files) or get banned by the upstream service.
Mixing boto3 with Thread Pools
boto3 is not async-native, so the correct pattern for parallel AWS API calls is ThreadPoolExecutor. One critical detail: boto3 Session and resource objects are not thread-safe, while low-level clients generally are. Either create a single client up front and share it across workers, or give each thread its own session and client through threading.local().
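A minimal sketch of the thread-local client pattern. To keep it runnable without AWS credentials, client construction is passed in as `factory`; `get_client` and `fan_out` are hypothetical helper names. With boto3 installed, the factory could be something like `lambda: boto3.session.Session().client("ec2")`, which builds a fresh session and client inside each worker thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Each thread sees its own attributes on this object.
_local = threading.local()


def get_client(factory: Callable[[], object]) -> object:
    """Return this thread's client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = factory()
        _local.client = client
    return client


def fan_out(
    items: Iterable[object],
    work: Callable[[object, object], object],
    factory: Callable[[], object],
    max_workers: int = 8,
) -> List[object]:
    """Run work(client, item) for each item; results keep input order."""

    def run(item: object) -> object:
        return work(get_client(factory), item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, items))
```

Because pool threads are reused, at most `max_workers` clients are ever created, no matter how many items are processed.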
Timeouts and Cancellation
Any concurrent ops script must have a wall-clock deadline. A single slow or hung target must not block the entire run indefinitely. The patterns differ between the two models:
- ThreadPoolExecutor: Use concurrent.futures.wait(fs, timeout=30) to get a (done, not_done) pair after 30 seconds. Call future.cancel() on the not_done set; note that cancellation only works for tasks that have not yet started executing. For tasks already running, the only safe approach is cooperative: pass a threading.Event to the worker and check it periodically.
- asyncio: Wrap any coroutine with asyncio.wait_for(coro, timeout=5.0). It raises asyncio.TimeoutError and cleanly cancels the underlying task. For a batch, asyncio.gather(*tasks, return_exceptions=True) ensures all results (including exceptions) are collected without one failure aborting the rest.
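Both timeout patterns above can be sketched in one place. The helper names `run_with_deadline` and `gather_with_timeouts` are hypothetical. The threaded version assumes every job accepts a `threading.Event` and checks it periodically; a job that ignores the event will still block the final shutdown.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def run_with_deadline(jobs, deadline_s, max_workers=8):
    """Run jobs (a dict of name -> callable(stop_event)) under one deadline.

    Returns (finished, timed_out): finished maps name to the result or
    the raised exception; timed_out is a sorted list of job names.
    """
    stop = threading.Event()
    pool = ThreadPoolExecutor(max_workers=max_workers)
    try:
        futures = {pool.submit(job, stop): name for name, job in jobs.items()}
        done, not_done = wait(list(futures), timeout=deadline_s)
        stop.set()  # ask jobs that are still running to exit
        for future in not_done:
            future.cancel()  # only affects jobs that never started
    finally:
        pool.shutdown(wait=True)
    finished = {}
    for future in done:
        exc = future.exception()
        finished[futures[future]] = exc if exc is not None else future.result()
    return finished, sorted(futures[f] for f in not_done)


async def gather_with_timeouts(factories, per_task_timeout):
    """Await each coroutine factory with its own timeout.

    A task that exceeds the timeout is cancelled and contributes an
    asyncio.TimeoutError instance to the result list.
    """

    async def one(factory):
        return await asyncio.wait_for(factory(), timeout=per_task_timeout)

    return await asyncio.gather(
        *(one(f) for f in factories),
        return_exceptions=True,
    )
```

A cooperative worker for the threaded version can loop on `stop.wait(interval)`, which sleeps between units of work and returns True as soon as the deadline handler sets the event.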
Set an overall wall-clock deadline for the whole run (signal.alarm on Linux or asyncio.wait_for around the entire fan_out call) in addition to per-request timeouts. This prevents the script from running forever if your concurrency primitive itself has a bug, a scenario more common than you might expect during network partitions.
When to Avoid Concurrency
Concurrency adds debugging surface. A sequential script with clear logging is easier to operate than a concurrent one. Before reaching for threads or asyncio, ask:
- Is the total sequential runtime actually a problem? If a nightly batch finishes in 4 minutes, adding concurrency for the sake of it introduces risk with no user-visible benefit.
- Does the target service have rate limits that make concurrency counterproductive? Hitting a 10 req/s limit with 50 threads will just produce a flood of 429s and retries.
- Is the operation stateful in a way that makes concurrent mutations dangerous? Concurrent writes to the same S3 key, the same database row, or the same Kubernetes resource require distributed locking — a much harder problem than the one you started with.