Lambda in Production
AWS Lambda is not a toy. Netflix, Amazon itself, Nordstrom, and hundreds of other companies route billions of production invocations through it every day. But moving from a working demo to a function that survives real traffic requires understanding four non-negotiable dimensions: runtimes, memory and CPU allocation, timeout strategy, and concurrency models. Getting any one of these wrong at scale means silent performance degradation, runaway costs, or hard outages at 2 AM.
Runtimes: What Runs Where
Lambda supports two categories of runtimes. Managed runtimes (Node.js 20/22, Python 3.12/3.13, Java 21, .NET 8, Ruby 3.3) are maintained by AWS, which applies patches and minor-version updates automatically. The OS-only runtime provided.al2023 is the base for compiled languages such as Go and for custom runtimes: you can bring any binary that implements the Lambda Runtime API bootstrap contract.
For production, the runtime choice drives two things: cold-start latency and your team's operational familiarity (concurrency limits are the same for every runtime). Python and Node.js cold-start in ~100–400 ms; Java with class-data sharing (CDS) lands around 500–900 ms without SnapStart; Java 21 with SnapStart can reach sub-100 ms by restoring from a snapshot taken after JVM initialization. Go (via provided.al2023) cold-starts in ~10–50 ms, which makes it a common choice for latency-critical functions.
With SnapStart, take care with static initializers if they carry state that must not be shared across restored snapshots, such as random seeds, unique IDs, or open network connections.
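A minimal Terraform sketch of enabling SnapStart on a Java 21 function, assuming the AWS provider. The function name, handler, artifact path, and role variable are hypothetical placeholders. SnapStart applies to published versions, which is why the sketch sets `publish = true`.

```hcl
# Hypothetical input: an execution role that already exists.
variable "lambda_role_arn" {
  type = string
}

resource "aws_lambda_function" "orders_api_java" {
  function_name = "orders-api-java"                  # hypothetical name
  role          = var.lambda_role_arn
  runtime       = "java21"
  handler       = "com.example.Handler::handleRequest" # hypothetical handler
  filename      = "build/orders-api.jar"             # hypothetical artifact

  # Snapshots are taken when a version is published, not on $LATEST.
  publish = true

  snap_start {
    apply_on = "PublishedVersions"
  }
}
```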
In every runtime, the execution environment is a Firecracker microVM with an Amazon Linux 2023 root. The environment is reused across warm invocations, but you must treat it as ephemeral: anything written to /tmp (512 MB by default, up to 10 GB) may persist for the lifetime of the environment but disappears when the environment is recycled. Do not use /tmp as a durable store.
Memory, CPU, and the Right Sizing Problem
Lambda's resource model is deliberately simple: you configure one number, memory, from 128 MB to 10,240 MB in 1 MB increments. CPU is allocated proportionally: 1,769 MB of memory gives you the equivalent of one full vCPU, 3,538 MB gives you roughly two, and 10,240 MB gives you up to six. There is no independent CPU knob.
This creates a right-sizing problem that trips up many teams. The price per GB-second is fixed, so increasing memory from 512 MB to 1,024 MB doubles what you pay per millisecond of execution. For CPU-bound workloads it also roughly halves wall-clock duration, so the total cost often stays the same or drops while p99 latency is cut in half. The only way to find the optimum is to measure.
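As a worked illustration, assume a CPU-bound function that takes a hypothetical 800 ms at 512 MB and 400 ms at 1,024 MB, and let $p$ be the per-GB-second price for the function's region and architecture. The duration charge per invocation is memory (in GB) times duration (in seconds) times $p$:

```latex
\begin{aligned}
C(M, t) &= \frac{M}{1024\ \text{MB/GB}} \cdot t \cdot p \\
C(512\ \text{MB},\ 0.8\ \text{s})  &= 0.5 \times 0.8 \times p = 0.4\,p \\
C(1024\ \text{MB},\ 0.4\ \text{s}) &= 1.0 \times 0.4 \times p = 0.4\,p
\end{aligned}
```

Under these assumed durations the cost per invocation is identical while latency is halved. If the measured duration at 1,024 MB were above 400 ms, the larger setting would cost more; below 400 ms, it would cost less.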
For I/O-bound functions (waiting on DynamoDB, S3, downstream APIs), the proportional CPU benefit largely disappears, and 512–1,024 MB is typically sufficient. For CPU-bound workloads — image processing, ML inference, cryptographic operations, zip/gzip in-memory — scaling memory to 3,008 MB or higher often makes economic sense.
Timeouts: A Contract, Not a Safety Net
Lambda timeouts range from 1 second to 15 minutes. Teams routinely set them to the maximum "just in case," which is one of the most expensive mistakes in serverless operations. A function that hangs waiting on a downstream service that is down holds its concurrency slot for the full 15 minutes. Enough hung invocations exhaust the function's reserved pool (or the shared account pool), so new invocations are throttled while GB-second charges silently accumulate.
The correct mental model is that a timeout is a contract: it defines the worst-case acceptable duration for the function's business logic. Set it to roughly 2–3× your p99 measured duration in production. If your DynamoDB read typically completes in 12 ms and p99 is 80 ms, a 2-second timeout is generous. If your function coordinates external calls, use the AWS SDK's built-in connectTimeout and socketTimeout (or equivalent per-SDK call timeout) to fail fast inside the function — do not rely on the Lambda timeout as the only circuit breaker.
For asynchronous invocations, configure an onFailure destination so timed-out events do not vanish silently.
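An onFailure destination for asynchronous invocations can be declared roughly as follows. This is a sketch assuming the Terraform AWS provider; the two variables and the retry and event-age values are hypothetical, and the function's execution role is assumed to have permission to send to the queue.

```hcl
# Hypothetical inputs: an existing function and an existing SQS queue.
variable "function_name" {
  type = string
}

variable "failure_queue_arn" {
  type = string
}

resource "aws_lambda_function_event_invoke_config" "async" {
  function_name = var.function_name

  # Bound how long and how often Lambda retries an async event
  # before handing it to the failure destination.
  maximum_retry_attempts       = 2
  maximum_event_age_in_seconds = 3600

  destination_config {
    on_failure {
      destination = var.failure_queue_arn
    }
  }
}
```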
For long-running orchestration workloads (ETL, ML batch scoring, document processing), the right answer is usually Step Functions rather than a single 15-minute Lambda. Standard workflows can run for up to one year, while Express workflows are limited to 5 minutes. Each Lambda step stays short, retries are explicit, and state is visible in the console.
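A sketch of that pattern in Terraform, assuming the AWS provider: a two-step Standard workflow in which each Lambda task has its own short timeout and an explicit retry policy. The state names, variables, and numeric values are hypothetical.

```hcl
# Hypothetical inputs: an existing Step Functions role and two existing functions.
variable "sfn_role_arn" {
  type = string
}

variable "extract_lambda_arn" {
  type = string
}

variable "load_lambda_arn" {
  type = string
}

resource "aws_sfn_state_machine" "etl" {
  name     = "nightly-etl" # hypothetical name
  role_arn = var.sfn_role_arn

  # Amazon States Language definition, encoded from an HCL object.
  definition = jsonencode({
    StartAt = "Extract"
    States = {
      Extract = {
        Type           = "Task"
        Resource       = var.extract_lambda_arn
        TimeoutSeconds = 60 # per-step timeout, far below 15 minutes
        Retry = [{
          ErrorEquals     = ["States.Timeout", "Lambda.ServiceException"]
          IntervalSeconds = 2
          MaxAttempts     = 3
          BackoffRate     = 2
        }]
        Next = "Load"
      }
      Load = {
        Type           = "Task"
        Resource       = var.load_lambda_arn
        TimeoutSeconds = 60
        End            = true
      }
    }
  })
}
```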
Concurrency Models
Lambda concurrency is the number of in-flight invocations at any moment. Every AWS account starts with a regional limit of 1,000 concurrent executions (soft limit; service quota increase requests are routinely approved to 10,000+). Three levers control how that pool is allocated:
- Unreserved concurrency: The default. All functions in a region share the pool. A spike on one function can starve another: the classic "noisy neighbor" problem when many functions share one account.
- Reserved concurrency: Assigns a hard cap to a specific function (e.g., ReservedConcurrentExecutions: 50) and sets that capacity aside for it. Invocations beyond the cap receive a throttle error (HTTP 429). Use it to protect downstream systems from being overwhelmed and to guarantee capacity for critical functions.
- Provisioned concurrency: Pre-initializes N execution environments so they are warm and ready to serve requests without a cold start. Provisioned environments are billed per GB-second for as long as they are configured, whether invoked or not, in exchange for a lower duration rate on the requests they serve, so the feature is economical only if traffic is sustained. Use Application Auto Scaling to scale provisioned concurrency with a scheduled action or a target-tracking policy tied to the ProvisionedConcurrencyUtilization metric.
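To size these levers, steady-state concurrency can be estimated as the request rate multiplied by the average invocation duration. The traffic figures below are hypothetical and chosen only to illustrate the arithmetic:

```latex
\begin{aligned}
\text{concurrency} &\approx \text{requests per second} \times \text{average duration (s)} \\
                   &= 200\ \text{req/s} \times 0.25\ \text{s} = 50
\end{aligned}
```

Under these assumptions a reserved concurrency of 50 leaves no headroom for bursts or slower invocations, so a real cap would normally be set above the estimate.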
The following Terraform snippet expresses a production-grade Lambda configuration encapsulating all four dimensions discussed in this lesson:
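What follows is a minimal sketch of such a configuration, not a definitive one. It assumes the Terraform AWS provider; the function name, handler, artifact path, role variable, and every numeric value are hypothetical placeholders to be replaced with measured figures.

```hcl
# Hypothetical input: an execution role that already exists.
variable "lambda_role_arn" {
  type = string
}

resource "aws_lambda_function" "orders_api" {
  function_name = "orders-api"           # hypothetical name
  role          = var.lambda_role_arn
  filename      = "build/orders-api.zip" # hypothetical artifact
  handler       = "app.handler"          # hypothetical handler

  # Runtime and architecture.
  runtime       = "python3.12"
  architectures = ["arm64"]

  # Memory in MB; CPU scales with it. Tune by measurement.
  memory_size = 1024

  # Timeout in seconds, sized from measured p99 rather than the 900 s maximum.
  timeout = 5

  # Reserved concurrency: hard cap and guaranteed capacity for this function.
  reserved_concurrent_executions = 50

  # Publish a version so an alias can carry provisioned concurrency.
  publish = true
}

resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.orders_api.function_name
  function_version = aws_lambda_function.orders_api.version
}

resource "aws_lambda_provisioned_concurrency_config" "live" {
  function_name                     = aws_lambda_function.orders_api.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 10

  # Let Application Auto Scaling change the value without Terraform reverting it.
  lifecycle {
    ignore_changes = [provisioned_concurrent_executions]
  }
}

resource "aws_appautoscaling_target" "pc" {
  service_namespace  = "lambda"
  resource_id        = "function:${aws_lambda_function.orders_api.function_name}:${aws_lambda_alias.live.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  min_capacity       = 5
  max_capacity       = 40 # kept below the reserved cap of 50

  depends_on = [aws_lambda_provisioned_concurrency_config.live]
}

resource "aws_appautoscaling_policy" "pc_utilization" {
  name               = "orders-api-pc-utilization"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.pc.service_namespace
  resource_id        = aws_appautoscaling_target.pc.resource_id
  scalable_dimension = aws_appautoscaling_target.pc.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 0.7 # scale out when about 70% of provisioned environments are busy

    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
  }
}
```

Provisioned concurrency is attached to an alias rather than to `$LATEST`, which is why the sketch publishes a version and routes a `live` alias to it.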
Finally, prefer arm64 (Graviton2) over x86_64 for new functions whenever your runtime supports it. AWS charges approximately 20 % less per GB-second for arm64 invocations, and measured throughput for CPU-bound Python, Node.js, and Java workloads is equal or higher. The only reason not to use arm64 is a native dependency compiled for x86 that lacks an arm64 build — increasingly rare in 2025.
Lambda writes a REPORT log line on every invocation with Billed Duration, Max Memory Used, Init Duration (cold-start only), and Restore Duration (SnapStart only). Querying these with CloudWatch Logs Insights, or shipping them to your OTEL collector, and plotting the p50/p99/p999 of Billed Duration split by cold/warm gives you the single most valuable Lambda dashboard you can build. We cover this fully in Lesson 7 (Serverless Observability).
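One way to keep such a query next to the function is a saved CloudWatch Logs Insights query definition. The sketch below assumes the Terraform AWS provider and a hypothetical log group name. It treats an invocation as a cold start when the REPORT line carries an Init Duration, which does not cover SnapStart restores, and it computes only p50 and p99.

```hcl
resource "aws_cloudwatch_query_definition" "billed_duration_cold_vs_warm" {
  name            = "lambda/billed-duration-cold-vs-warm"
  log_group_names = ["/aws/lambda/orders-api"] # hypothetical log group

  # @type, @billedDuration and @initDuration are fields that Logs Insights
  # discovers automatically in Lambda logs.
  query_string = <<-EOT
    filter @type = "REPORT"
    | fields ispresent(@initDuration) as cold_start
    | stats pct(@billedDuration, 50) as p50_ms,
            pct(@billedDuration, 99) as p99_ms
      by cold_start
  EOT
}
```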