Cold Starts & Performance
In a traditional long-running service, the runtime overhead of starting your process — loading the JVM, initialising the Spring context, pulling secrets — is paid once at deploy time and then amortised over millions of requests. In a serverless function, that overhead can be paid on every invocation that hits a cold execution environment. Understanding exactly where that latency comes from, and knowing when and how to mitigate it, is one of the most practically consequential skills for operating Lambda at scale.
Cold Start Anatomy: What Actually Happens
When Lambda decides it needs a new execution environment for your function, it executes a fixed sequence of operations. Each has its own latency budget:
- Hypervisor slot allocation (~1–10 ms): Lambda runs on Firecracker micro-VMs. The control plane allocates a MicroVM slot. This is AWS-internal and you have zero influence over it.
- Runtime bootstrap (~50–500 ms for managed runtimes): The runtime process (python3.12, nodejs20.x, the JVM for java21) is initialised inside the MicroVM. JVM-based runtimes pay the highest cost here: classloading, JIT warmup, and bytecode verification are inherently expensive. Node.js and Python are much cheaper (tens of milliseconds).
- Function init code (your code, outside the handler): Every line of module-level code in your Lambda (SDK client construction, database connection setup, environment variable reads, configuration parsing) executes in sequence before the handler is callable. This is the part you control completely and where the biggest wins live.
- Handler invocation: The actual handler runs for the first time. For the purposes of cold-start measurement, this is the finish line. The sum of steps 1–3 is the observable cold-start latency from the perspective of an upstream caller.
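The split between init code and the handler can be made visible from inside the function. The following is a minimal sketch, assuming a Python function; the names _INIT_SECONDS and _is_cold are illustrative and not part of any Lambda API. It times only the function's own module-level code, not slot allocation or runtime bootstrap, which happen before this module is loaded.

```python
import time

# Module-level code runs once per execution environment (the init phase).
_INIT_STARTED = time.perf_counter()

# ... SDK clients, config parsing and other init work would go here ...

_INIT_SECONDS = time.perf_counter() - _INIT_STARTED
_is_cold = True  # flipped to False after the first invocation


def handler(event, context):
    """Report whether this invocation was the first in its environment."""
    global _is_cold
    cold, _is_cold = _is_cold, False
    return {
        "cold_start": cold,
        # Init time is only meaningful on the invocation that paid for it.
        "init_seconds": _INIT_SECONDS if cold else None,
    }
```

The first call in a fresh environment reports cold_start as true; every later call in the same environment reports false, because the module-level flag survives between warm invocations.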
Measuring Cold Starts: The Right Metrics
AWS Lambda publishes an Init Duration field in CloudWatch Logs when a cold start occurs. This field is absent on warm invocations, which makes it straightforward to filter and measure. The metrics you should be tracking in production:
- Init Duration p50/p95/p99: Extract via CloudWatch Logs Insights. The p99 captures the tail latency that your slowest cold-start callers experience and is the figure that matters for SLO compliance.
- Cold-start rate: cold_starts / total_invocations over a sliding window. At sustained high traffic this approaches 0 %; at low-traffic or bursty patterns it can exceed 10 %.
- Concurrency metrics: ConcurrentExecutions and UnreservedConcurrentExecutions from CloudWatch Metrics. A sudden spike in concurrent executions directly predicts a cold-start burst.
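As a rough illustration of the first two metrics, the sketch below derives the cold-start rate and Init Duration percentiles from REPORT log lines that have already been exported as text. The helper names (percentile, summarise) and the sample lines are made up for this example, and the sample lines are abbreviated compared with real REPORT entries. In production the same figures would more likely come from a Logs Insights query.

```python
import math
import re

INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (0 < pct <= 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarise(report_lines):
    """Return the cold-start rate and Init Duration percentiles in ms."""
    reports = [line for line in report_lines if line.startswith("REPORT")]
    # Only cold starts carry an Init Duration field.
    inits = [float(m.group(1)) for line in reports
             if (m := INIT_RE.search(line))]
    summary = {"cold_start_rate": len(inits) / len(reports) if reports else 0.0}
    for pct in (50, 95, 99):
        summary[f"p{pct}"] = percentile(inits, pct) if inits else None
    return summary


# Made-up, abbreviated lines in the shape of Lambda REPORT log entries.
SAMPLE = [
    "REPORT RequestId: a\tDuration: 12.0 ms\tBilled Duration: 12 ms\tInit Duration: 200.0 ms",
    "REPORT RequestId: b\tDuration: 9.0 ms\tBilled Duration: 9 ms",
    "REPORT RequestId: c\tDuration: 15.0 ms\tBilled Duration: 15 ms\tInit Duration: 400.0 ms",
    "REPORT RequestId: d\tDuration: 8.0 ms\tBilled Duration: 8 ms",
]
```

Calling summarise(SAMPLE) treats the two lines that carry an Init Duration as cold starts and the other two as warm invocations.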
Init Code Optimisation: The Highest-Leverage Work
Your init code (module-level initialisation outside the handler) runs once per cold start. Any latency you remove from init code is removed from every cold start permanently. This is where experienced engineers spend their time before reaching for Provisioned Concurrency. Common patterns:
- Lazy-initialise optional clients: If a Secrets Manager client, a DynamoDB table reference, or an SQS queue URL is only needed in certain code paths, move it inside the handler or behind a module-level singleton that initialises on first call. Do not pay for it on every cold start if only 5 % of invocations use it.
- Resolve secrets once and cache them at module level: A Secrets Manager GetSecretValue call is ~50 ms. If you call it inside the handler body on every request, you pay 50 ms on every invocation. If you call it in init code, you pay 50 ms once per cold start. Cache the resolved value in a module-level variable: the execution environment persists between warm invocations, so this is the intended pattern.
- Import only what you need: Importing an entire SDK when you only need one client loads significant amounts of code. In Node.js, import the individual client package, for example import { DynamoDBClient } from "@aws-sdk/client-dynamodb", instead of the entire V2 SDK. This is especially impactful with the AWS SDK V3, which is modular specifically to reduce cold starts. In Python, note that from boto3 import client still loads the whole boto3 package, exactly as import boto3 does; the saving comes from not importing libraries the function never uses, or from deferring a heavy import to the code path that needs it.
- Avoid synchronous file I/O at init: Reading large configuration files, parsing JSON schemas, or compiling regex patterns at module level is common and expensive. Profile with AWS_LAMBDA_LOG_LEVEL=TRACE or a simple Date.now() diff around each init block to see where time is spent.
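The lazy-initialisation and caching patterns above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: fetch_secret_from_store is a hypothetical stand-in for a slow call such as Secrets Manager GetSecretValue, and the secret name is invented. A call counter is included only to make the caching behaviour observable.

```python
import functools

# Counts calls to the slow lookup so the caching behaviour is visible.
FETCH_CALLS = {"count": 0}


def fetch_secret_from_store(name: str) -> str:
    """Hypothetical stand-in for a slow remote lookup (not a real AWS call)."""
    FETCH_CALLS["count"] += 1
    return f"value-of-{name}"


@functools.lru_cache(maxsize=None)
def get_secret(name: str) -> str:
    """Resolve a secret at most once per execution environment."""
    return fetch_secret_from_store(name)


def handler(event, context):
    # Only the code path that needs the secret pays for the lookup,
    # and it pays once: later warm invocations hit the cache.
    if event.get("needs_secret"):
        return {"secret": get_secret("db-password")}
    return {"secret": None}
```

Invocations that never need the secret do not trigger the lookup at all, and repeated invocations that do need it share one cached result for the lifetime of the execution environment. The trade-off is that the first request on that code path absorbs the lookup latency instead of the init phase.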
Provisioned Concurrency: Eliminating Cold Starts on Critical Paths
Provisioned Concurrency (PC) pre-initialises a specified number of execution environments, runs all init code, and keeps those environments ready to accept requests. From a caller's perspective, a PC invocation has zero init latency: it is indistinguishable from a warm invocation. You are paying for idle compute, so the decision is a trade-off between two quantities. On one side is the cost of PC × provisioned count × time. On the other is the cost of cold starts: cold-start rate × p99 init duration, weighted by the SLO impact of each slow request.
Where PC makes economic and operational sense:
- APIs with strict p99 latency SLOs (payment flows, auth endpoints, real-time features)
- JVM-based Lambdas (Java, Kotlin, Scala) where cold starts routinely exceed 1–3 seconds
- Functions that are invoked at highly variable rates — after a period of zero traffic, the first burst of requests all cold-start simultaneously without PC
- Scheduled jobs with a tight deadline — a Step Function task with a 10-second timeout and a 5-second cold start leaves no margin for the actual work
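The cost trade-off for Provisioned Concurrency can be put into rough numbers. The sketch below is only an estimate under assumptions: the function names are invented, the price is a parameter you supply from the current AWS price list, and the latency figure is pessimistic because it charges every cold start the p99 init duration. The SLO impact of those slow requests still has to be judged separately.

```python
def provisioned_cost(price_per_gb_second, memory_mb, environments, seconds):
    """Cost of keeping `environments` provisioned for `seconds`.

    price_per_gb_second is a caller-supplied figure, not a built-in AWS price.
    """
    return price_per_gb_second * (memory_mb / 1024) * environments * seconds


def cold_start_exposure(invocations, cold_start_rate, p99_init_seconds):
    """Requests expected to cold-start and a pessimistic added-latency total."""
    cold_requests = invocations * cold_start_rate
    return {
        "cold_requests": cold_requests,
        "added_latency_seconds": cold_requests * p99_init_seconds,
    }
```

For example, cold_start_exposure(1_000_000, 0.01, 2.0) describes a function with one million invocations, a 1 % cold-start rate and a 2-second p99 init duration: roughly ten thousand requests hit a cold start. Whether that justifies the figure returned by provisioned_cost depends on how much each slow request costs the business.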
SnapStart: Snapshot-Based Cold Start Mitigation for the JVM
AWS Lambda SnapStart (available for the Java 11 and later managed runtimes) takes a snapshot of the fully-initialised execution environment after the init phase, when a function version is published, and stores it as a Firecracker memory snapshot. On a cold start, Lambda restores from this snapshot rather than reinitialising from scratch. In practice, this compresses JVM cold starts from 3–8 seconds down to 200–600 ms, often without code changes and without the per-instance cost of Provisioned Concurrency.
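Enabling SnapStart is a single Terraform or console setting: in Terraform, a snap_start block with apply_on = "PublishedVersions" on the aws_lambda_function resource. SnapStart applies only to published versions, so callers must invoke a version or an alias rather than $LATEST.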
SnapStart has two correctness caveats that every Java engineer must understand before enabling it in production:
- Uniqueness hooks: Any state that must be unique per execution environment (random seeds, UUIDs generated at init time, TLS session keys) will be identical across all restored instances if generated before the snapshot. Lambda provides the CRaC (Coordinated Restore at Checkpoint) API hooks: implement org.crac.Resource and register with Core.getGlobalContext(). In the beforeCheckpoint hook, close network connections and release any unique state. In the afterRestore hook, re-establish connections and regenerate unique state.
- Network connections in init code: A TCP connection to RDS, ElastiCache, or an external API opened at init time will be stale after snapshot restore. Either open connections lazily in the handler body, or use the afterRestore CRaC hook to reconnect.
Memory, Timeout, and Architecture: The Other Knobs
Cold starts correlate with function memory configuration. Lambda CPU allocation is proportional to memory: a 128 MB function gets a fraction of a vCPU; a 1769 MB function gets exactly 1 vCPU; a 3008 MB function gets close to 2. For JVM functions in particular, more memory means faster classloading, faster JIT compilation, and therefore shorter cold starts. The sweet spot for Java cold-start reduction without burning budget is typically 1024–2048 MB.
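The memory-to-CPU relationship can be turned into a quick estimate. This is a minimal sketch that assumes CPU scales linearly with memory and is anchored at the 1769 MB = 1 vCPU point mentioned above; the function name is made up, and real allocations may not follow the line exactly.

```python
MB_PER_VCPU = 1769  # memory size the text equates with one full vCPU


def estimated_vcpus(memory_mb):
    """Estimate the vCPU share for a memory setting, assuming linear scaling."""
    return memory_mb / MB_PER_VCPU
```

Under that assumption a 128 MB function gets about 7 % of a vCPU and a 3008 MB function about 1.7 vCPUs, which is why raising memory on a CPU-bound init phase can shorten cold starts noticeably.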
For ARM64 (Graviton2) functions, cold starts are measurably shorter than x86_64 for equivalent memory settings, and the duration price per GB-second is about 20 % lower. Unless you have a specific reason to stay on x86_64 (native extensions, architecture-specific libraries), new functions should default to architectures = ["arm64"] in Terraform.
Finally, function package size directly affects download time during cold start. A 50 MB ZIP has a shorter download window than a 250 MB ZIP. Keep dependencies minimal; use Lambda Layers for shared libraries so the layer is cached at the availability zone level rather than downloaded per function version.