TLS & Certificates
Transport Layer Security (TLS) is the protocol that puts the padlock on every HTTPS URL. It sits between TCP (which you know from lesson 1) and the application layer, providing three guarantees simultaneously: confidentiality (no one can read the traffic), integrity (no one can tamper with it undetected), and authentication (you are actually talking to the server you think you are). Understanding TLS deeply is non-negotiable for any DevOps engineer — you will debug certificate errors, rotate expiring certs, and tune handshake performance regularly.
The TLS Handshake, Step by Step
The handshake is the negotiation that happens before a single byte of application data flows. Most modern deployments prefer TLS 1.3, which is both faster and more secure than its predecessors, while often still accepting TLS 1.2 for older clients. Here is what happens:
In TLS 1.3 the client sends its key share speculatively in the very first message (the ClientHello). The server can therefore compute the shared secret immediately and send its ServerHello, its certificate, and a Finished message all in one flight. The client verifies the certificate, sends its own Finished, and encrypted application data flows. That is one round trip for the handshake, compared with two for a full TLS 1.2 handshake.
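As a minimal sketch, the negotiated protocol version can be observed with Python's standard ssl module. The helper names make_tls13_context and negotiated_version are illustrative and not part of this lesson, and negotiated_version needs network access to a host of your choosing.

```python
import socket
import ssl
from typing import Optional


def make_tls13_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def negotiated_version(host: str, port: int = 443) -> Optional[str]:
    """Complete a handshake with host and return the protocol version, e.g. 'TLSv1.3'."""
    context = make_tls13_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        # wrap_socket performs the handshake; server_hostname sets SNI and is
        # the name that gets checked against the certificate.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

If the server only speaks TLS 1.2, the handshake in negotiated_version raises ssl.SSLError instead of silently downgrading, which is usually what you want when auditing an endpoint.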
Certificate Chains and the Chain of Trust
A certificate on its own proves nothing unless a browser or OS already trusts the entity that signed it. The trust model works as a chain:
- Root CA — self-signed, embedded in operating systems and browsers. Examples: ISRG Root X1 (Let's Encrypt's root), DigiCert Global Root CA.
- Intermediate CA — signed by the root, used for day-to-day issuance. Root CAs are kept offline; intermediates do the actual signing.
- End-entity (leaf) certificate — your server's certificate, signed by the intermediate.
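When your server sends its certificate, it must also send the intermediate(s). Do not count on clients to fetch missing intermediates: some browsers can recover them, but most non-browser clients (curl, Python, Java) simply fail verification. A classic production incident is deploying a cert without bundling the intermediate chain. Verify with openssl s_client -connect yourdomain:443 -showcerts, which prints every certificate the server actually sends.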
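The missing-intermediate mistake can often be caught before deployment by counting the certificates in the bundle file you are about to install. The sketch below is a rough pre-flight check under that assumption: it only counts PEM blocks and does not verify signatures or ordering, and the function names are hypothetical.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def count_certificates(pem_bundle: str) -> int:
    """Return the number of PEM certificate blocks found in the text."""
    return len(PEM_CERT.findall(pem_bundle))


def looks_like_full_chain(pem_bundle: str) -> bool:
    """A leaf plus at least one intermediate means two or more blocks."""
    return count_certificates(pem_bundle) >= 2
```

A bundle with a single block is almost certainly a bare leaf certificate and is worth stopping the deploy for.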
SAN Certificates (Subject Alternative Names)
The CN (Common Name) field of a certificate is legacy and browsers no longer use it for hostname validation — they only check the Subject Alternative Name (SAN) extension. A single cert can protect many hostnames via multiple SANs:
- DNS:example.com
- DNS:www.example.com
- DNS:api.example.com
- DNS:*.staging.example.com (wildcard, one level only)
Wildcard SANs (*.example.com) cover exactly one subdomain level and do not match the apex (example.com) or deeper subdomains (a.b.example.com). For microservices with many unique hostnames, a single multi-SAN cert is far simpler to manage than dozens of individual certs.
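The one-level wildcard rule is easy to get wrong, so here is a simplified sketch of the matching logic. It assumes the wildcard is the entire leftmost label and ignores internationalized names; san_matches and cert_covers are illustrative names, not a library API.

```python
from typing import Iterable


def san_matches(hostname: str, pattern: str) -> bool:
    """Return True if hostname is covered by a single DNS SAN entry."""
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    # A wildcard never changes the number of labels, so the counts must agree.
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard stands in for exactly one non-empty leftmost label.
        return bool(host_labels[0]) and host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def cert_covers(hostname: str, sans: Iterable[str]) -> bool:
    """Return True if any SAN entry in the certificate matches hostname."""
    return any(san_matches(hostname, san) for san in sans)
```

With the SAN list ["example.com", "*.example.com"], cert_covers accepts example.com and www.example.com but rejects a.b.example.com.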
Let's Encrypt and Automated Certificate Management (ACME)
Let's Encrypt issues free, 90-day DV (domain-validated) certificates via the ACME protocol. The short lifetime is intentional — it forces automation and limits damage from key compromise. In production, you never manually download a cert; you run an ACME client that handles issuance and renewal automatically.
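Even with an ACME client in place, it helps to monitor expiry independently so a broken renewal is noticed early. The sketch below assumes the notAfter string format that Python's ssl module reports for certificates and a 30-day renewal threshold; both the threshold and the function names are assumptions chosen for illustration.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between now and a notAfter string such as 'Jan 31 00:00:00 2030 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True once fewer than threshold_days remain on the certificate."""
    return days_until_expiry(not_after, now) < threshold_days
```

In a real check, now would be datetime.now(timezone.utc) and not_after would come from the certificate served by the endpoint being monitored.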
The two common validation methods:
- HTTP-01: The ACME client places a token at http://yourdomain/.well-known/acme-challenge/<token>. Works for any port-80-accessible host. Cannot be used for wildcards.
- DNS-01: The ACME client creates a _acme-challenge.yourdomain TXT record. Works for wildcards, internal hosts, and hosts without port-80 exposure. Requires DNS API access.
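To make the two challenge types concrete, the sketch below computes where each challenge response lives, following the ACME specification (RFC 8555). The helper names are hypothetical, and the key authorization string (the token joined to the account key thumbprint) is treated as an input that a real ACME client would produce for you.

```python
import base64
import hashlib


def http01_url(domain: str, token: str) -> str:
    """URL the CA fetches over port 80 to validate an HTTP-01 challenge."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def dns01_record_name(domain: str) -> str:
    """TXT record name for DNS-01; a wildcard is validated at its base domain."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


def dns01_txt_value(key_authorization: str) -> str:
    """TXT record value: unpadded base64url of SHA-256(key authorization)."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

For example, dns01_record_name("*.example.com") returns "_acme-challenge.example.com", which is why a wildcard certificate needs write access to the parent zone.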
Terminating TLS in Production
TLS is typically terminated at the edge (a load balancer, API gateway, or reverse proxy) so backend services communicate over plain HTTP on a private network. This is called TLS termination or TLS offloading. For services that must keep traffic encrypted all the way to the backend (payment processors, healthcare), the edge re-encrypts or passes TLS through, and mTLS (mutual TLS) is often added: both the client and server present certificates, giving strong identity on both ends. Service meshes (Istio, Linkerd) can automate mTLS between pods in a cluster.
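On the server side, what distinguishes mTLS from ordinary TLS is that the server also demands a client certificate. A minimal sketch with Python's ssl module follows; the function name and its optional file-path parameters are assumptions for illustration, and a real service would pass all three paths.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server context that rejects clients which do not present a valid cert."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED on a server context is what makes the TLS "mutual".
    context.verify_mode = ssl.CERT_REQUIRED
    if certfile:
        # The server's own certificate chain and private key.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if client_ca_file:
        # The CA (often an internal one) that signs client certificates.
        context.load_verify_locations(cafile=client_ca_file)
    return context
```

A service mesh sidecar does the equivalent of this configuration on your behalf, plus issuing and rotating the certificates.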
Diagnosing Certificate Failures
The most common certificate errors in production and how to triage them:
- CERTIFICATE_VERIFY_FAILED / ERR_CERT_AUTHORITY_INVALID: Most often the server did not send the full chain, or the issuing CA is not in the client's trust store. Check with openssl s_client -showcerts.
- ERR_CERT_DATE_INVALID / certificate has expired: Renewal automation broke. Check certbot certificates, cron logs, and whether the reload hook ran.
- ERR_CERT_COMMON_NAME_INVALID / hostname mismatch: The hostname you're connecting to is not in the cert's SANs. Inspect with openssl x509 -noout -ext subjectAltName.
- SSL_ERROR_RX_RECORD_TOO_LONG: The server is responding with plain HTTP on a port where the client expected TLS. A classic misconfiguration: an HTTPS client pointed at port 80 instead of 443.
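The triage list above can be encoded as a small lookup for use in alerting or log processing. This is only a sketch: the rule table, the hint wording, and the triage function are illustrative, and real error strings vary by client.

```python
# Ordered from most specific to most generic. Python reports an expired
# certificate as "CERTIFICATE_VERIFY_FAILED ... certificate has expired",
# so the expiry rule has to run before the generic chain rule.
TRIAGE_RULES = [
    (("ERR_CERT_DATE_INVALID", "CERTIFICATE HAS EXPIRED"),
     "expired: check renewal automation and the reload hook"),
    (("ERR_CERT_COMMON_NAME_INVALID", "HOSTNAME MISMATCH"),
     "hostname: compare the requested name with the cert's SANs"),
    (("SSL_ERROR_RX_RECORD_TOO_LONG",),
     "plain-http: the port is answering HTTP, not TLS"),
    (("CERTIFICATE_VERIFY_FAILED", "ERR_CERT_AUTHORITY_INVALID"),
     "chain: inspect the served chain with openssl s_client -showcerts"),
]


def triage(error_text: str) -> str:
    """Map a raw TLS error message to a first diagnostic step."""
    upper = error_text.upper()
    for needles, hint in TRIAGE_RULES:
        if any(needle in upper for needle in needles):
            return hint
    return "unknown: capture the full handshake with openssl s_client"
```

The ordering of the rules matters more than the rules themselves, since several clients wrap specific failures inside a generic verification error.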
curl -k, verify=False in Python's requests library, or PYTHONHTTPSVERIFY=0 are acceptable only in isolated local debugging. In staging or CI pipelines they mask real certificate problems that will break production. If your internal PKI is not trusted, install the CA cert into the system trust store; do not disable verification.
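When changing the system trust store is not an option, a per-process alternative is to load the internal CA into the client's TLS context while leaving verification on. The sketch below uses Python's ssl module; make_verified_context and the internal_ca_file parameter are hypothetical names.

```python
import ssl
from typing import Optional


def make_verified_context(internal_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client context that keeps verification on and optionally trusts one extra CA."""
    # create_default_context enables certificate and hostname checks and loads
    # the system trust store.
    context = ssl.create_default_context()
    if internal_ca_file:
        # Adds the internal CA alongside the system roots; nothing is disabled.
        context.load_verify_locations(cafile=internal_ca_file)
    return context
```

The resulting context can be passed to urllib.request.urlopen through its context argument, so the only change from the insecure shortcut is which CAs are trusted, not whether checks run.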
TLS is the foundation of every secure service you will run. Get comfortable with openssl s_client as your first-response tool; it tells you more about a TLS connection in five seconds than most GUI tools will in five minutes.