SSH in Depth
SSH (Secure Shell) is the nervous system of every production infrastructure. You use it to log into servers, copy files, run remote commands, forward ports, and chain access through jump hosts. At big-tech companies, SSH configuration and bastion patterns are codified into policy, because a single misconfiguration can have a blast radius measured in entire data centres. This lesson covers the subject as a senior SRE would: keys, hardening sshd, the client config file, the agent, tunnels, and the bastion pattern that protects fleets from direct internet exposure.
Key-Based Authentication: The Foundation
Password authentication over SSH is banned in virtually every security-conscious environment. Keys are stronger and enable automation. The pair consists of a private key (stays on your workstation, never shared, should be passphrase-protected) and a public key (placed on every server you need to reach).
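The authorized_keys file on the server must have permissions 600 (owner read/write only) and the ~/.ssh directory must be 700. With the default StrictModes yes, sshd ignores the file when permissions are too loose; the client sees only a generic authentication failure, while the real reason appears in the server's auth log. This is one of the most common "why won't my key work?" failures in production.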
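The permission rules above can be sketched as a small helper. This is a minimal sketch, not a definitive tool: install_pubkey is a hypothetical function name, and the email address and user@server in the comments are placeholders. The client-side commands are shown as comments because they are interactive.

```shell
# Client side (run interactively on your workstation):
#   ssh-keygen -t ed25519 -C "you@example.com"     # prompts for a passphrase
#   ssh-copy-id -i ~/.ssh/id_ed25519.pub user@server

# Server side: install_pubkey PUBKEY_FILE [SSH_DIR]
# Hypothetical helper that appends a public key to authorized_keys
# (skipping exact duplicates) and sets the modes sshd expects.
install_pubkey() {
    pubkey_file=$1
    ssh_dir=${2:-$HOME/.ssh}

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"                      # directory: owner only
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"      # file: owner read/write only

    # -x whole-line match, -F fixed string: append only if the key is absent
    grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys" ||
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
}
```

Calling install_pubkey twice with the same key file leaves a single entry, so the helper is safe to re-run from provisioning scripts.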
Hardening sshd: The Server Daemon
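Modern distros ship sshd_config with sensible defaults, but several settings must be tightened before a server is exposed to the internet. Edit /etc/ssh/sshd_config or, better, drop a file such as 99-hardening.conf into /etc/ssh/sshd_config.d/ for clean separation from the vendor defaults. Note that sshd uses the first value it reads for each keyword, so check that no earlier-sorting file in that directory sets the same options; sshd -T prints the effective configuration.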
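As an illustration, a hardening drop-in might look like the following. The values are assumptions chosen for the sketch, not the lesson's original settings, and STAGED is a hypothetical staging path: the file is written there first so it can be reviewed before it is installed.

```shell
# Stage the drop-in in a scratch location (hypothetical path; override STAGED).
STAGED="${STAGED:-${TMPDIR:-/tmp}/99-hardening.conf}"

cat > "$STAGED" <<'EOF'
# Illustrative hardening values; tune them to your environment.
PermitRootLogin no
PasswordAuthentication no
# On OpenSSH older than 8.7 this keyword is ChallengeResponseAuthentication.
KbdInteractiveAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 30
X11Forwarding no
ClientAliveInterval 300
ClientAliveCountMax 2
EOF

echo "Staged $(grep -c '^[A-Z]' "$STAGED") directives in $STAGED"

# Install as root once reviewed:
#   install -m 600 -o root -g root "$STAGED" /etc/ssh/sshd_config.d/99-hardening.conf
```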
After editing, always test the config with sshd -t before reloading, so that a syntax error cannot lock you out.
Keep your existing session open while you reload sshd on a remote machine. The reload only affects new connections; your current session survives. If the new config is broken, you still have a rescue path. On cloud VMs, also verify that the vendor console (AWS Session Manager, GCP Cloud Shell, Azure Serial Console) is available as a fallback before hardening an instance.
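The test-then-reload sequence can be wrapped in a small function so the reload never happens after a failed check. This is a sketch under assumptions: safe_reload_sshd is a hypothetical name, systemd is assumed to manage the daemon, and the unit is assumed to be called sshd (Debian and Ubuntu name it ssh).

```shell
# safe_reload_sshd: validate the config and reload only if validation passes.
# Override SSHD_BIN or SSHD_UNIT if your system differs from the assumptions.
safe_reload_sshd() {
    sshd_bin=${SSHD_BIN:-/usr/sbin/sshd}
    unit=${SSHD_UNIT:-sshd}

    if "$sshd_bin" -t; then                 # -t: check config syntax, then exit
        systemctl reload "$unit" && echo "reloaded $unit"
    else
        echo "config test failed; NOT reloading $unit" >&2
        return 1
    fi
}

# Usage (as root, from a session you keep open):
#   safe_reload_sshd
# Then confirm from a second terminal that a new login still works.
```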
The SSH Client Config File
Typing long ssh -i ~/.ssh/id_ed25519 -p 2222 -J bastion.company.com user@10.0.1.55 commands is error-prone and impossible to automate consistently. The ~/.ssh/config file encodes all of this once and lets you type ssh prod-api-01.
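A minimal sketch of such a config, built from the flags in the command above, might look like this. The alias bastion, the login name user on the bastion, and the staging path STAGED_CFG are assumptions; the file is staged rather than written straight to ~/.ssh/config so an existing config is not overwritten.

```shell
# Hypothetical staging path; merge the result into ~/.ssh/config after review.
STAGED_CFG="${STAGED_CFG:-${TMPDIR:-/tmp}/ssh_config.staged}"

cat > "$STAGED_CFG" <<'EOF'
Host bastion
    HostName bastion.company.com
    User user
    IdentityFile ~/.ssh/id_ed25519
    IdentitiesOnly yes

Host prod-api-01
    HostName 10.0.1.55
    Port 2222
    User user
    ProxyJump bastion
    IdentityFile ~/.ssh/id_ed25519
    IdentitiesOnly yes
    StrictHostKeyChecking yes
EOF
chmod 600 "$STAGED_CFG"

# ssh -G prints the options ssh would use for a host, without connecting.
if command -v ssh >/dev/null 2>&1; then
    ssh -G -F "$STAGED_CFG" prod-api-01 | grep -Ei '^(hostname|port|proxyjump) '
fi
```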
With this config, ssh prod-api-01 automatically proxies through the bastion, uses the right key, and enforces host-key verification — one command, zero manual flags.
Set HashKnownHosts yes globally. It stores known hosts as salted HMAC-SHA1 hashes rather than plain hostnames, preventing an attacker who reads your ~/.ssh/known_hosts from mapping your internal network topology.
The SSH Agent: Unlocking Keys Once
A passphrase-protected key is secure at rest but painful if you re-type the passphrase for every connection. The SSH agent holds decrypted keys in memory for the lifetime of your session. You unlock once; the agent signs challenges on your behalf.
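The unlock-once workflow can be sketched as follows. ensure_agent is a hypothetical helper name; it relies on the documented exit status of ssh-add -l, which is 2 when no agent can be contacted. The 8-hour lifetime in the usage comment is an illustrative choice.

```shell
# ensure_agent: reuse a reachable agent, otherwise start a new one.
ensure_agent() {
    ssh-add -l >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 2 ]; then
        # ssh-agent -s prints Bourne-shell commands that export SSH_AUTH_SOCK
        eval "$(ssh-agent -s)" >/dev/null
    fi
}

# Usage (interactive, because ssh-add prompts for the passphrase):
#   ensure_agent
#   ssh-add -t 8h ~/.ssh/id_ed25519   # key is dropped from the agent after 8 hours
#   ssh-add -l                        # list the keys the agent currently holds
#   ssh-agent -k                      # stop the agent when you are done
```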
Agent forwarding (ForwardAgent yes or -A flag) lets you SSH from the bastion onward to internal hosts using keys from your local agent — the private key never leaves your laptop. This is the secure alternative to placing private keys on the bastion.
Only enable ForwardAgent on the bastion host (a machine you trust and control). Never enable it for arbitrary hosts: anyone with root on a machine you connect to with a forwarded agent can hijack your agent socket and impersonate you to any other server your key has access to. This is a well-known lateral movement vector.
SSH Tunnels
SSH tunnels let you securely forward TCP traffic through the encrypted SSH channel. There are three modes:
- Local port forwarding (-L): binds a port on your local machine and forwards traffic to a remote destination via the SSH server. Classic use: reach a database that listens only on localhost of a remote server.
- Remote port forwarding (-R): binds a port on the remote SSH server and forwards inbound traffic to your local machine. Useful for exposing a local dev service temporarily to a remote host.
- Dynamic / SOCKS proxy (-D): turns the SSH client into a SOCKS5 proxy, routing arbitrary TCP traffic through the server. Used for browsing internal networks.
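The three modes can be sketched as one-line wrappers. The function names and port numbers (5433, 5432, 8080, 3000, 1080) are assumptions picked for illustration; each function takes the SSH destination as its only argument, and -N tells ssh to forward ports without running a remote command.

```shell
# Local (-L): localhost:5433 on your machine -> 127.0.0.1:5432 as seen from the server
db_tunnel()   { ssh -N -L 5433:127.0.0.1:5432 "$1"; }

# Remote (-R): port 8080 on the server -> localhost:3000 on your machine
expose_dev()  { ssh -N -R 8080:localhost:3000 "$1"; }

# Dynamic (-D): SOCKS proxy listening on localhost:1080
socks_proxy() { ssh -N -D 1080 "$1"; }

# Usage (each call blocks until interrupted with Ctrl-C):
#   db_tunnel user@db-host        then connect a client to localhost:5433
#   expose_dev user@remote-host
#   socks_proxy user@bastion      then point a browser at SOCKS5 localhost:1080
```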
The Bastion Pattern
A bastion host (also called a jump server) is the single authorised entry point for SSH access to a private network. All other servers have no public IP and accept SSH connections only from the bastion's internal IP. This dramatically reduces attack surface: only one machine is internet-facing, and it is locked down, monitored, and audited.
Modern cloud environments implement this pattern using either a traditional bastion EC2 instance or managed alternatives:
- AWS Systems Manager Session Manager — no bastion at all; agent on each instance; access via IAM policy, fully audited in CloudTrail.
- GCP Identity-Aware Proxy (IAP) tunneling — similar model; no public SSH port required.
- Tailscale / Cloudflare Access: a WireGuard-based overlay network or a zero-trust proxy; no bastion or publicly exposed SSH port required.
For environments that do run a traditional bastion, harden it further: run it on an immutable AMI (no persistent state), rotate the host key on every launch, ship every session to a centralised audit log (e.g., tlog), and restrict ingress in the security group to corporate egress IPs only.
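On the client side, a traditional bastion is often paired with a wildcard rule so that every private address is reached through the jump host. The sketch below assumes the private range is 10.0.*, reuses the hostname bastion.company.com from earlier in this lesson, and writes to a hypothetical staging path FLEET_CFG for review.

```shell
# Hypothetical staging path; merge into ~/.ssh/config after review.
FLEET_CFG="${FLEET_CFG:-${TMPDIR:-/tmp}/ssh_config.fleet}"

cat > "$FLEET_CFG" <<'EOF'
Host bastion
    HostName bastion.company.com
    ForwardAgent no

# Any host typed as a 10.0.x.y address is reached through the bastion.
Host 10.0.*
    ProxyJump bastion
    ForwardAgent no
    StrictHostKeyChecking yes
EOF
chmod 600 "$FLEET_CFG"

# One-off equivalent without a config file:
#   ssh -J bastion.company.com user@10.0.1.55
```

With ProxyJump the connection to the internal host is tunnelled through the bastion from your workstation, so agent forwarding can stay disabled for this path.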
Certificate-Based SSH: Beyond Authorized Keys
At scale, managing authorized_keys across hundreds of servers becomes a maintenance nightmare: a departing employee's key must be removed from every machine individually. Large organisations use SSH certificates instead: short-lived signed credentials issued by an internal Certificate Authority (CA). The CA public key is placed on every server once; any certificate signed by that CA is trusted for its validity period (often 8-24 hours). When an employee leaves, you simply stop issuing certificates, and any certificate they still hold expires within hours.
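The issuing flow can be tried locally with ssh-keygen alone. This is a demo sketch under assumptions: the names user_ca, alice, and alice@company.com are hypothetical, the CA key has an empty passphrase only because this is a throwaway example, and real deployments usually put the CA behind a signing service.

```shell
WORK=$(mktemp -d)    # scratch directory for the demo

# 1. Create the CA key pair (empty passphrase for the demo only).
ssh-keygen -q -t ed25519 -N '' -C 'demo-user-ca' -f "$WORK/user_ca"

# 2. Create a user's key pair.
ssh-keygen -q -t ed25519 -N '' -C 'alice' -f "$WORK/alice"

# 3. Sign the user's public key:
#    -s CA private key, -I identity recorded in server logs,
#    -n allowed login name(s), -V validity window (8 hours from now).
ssh-keygen -q -s "$WORK/user_ca" -I alice@company.com -n alice -V +8h "$WORK/alice.pub"

# 4. Inspect the resulting certificate (written next to the key as alice-cert.pub).
ssh-keygen -L -f "$WORK/alice-cert.pub"

# Server side, once per host, in sshd_config:
#   TrustedUserCAKeys /etc/ssh/user_ca.pub
```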
Common Failure Modes
- Permission errors on authorized_keys: sshd refuses to use the file if it is world-writable or the home directory is group-writable. Fix: chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys.
- SELinux/AppArmor blocking sshd: on RHEL/CentOS, if you move SSH to a non-standard port you must run semanage port -a -t ssh_port_t -p tcp 2222 or SELinux will block the new port even after updating sshd_config.
- Known host mismatch: after rebuilding a server with the same IP, the host key changes. SSH refuses to connect with a scary warning. Fix: ssh-keygen -R <hostname-or-ip>. In automation, use StrictHostKeyChecking accept-new for the first-time connection from configuration management, then lock it to yes.
- Agent forwarding socket permission: if SSH_AUTH_SOCK is unset or the socket file disappeared, forwarding silently fails. Check with ssh-add -l on the remote host.
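The permission failure in the first item is easy to check mechanically. The sketch below uses a hypothetical check_mode helper that compares the mode string printed by ls with the expected one; SSH_DIR is an assumed override for testing against a directory other than ~/.ssh.

```shell
# check_mode PATH EXPECTED
# EXPECTED is an ls-style mode string such as drwx------ or -rw-------.
check_mode() {
    actual=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
    if [ "$actual" = "$2" ]; then
        echo "ok   $1 ($actual)"
    else
        echo "FIX  $1 is '${actual:-missing}', expected $2"
        return 1
    fi
}

SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
check_mode "$SSH_DIR" drwx------
check_mode "$SSH_DIR/authorized_keys" -rw-------

# Related checks from the list above (run by hand):
#   ssh-keygen -R <hostname-or-ip>    # clear a stale host key
#   ssh-add -l                        # on the remote host: is the forwarded agent reachable?
```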