Linux System Administration

SSH in Depth

SSH (Secure Shell) is the nervous system of every production infrastructure. You use it to log into servers, copy files, run remote commands, forward ports, and chain access through jump hosts. At big-tech companies, SSH configuration and bastion patterns are codified into policy, because a single misconfiguration can have a blast radius measured in entire data centres. This lesson covers the subject as a senior SRE would: keys, hardening sshd, the client config file, the agent, tunnels, and the bastion pattern that protects fleets from direct internet exposure.

Key-Based Authentication: The Foundation

Password authentication over SSH is banned in virtually every security-conscious environment. Keys are stronger and enable automation. The pair consists of a private key (stays on your workstation, never shared, should be passphrase-protected) and a public key (placed on every server you need to reach).

# Generate an Ed25519 key (preferred over RSA for new keys — smaller, faster, equally secure)
ssh-keygen -t ed25519 -C "you@company.com" -f ~/.ssh/id_ed25519

# RSA is still acceptable when you need broad compatibility (e.g., older FIPS environments)
ssh-keygen -t rsa -b 4096 -C "you@company.com" -f ~/.ssh/id_rsa_work

# Copy your public key to a server
ssh-copy-id -i ~/.ssh/id_ed25519.pub user@192.168.1.50

# Or manually append — useful when ssh-copy-id is unavailable
cat ~/.ssh/id_ed25519.pub | ssh user@192.168.1.50 \
  "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"

Key idea: On the server, keep authorized_keys at mode 600 (owner read/write only) and ~/.ssh at 700. With StrictModes yes (the default), sshd ignores the file if it, ~/.ssh, or the home directory is writable by group or others. The client only sees the key being rejected; the reason appears in the server's auth log. This is one of the most common "why won't my key work?" failures in production.
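A quick way to catch this before sshd does is to test for group or world write bits yourself. The sketch below is a minimal illustration, not sshd's actual check: the helper name perms_too_open and the throwaway directory are assumptions made for the demo, and the real StrictModes check also verifies ownership.

```shell
# perms_too_open PATH
# Succeeds (exit 0) when PATH is writable by group or by others.
# Hypothetical helper for illustration; sshd's StrictModes also checks ownership.
perms_too_open() {
    [ -n "$(find "$1" -prune \( -perm -020 -o -perm -002 \) -print 2>/dev/null)" ]
}

# Demo on a throwaway directory so nothing under the real ~/.ssh is touched.
demo_home=$(mktemp -d)
mkdir "$demo_home/.ssh"
touch "$demo_home/.ssh/authorized_keys"

# Deliberately too open.
chmod 777 "$demo_home/.ssh"
chmod 666 "$demo_home/.ssh/authorized_keys"
perms_too_open "$demo_home/.ssh" && echo "too open: .ssh"

# The modes this lesson recommends.
chmod 700 "$demo_home/.ssh"
chmod 600 "$demo_home/.ssh/authorized_keys"
perms_too_open "$demo_home/.ssh" || echo "ok: .ssh"
perms_too_open "$demo_home/.ssh/authorized_keys" || echo "ok: authorized_keys"
```

On a real server you would point the helper at the account's home directory, ~/.ssh, and ~/.ssh/authorized_keys in turn.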

Hardening sshd: The Server Daemon

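The default sshd_config ships with sensible defaults in modern distros, but several settings must be tightened before a server is exposed to the internet. Edit /etc/ssh/sshd_config or, better, drop a file into /etc/ssh/sshd_config.d/ for clean separation from the vendor defaults. sshd keeps the first value it reads for each keyword and processes drop-ins in lexical order, so a low-numbered name such as 00-hardening.conf ensures that other drop-ins (for example, cloud-init's 50-cloud-init.conf on Ubuntu images) cannot override it.

# /etc/ssh/sshd_config.d/00-hardening.conf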

# Disable password auth entirely — keys only
PasswordAuthentication no
PermitEmptyPasswords no
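# Also disable keyboard-interactive auth (newer OpenSSH releases name this KbdInteractiveAuthentication)
ChallengeResponseAuthentication no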

# Disable root login — use a service account, escalate via sudo
PermitRootLogin no

# Restrict to specific users or groups (deny all others implicitly)
AllowGroups ssh-users sudo

# Use only strong ciphers/MACs/KEX (aligned with Mozilla Modern profile)
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com
MACs hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com
KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org

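# Reduce attack surface
X11Forwarding no
# Allow agent and TCP forwarding selectively per Match block if needed
# (a bastion used with ProxyJump must allow TCP forwarding)
AllowAgentForwarding no
AllowTcpForwarding no
PrintMotd no

# Drop dead connections: probe every 120 s, disconnect after 5 unanswered probes (about 10 minutes).
# A live but idle client still answers the probes, so this is not an idle-session timeout.
ClientAliveInterval 120
ClientAliveCountMax 5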

# Prevent slow brute-force with early disconnect on unauthenticated connections
LoginGraceTime 30
MaxAuthTries 3
# Rate-limit unauthenticated connections: at 10 pending, drop 30% of new ones; at 60, drop all
MaxStartups 10:30:60

# Log verbosity for audit trails; VERBOSE logs key fingerprints (critical for forensics)
LogLevel VERBOSE

After editing, always test the config before reloading to avoid locking yourself out:

# Dry-run: parse config and report errors without reloading
sshd -t

# Apply
systemctl reload sshd     # Graceful reload — keeps existing sessions alive
# OR on Ubuntu/Debian:
systemctl reload ssh

# Confirm the daemon is running and absorbed the change
systemctl status sshd
ss -tlnp | grep :22

Production pitfall (locking yourself out): Always keep an existing SSH session open while reloading sshd on a remote machine. The reload only affects new connections; your current session survives. Before closing it, open a second connection to confirm you can still log in; if the new config is broken, the open session is your rescue path. On cloud VMs, also verify that an out-of-band console (AWS Session Manager or EC2 Serial Console, GCP serial console, Azure Serial Console) is available as a fallback before hardening an instance.
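sshd -t only checks syntax. To confirm which values the daemon will actually use, sshd -T prints the effective configuration as lowercase "keyword value" lines. The filter below is a minimal sketch of how that output could be checked against a few of the settings from this lesson; the function name audit_sshd and the sample input are assumptions for illustration, not output captured from a real server.

```shell
# audit_sshd: read `sshd -T` style output on stdin and report any hardening
# setting that does not have the expected value. Hypothetical helper name.
audit_sshd() {
    awk '
        BEGIN {
            bad = 0
            want["passwordauthentication"] = "no"
            want["permitrootlogin"]        = "no"
            want["x11forwarding"]          = "no"
        }
        ($1 in want) {
            seen[$1] = 1
            if ($2 != want[$1]) {
                printf "FAIL %s is %s, expected %s\n", $1, $2, want[$1]
                bad = 1
            }
        }
        END {
            for (k in want) {
                if (!(k in seen)) {
                    printf "FAIL %s not reported\n", k
                    bad = 1
                }
            }
            exit bad
        }
    '
}

# Illustrative input only: permitrootlogin is wrong on purpose, so one FAIL line is printed.
printf '%s\n' 'passwordauthentication no' 'permitrootlogin yes' 'x11forwarding no' \
    | audit_sshd || echo "audit reported problems"

# On a real host (needs root):  sudo sshd -T | audit_sshd
```

Running the audit after every reload gives a cheap regression check that a later drop-in file has not quietly re-enabled something.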

The SSH Client Config File

Typing long ssh -i ~/.ssh/id_ed25519 -p 2222 -J bastion.company.com user@10.0.1.55 commands is error-prone and impossible to automate consistently. The ~/.ssh/config file encodes all of this once and lets you type ssh prod-api-01.

# ~/.ssh/config
# ssh uses the FIRST value it finds for each option, so list specific hosts
# first and keep the Host * defaults at the end.

# Bastion host (internet-facing jump server)
Host bastion
    HostName bastion.company.com
    User deploy
    Port 22
    IdentityFile ~/.ssh/id_ed25519_work
    # Forward agent only to the trusted bastion
    ForwardAgent yes

# Specific production hosts, listed before the prod-* pattern so their settings take precedence
Host prod-api-01
    HostName 10.0.1.55

Host prod-db-01
    HostName 10.0.2.10
    # No agent forwarding: database nodes are highest-sensitivity
    ForwardAgent no

# Production app servers, reached via bastion
Host prod-*
    User ec2-user
    # Automatically tunnel through bastion
    ProxyJump bastion
    StrictHostKeyChecking yes
    IdentityFile ~/.ssh/id_ed25519_work

# Global defaults: apply to every host for any option not already set above
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 3
    AddKeysToAgent yes
    IdentityFile ~/.ssh/id_ed25519
    HashKnownHosts yes
    StrictHostKeyChecking ask

With this config, ssh prod-api-01 automatically proxies through the bastion, uses the right key, and enforces host-key verification — one command, zero manual flags.
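To check what a host alias really resolves to, ssh -G prints the options the client would use without opening a connection (available since OpenSSH 6.8). Because ssh keeps the first value it obtains for each option, this is also the quickest way to see which Host block won. The sketch below uses a throwaway config file; the host names, the fallback-user account, and the resolved helper are illustrative assumptions.

```shell
# Write a throwaway config so the real ~/.ssh/config is not touched.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
Host prod-api-01
    HostName 10.0.1.55

Host prod-*
    User ec2-user
    ProxyJump bastion

Host *
    User fallback-user
EOF

# resolved HOST: print the user, hostname and proxyjump that ssh would use for HOST.
# -G only evaluates the configuration; it does not connect anywhere.
resolved() {
    ssh -G -F "$demo_cfg" "$1" | grep -E '^(user|hostname|proxyjump) '
}

if command -v ssh >/dev/null 2>&1; then
    # Matches prod-api-01, prod-* and *; User comes from prod-* because it is read first.
    resolved prod-api-01
    # Matches only Host *, so User falls back to the global default.
    resolved scratch-vm
fi
```

Against your own setup, ssh -G prod-api-01 with no -F flag reads ~/.ssh/config and shows the values that a real connection would use.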

Pro practice: Use HashKnownHosts yes globally. It stores host names and addresses in known_hosts as salted HMAC-SHA1 hashes rather than plain text, so an attacker who reads your ~/.ssh/known_hosts cannot simply read off your internal network topology.

The SSH Agent: Unlocking Keys Once

A passphrase-protected key is secure at rest but painful if you re-type the passphrase for every connection. The SSH agent holds decrypted keys in memory for the lifetime of your session. You unlock once; the agent signs challenges on your behalf.

# Start the agent and add your key (on macOS the system agent is already running;
# add UseKeychain yes to ~/.ssh/config to keep the passphrase in Keychain)
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519          # Prompts for passphrase once
ssh-add -l                         # List loaded keys and their fingerprints

# Alternative: load the key with a 1-hour expiry (production best practice)
ssh-add -t 3600 ~/.ssh/id_ed25519

# Remove a key from the agent (e.g., after using a high-privilege key)
ssh-add -d ~/.ssh/id_ed25519

# Remove all keys
ssh-add -D

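Agent forwarding (ForwardAgent yes or the -A flag) lets you SSH from the bastion onward to internal hosts using keys from your local agent; the private key never leaves your laptop. This is safer than placing private keys on the bastion. Where ProxyJump is available, prefer it: the connection to the internal host is authenticated end to end from your laptop through the bastion, so no agent needs to be forwarded at all.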

Production pitfall — agent forwarding scope: Only enable ForwardAgent on the bastion host (a machine you trust and control). Never enable it for arbitrary hosts: anyone with root on a machine you connect to with a forwarded agent can hijack your agent socket and impersonate you to any other server your key has access to. This is a well-known lateral movement vector.
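When an agent or a forwarded agent misbehaves, the first question is whether the shell can reach an agent socket at all. The helper below is a minimal sketch of that check; the name agent_status is hypothetical, and it relies on the documented ssh-add exit codes (1 when the agent holds no identities, 2 when the agent cannot be contacted).

```shell
# agent_status: report whether this shell can reach an SSH agent.
# Hypothetical helper; it only reads state and changes nothing.
agent_status() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "no agent: SSH_AUTH_SOCK is unset"
        return 1
    fi
    if [ ! -S "$SSH_AUTH_SOCK" ]; then
        echo "stale agent: $SSH_AUTH_SOCK is not a socket"
        return 1
    fi
    rc=0
    ssh-add -l >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "agent reachable, keys loaded" ;;
        1) echo "agent reachable, no keys loaded"; return 1 ;;
        *) echo "agent socket exists but could not be queried"; return 1 ;;
    esac
}

# Safe to run anywhere, locally or on a host you reached with a forwarded agent.
agent_status || true
```

Running it on the bastion after connecting with ForwardAgent yes tells you whether the forwarded socket arrived and whether any keys are available through it.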

SSH Tunnels

SSH tunnels let you securely forward TCP traffic through the encrypted SSH channel. There are three modes:

Local port forwarding (-L) — binds a port on your local machine and forwards traffic to a remote destination via the SSH server. Classic use: reach a database that listens only on localhost of a remote server.
Remote port forwarding (-R) — binds a port on the remote SSH server and forwards inbound traffic to your local machine. Useful for exposing a local dev service temporarily to a remote host.
Dynamic / SOCKS proxy (-D) — turns the SSH client into a SOCKS5 proxy, routing arbitrary TCP traffic through the server. Used for browsing internal networks.

# Local forwarding: access a remote MySQL (port 3306) as if it were local port 3307
ssh -L 3307:127.0.0.1:3306 user@db-host -N
# Connect your local mysql client: mysql -h 127.0.0.1 -P 3307 -u app -p

# Via a bastion: forward to a host the bastion can reach but you cannot
ssh -L 3307:db.internal:3306 user@bastion -N

# Remote forwarding: expose local port 8080 on the server's port 9090
ssh -R 9090:127.0.0.1:8080 user@server -N

# Dynamic SOCKS5 proxy on local port 1080 — route browser traffic through the server
ssh -D 1080 user@server -N -f   # -N: no command; -f: background
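A tunnel started with -f disappears into the background, which makes it awkward to shut down cleanly. One common approach, sketched here under assumptions, is to give the tunnel a control socket so it can be closed by name later. The helper names open_tunnel and close_tunnel and the socket path are hypothetical; the ssh flags themselves (-M, -S, -O exit, ExitOnForwardFailure) are standard OpenSSH options.

```shell
# Hypothetical helpers; the function names and socket location are illustrative.
TUNNEL_SOCKET="${TMPDIR:-/tmp}/ssh-tunnel.$$.sock"

# open_tunnel LOCAL_PORT TARGET_HOST TARGET_PORT SSH_DESTINATION
#   -f      go to the background once the forward is set up
#   -N      run no remote command
#   -M -S   create a control socket so the tunnel can be closed later
#   ExitOnForwardFailure=yes   fail immediately if the local port cannot be bound
#   The listener is bound to 127.0.0.1 so other machines cannot use the tunnel.
open_tunnel() {
    ssh -f -N -M -S "$TUNNEL_SOCKET" -o ExitOnForwardFailure=yes \
        -L "127.0.0.1:$1:$2:$3" "$4"
}

# close_tunnel SSH_DESTINATION
#   Asks the background ssh process behind the control socket to exit.
close_tunnel() {
    ssh -S "$TUNNEL_SOCKET" -O exit "$1"
}

# Example (not run here because it needs a reachable server):
#   open_tunnel 3307 db.internal 3306 user@bastion
#   mysql -h 127.0.0.1 -P 3307 -u app -p
#   close_tunnel user@bastion
```

The server has to permit forwarding for any of this to work: with AllowTcpForwarding no, as in the hardening profile earlier in this lesson, forwarded connections are refused.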

The Bastion Pattern

A bastion host (also called a jump server) is the single authorised entry point for SSH access to a private network. All other servers have no public IP and accept SSH connections only from the bastion's internal IP. This dramatically reduces attack surface: only one machine is internet-facing, and it is locked down, monitored, and audited.

Figure: Bastion pattern. Engineers reach private servers only via the bastion; security groups block all direct internet SSH.

Modern cloud environments implement this pattern using either a traditional bastion EC2 instance or managed alternatives:

AWS Systems Manager Session Manager: no bastion at all; an agent runs on each instance, access is controlled by IAM policy, and session activity is recorded in CloudTrail (full session logs can be sent to S3 or CloudWatch Logs).
GCP Identity-Aware Proxy (IAP) tunneling: similar model; no public SSH port required.
Tailscale / Cloudflare Access: overlay network or zero-trust proxy; no public SSH port or traditional VPN gateway required.

For environments that do run a traditional bastion, harden it further: run it on an immutable AMI (no persistent state), rotate the host key on every launch, ship every session to a centralised audit log (e.g., tlog), and restrict ingress in the security group to corporate egress IPs only.

Certificate-Based SSH: Beyond Authorized Keys

At scale, managing authorized_keys across hundreds of servers becomes a maintenance nightmare: a departing employee's key must be removed from every machine individually. Large organisations use SSH certificates instead: short-lived signed credentials issued by an internal Certificate Authority (CA). The CA public key is placed on every server once; any certificate signed by that CA is trusted for its validity period (often 8-24 hours). When an employee leaves, you simply stop issuing certificates, and any certificate they still hold expires within hours. Sessions that are already open are not terminated by expiry, and a certificate can be revoked early through a RevokedKeys list.

Big-tech practice: Companies like Google and Netflix pioneered certificate-based SSH at scale (Netflix open-sourced BLESS, a Lambda-based SSH CA). HashiCorp Vault has a built-in SSH secrets engine that issues signed certificates on demand after authenticating via LDAP or OIDC — the gold standard for a self-hosted solution. Even if you do not implement certificates today, understand the model because you will encounter it in any sufficiently mature infrastructure.
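The mechanics are easier to see with plain ssh-keygen, which can act as a small CA. The sketch below signs a user key for 8 hours in a temporary directory. It is a demonstration under assumptions, not a production setup: the file names demo_ca and id_demo, the identity alice, and the principal deploy are all made up, and the CA key is left unencrypted only so the example runs unattended.

```shell
# Demo only: an unencrypted CA key in a temp directory. A real CA key belongs in
# an HSM or a signing service, never on disk without protection.
cert_demo_dir=$(mktemp -d)

sign_demo_cert() {
    # 1. Create the CA key pair (hypothetical name demo_ca).
    ssh-keygen -q -t ed25519 -N '' -C 'demo user CA' -f "$cert_demo_dir/demo_ca"
    # 2. Create the user key pair that will be certified.
    ssh-keygen -q -t ed25519 -N '' -C 'alice' -f "$cert_demo_dir/id_demo"
    # 3. Sign the user's public key: key ID "alice", principal "deploy", valid for 8 hours.
    #    This writes id_demo-cert.pub next to the public key.
    ssh-keygen -q -s "$cert_demo_dir/demo_ca" -I alice -n deploy -V +8h \
        "$cert_demo_dir/id_demo.pub"
    # 4. Show what was issued (key ID, principals, validity window).
    ssh-keygen -L -f "$cert_demo_dir/id_demo-cert.pub"
}

if command -v ssh-keygen >/dev/null 2>&1; then
    sign_demo_cert
fi

# Server side (sshd_config), trusting every certificate this CA signs:
#   TrustedUserCAKeys /etc/ssh/demo_ca.pub
```

With only TrustedUserCAKeys configured, a certificate is accepted when one of its principals matches the login name, so this one would allow logging in as deploy until it expires.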

Common Failure Modes

Permission errors on authorized_keys: sshd refuses to use the file if it, ~/.ssh, or the home directory is writable by group or others. Fix: chmod go-w ~; chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys.
SELinux/AppArmor blocking sshd: on RHEL/CentOS, if you move SSH to a non-standard port you must run semanage port -a -t ssh_port_t -p tcp 2222 or SELinux will block the new port even after updating sshd_config.
Known host mismatch: after rebuilding a server with the same IP, the host key changes. SSH refuses to connect with a scary warning. Fix: once you have confirmed the rebuild is legitimate, run ssh-keygen -R <hostname-or-ip>. In automation, use StrictHostKeyChecking accept-new for the first-time connection from configuration management, then lock it to yes.
Agent forwarding not working: if SSH_AUTH_SOCK is unset on the remote host or the socket file has disappeared, forwarding silently fails. Check with ssh-add -l on the remote host, and confirm that the server permits it (AllowAgentForwarding).
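The known-host fix can be rehearsed safely against a scratch file before touching the real ~/.ssh/known_hosts. The sketch below fabricates a host key, records it for 10.0.1.55, and then removes it with ssh-keygen -R. The file names and the rotate_demo helper are assumptions for illustration.

```shell
kh_dir=$(mktemp -d)
kh_file="$kh_dir/known_hosts"

rotate_demo() {
    # Fabricate a host key and record it for 10.0.1.55, as a first connection would.
    ssh-keygen -q -t ed25519 -N '' -f "$kh_dir/old_hostkey"
    printf '10.0.1.55 %s\n' "$(cut -d' ' -f1,2 "$kh_dir/old_hostkey.pub")" > "$kh_file"

    # -F looks a host up; -R removes every key recorded for it and keeps
    # the previous contents in known_hosts.old.
    ssh-keygen -F 10.0.1.55 -f "$kh_file" >/dev/null && echo "before: entry present"
    ssh-keygen -R 10.0.1.55 -f "$kh_file" >/dev/null 2>&1
    ssh-keygen -F 10.0.1.55 -f "$kh_file" >/dev/null || echo "after: entry removed"
}

if command -v ssh-keygen >/dev/null 2>&1; then
    rotate_demo
fi
```

Without -f, both commands operate on ~/.ssh/known_hosts, and they work on hashed entries too, which matters once HashKnownHosts is enabled.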