Linux System Administration

Logs & journald

When a production system behaves unexpectedly — a service crashes, a request times out, a deployment silently fails — the first question is always: what does the log say? Logs are the primary diagnostic surface for every Linux system, and mastering them is non-negotiable for DevOps work. This lesson covers the two complementary logging systems you will encounter on every modern Linux host: systemd's journald and the traditional syslog stack, plus the canonical log file locations you need to know by heart.

How Linux Logging Works: Two Layers

On a modern system running systemd, logs pass through two cooperating layers:

journald (/usr/lib/systemd/systemd-journald) — the systemd journal daemon. It captures everything that passes through the kernel ring buffer (dmesg), structured log entries from systemd units, and any output written to stdout/stderr by managed services. It stores logs in a binary, indexed format under /run/log/journal/ (volatile) or /var/log/journal/ (persistent).
rsyslog / syslog-ng — a traditional text-based syslog daemon that also runs in parallel on most distributions. journald can forward messages to it via a socket, producing the familiar plain-text files under /var/log/.

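Key Insight: journald's upstream default is Storage=auto, which writes to /var/log/journal/ only if that directory already exists and otherwise keeps logs in RAM under /run/log/journal/, where they are lost on reboot. Whether the directory is created at install time varies by distribution and release, so verify it on each host instead of assuming. To force persistence, create the directory manually or set Storage=persistent in /etc/systemd/journald.conf.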

Enabling Persistent Journal Storage

The first thing to verify on any new server is whether the journal persists across reboots. Check by looking for /var/log/journal/:

# Check if persistent storage is active
ls /var/log/journal/

# If the directory is missing, create it and restart journald
mkdir -p /var/log/journal
systemd-tmpfiles --create --prefix /var/log/journal
systemctl restart systemd-journald

# Alternatively, set it explicitly in config
# Edit /etc/systemd/journald.conf and set:
# Storage=persistent
# Then: systemctl restart systemd-journald
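
The directory check above can be wrapped in a small helper for provisioning scripts. This is a minimal sketch, assuming the upstream default Storage=auto is in effect; it does not read journald.conf, so an explicit Storage=volatile or Storage=none would not be detected. The function name journal_mode and its optional root argument are illustrative, not part of systemd.

```shell
# Report whether journald would persist logs under the given root (default "/").
# With Storage=auto, persistence depends only on whether /var/log/journal exists.
journal_mode() {
    root=${1:-/}
    if [ -d "${root%/}/var/log/journal" ]; then
        echo persistent
    else
        echo volatile
    fi
}

# Usage on a live host:  journal_mode
# Usage on a mounted image or chroot:  journal_mode /mnt/image
```

The root argument exists mainly so the check can be pointed at a mounted image or a test directory instead of the running system.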

journalctl: The Right Tool for Every Log Query

journalctl is the command-line interface to journald. You will reach for it in almost every incident, so its filters are worth learning well.

# --- Basic queries ---
journalctl                        # All logs, oldest first (paged with less)
journalctl -e                     # Jump to end (most recent)
journalctl -f                     # Follow new entries in real time (like tail -f)
journalctl -n 100                 # Last 100 lines

# --- Filter by unit (service) ---
journalctl -u nginx               # All logs for the nginx unit
journalctl -u nginx -f            # Follow nginx logs
journalctl -u nginx -u postgresql # Multiple units at once

# --- Filter by time ---
journalctl --since "2025-06-10 08:00:00"
journalctl --since "1 hour ago"
journalctl --since yesterday --until "06:00"
journalctl -u sshd --since "2025-06-10" --until "2025-06-11"

# --- Filter by priority (syslog levels 0-7) ---
journalctl -p err                 # Errors and above (err, crit, alert, emerg)
journalctl -p warning -u kubelet  # Warnings+ for kubelet

# --- Filter by field (structured logging) ---
journalctl _PID=1234              # Logs from a specific PID
journalctl _UID=0                 # Logs from root processes
journalctl _SYSTEMD_UNIT=sshd.service PRIORITY=4   # AND conditions

# --- Output formats ---
journalctl -u nginx -o json-pretty   # Structured JSON output
journalctl -u nginx -o cat           # Raw message text only (great for grep)
journalctl --no-pager | grep FAILED  # Pipe-friendly (disables pager)

# --- Disk usage ---
journalctl --disk-usage
journalctl --vacuum-size=500M     # Trim journal to 500 MB
journalctl --vacuum-time=30d      # Remove entries older than 30 days
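
The priority filter is the one that most often surprises people: -p takes a single level and includes everything more severe than it. The sketch below only illustrates that rule using the eight standard syslog level names; the helper name priorities_at_or_above is hypothetical and nothing here calls journalctl.

```shell
# Print the syslog priority names that `journalctl -p NAME` would include:
# the named level plus every more severe one (lower number = more severe).
priorities_at_or_above() {
    found=0
    out=""
    # Levels in order 0..7
    for name in emerg alert crit err warning notice info debug; do
        out="${out:+$out }$name"
        if [ "$name" = "$1" ]; then
            found=1
            break
        fi
    done
    if [ "$found" -eq 1 ]; then
        echo "$out"
    else
        echo "unknown priority: $1" >&2
        return 1
    fi
}

# Usage:  priorities_at_or_above warning
```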

Pro Practice — JSON output for automation: When you need to parse logs in a script or pipe them to a log aggregator, use -o json. journald stores rich metadata (PID, UID, command name, boot ID, monotonic timestamp) in every entry — fields that plain syslog text loses. Tools like Filebeat and Fluentd have native journald input plugins that exploit this structured data.
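
As a small illustration of scripting against the JSON output, the sketch below counts entries per SYSLOG_IDENTIFIER. It assumes jq is installed, which is not a given on every host, and the function name count_by_identifier and the sample entries are made up for the example. With -o json, journalctl emits one JSON object per line and field values are strings.

```shell
# Count entries per SYSLOG_IDENTIFIER from `journalctl -o json` output on stdin.
# Requires jq (an assumption; it is not installed everywhere).
count_by_identifier() {
    jq -r '.SYSLOG_IDENTIFIER // "unknown"' | sort | uniq -c | awk '{print $2, $1}'
}

# Hand-written sample in the one-object-per-line shape that -o json emits.
sample_entries='{"SYSLOG_IDENTIFIER":"nginx","PRIORITY":"3","MESSAGE":"upstream timed out"}
{"SYSLOG_IDENTIFIER":"sshd","PRIORITY":"6","MESSAGE":"Accepted publickey for deploy"}
{"SYSLOG_IDENTIFIER":"nginx","PRIORITY":"4","MESSAGE":"worker process exited"}'

# Usage on the sample:    printf '%s\n' "$sample_entries" | count_by_identifier
# Usage on a live host:   journalctl -b -o json --no-pager | count_by_identifier
```

Parsing JSON with a real parser rather than grep or sed matters here because MESSAGE values can contain quotes and escapes.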

Visualising the Log Flow

Log flow on a modern Linux host: all sources funnel through journald, which forwards to rsyslog (text files) and serves structured data to journalctl and log shippers.

Traditional Syslog: Key Log File Locations

Even in a world dominated by journald, plain-text log files remain essential — they are readable without tooling, trivially greppable, and consumed by countless legacy agents. Know these paths on every distribution:

File / Path	What it contains
`/var/log/syslog` (Debian/Ubuntu)	General system messages; the catch-all log
`/var/log/messages` (RHEL/CentOS)	Same as syslog on Debian family
`/var/log/auth.log` (Debian/Ubuntu)	Authentication events: SSH logins, sudo, PAM
`/var/log/secure` (RHEL/CentOS)	Same as auth.log on Debian family
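`/var/log/kern.log` (Debian/Ubuntu)	Kernel messages (OOM killer, hardware errors, network driver)
`/var/log/dmesg`	Boot-time kernel ring buffer snapshot (not written on every distribution)
`/var/log/dpkg.log` (Debian/Ubuntu), `/var/log/yum.log` or `/var/log/dnf.log` (RHEL family)	Package install/remove history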
`/var/log/nginx/`, `/var/log/apache2/`	Web server access and error logs (app-specific)
`/var/log/audit/audit.log`	Linux Audit Framework events (SELinux, syscall auditing)
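
Because the authentication log lives at a different path on each distribution family, scripts that must run on both usually probe for it. A minimal sketch, assuming only the two paths from the table; the function name auth_log_path and its optional root argument are illustrative.

```shell
# Print the first authentication log that exists under the given root (default "/").
auth_log_path() {
    root=${1:-/}
    for candidate in var/log/auth.log var/log/secure; do
        if [ -f "${root%/}/$candidate" ]; then
            echo "${root%/}/$candidate"
            return 0
        fi
    done
    # Neither file exists: the host may log to the journal only.
    echo "no auth log found under $root" >&2
    return 1
}

# Usage:  tail -n 50 "$(auth_log_path)"
```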

Reading Logs Effectively: Patterns and Techniques

Raw log volume on a busy server can reach millions of lines per day. Effective log reading is about narrowing the search space fast:

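# Follow the syslog, showing only lines that match common failure keywords
tail -f /var/log/syslog | grep --color -i "error\|fail\|crit"

# Find failed SSH password attempts in the current boot
# (sshd logs these at info priority, so -p warning would hide them;
#  the unit is named ssh on Debian/Ubuntu)
journalctl -u sshd -b --no-pager | grep -i "failed password"

# Count failed sudo authentications today (assumes "Jun  5"-style syslog timestamps)
grep "^$(date '+%b %e')" /var/log/auth.log | grep -c "sudo.*authentication failure"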

# Check for OOM kills (out-of-memory killer invocations)
journalctl -k | grep -i "oom\|killed process"
# Or: dmesg -T | grep -i "oom\|out of memory"

# Inspect what happened around a specific timestamp (±5 minutes)
journalctl --since "2025-06-10 14:25:00" --until "2025-06-10 14:35:00" -p info

# Show logs from the PREVIOUS boot (when debugging a crash/reboot)
journalctl -b -1                   # Previous boot
journalctl --list-boots            # All available boots with timestamps

# Grep across a compressed rotated log
zgrep "FAILED" /var/log/syslog.2.gz
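
Narrowing often ends in a quick aggregation. The sketch below groups failed SSH password attempts by source address, reading log lines on stdin so it works with either the auth log or journalctl output. It assumes the usual OpenSSH message shape ("Failed password for ... from ADDRESS port ..."); the function name failed_ssh_by_ip is hypothetical.

```shell
# Read syslog-style lines on stdin and print "ADDRESS COUNT" for failed SSH
# password attempts, most frequent first.
failed_ssh_by_ip() {
    awk '/Failed password for/ {
             ip = ""
             # The source address is the word after the last "from".
             for (i = 1; i < NF; i++) if ($i == "from") ip = $(i + 1)
             if (ip != "") count[ip]++
         }
         END { for (ip in count) print ip, count[ip] }' | sort -k2,2nr -k1,1
}

# Usage with a text log:   failed_ssh_by_ip < /var/log/auth.log
# Usage with the journal:  journalctl -u sshd -b --no-pager | failed_ssh_by_ip
```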

Log Rotation with logrotate

Plain-text log files grow without bound unless they are rotated. logrotate is the standard tool for this. It is not a daemon but a command run on a schedule (typically daily, from cron or a systemd timer) that renames, compresses, and purges old logs. Global settings live in /etc/logrotate.conf and per-service rules in /etc/logrotate.d/. A typical configuration for an application log looks like:

# /etc/logrotate.d/myapp
/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        systemctl reload myapp > /dev/null 2>&1 || true
    endscript
}

Production Pitfall (journal disk exhaustion): On high-traffic systems, a chatty service can grow the journal quickly. journald's defaults offer some protection: per-service rate limiting (RateLimitIntervalSec=30s, RateLimitBurst=10000) and a size cap of 10% of the filesystem that holds the journal, up to 4 GiB. Those limits scale with the disk rather than with what your application needs, so set explicit bounds in /etc/systemd/journald.conf, for example SystemMaxUse=2G and SystemKeepFree=500M, so the journal never starves your application of disk space. Include journalctl --disk-usage in your monitoring runbooks.
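
One way to apply such bounds without editing the main file is a drop-in under /etc/systemd/journald.conf.d/. The helper below is a sketch under that assumption; the function name write_journal_limits and the file name 90-size-limits.conf are illustrative, and the values are the ones suggested above.

```shell
# Write a journald drop-in that caps journal disk usage.
# Arguments: target directory, SystemMaxUse value, SystemKeepFree value.
write_journal_limits() {
    dir=$1
    max_use=$2
    keep_free=$3
    mkdir -p "$dir" || return 1
    cat > "$dir/90-size-limits.conf" <<EOF
[Journal]
SystemMaxUse=$max_use
SystemKeepFree=$keep_free
EOF
}

# On a real host (as root):
#   write_journal_limits /etc/systemd/journald.conf.d 2G 500M
#   systemctl restart systemd-journald
```

Taking the directory as an argument lets you generate the file into a staging area or a configuration-management checkout before it touches a live system.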

Forwarding to Centralised Log Aggregation

Individual server logs are useful for quick triage, but at scale — dozens of hosts, microservices, Kubernetes pods — you must ship logs to a centralised platform. The standard production stack is:

Filebeat or Fluentd/Fluent Bit: lightweight agents that tail files or read the journal directly (journald input plugin) and forward to a backend.
Elasticsearch + Kibana (ELK/EFK): the classic full-text search and visualisation stack. Expensive to operate at scale.
Grafana Loki: the cloud-native, label-based log store. It is usually cheaper to run than Elasticsearch because it indexes only metadata labels, not the full text. It integrates well with Prometheus and Grafana and is a popular choice for greenfield deployments.

For the purposes of this tutorial series, the key skill is knowing what to look for and where, on the host itself. Aggregation pipelines are covered in the Observability track later in the course.

Pro Practice — always check logs across all three scopes when diagnosing a failure: (1) journalctl -u <service> -n 50 for the service's own output; (2) journalctl -k -b for kernel-level issues (OOM, hardware); (3) /var/log/auth.log or /var/log/secure if the failure involves permissions, PAM, or SSH. Most production bugs leave a trace in at least one of these three.
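
The three scopes can be bundled into one helper for a first look at a failing service. This is a sketch, not a standard tool: the function name triage is hypothetical, it must run with enough privilege to read the journal and auth log, and it prints only the last 50 entries from each scope.

```shell
# Print recent entries from the three scopes for one service:
# the unit's own output, kernel messages from the current boot, and the auth log.
triage() {
    if [ -z "$1" ]; then
        echo "usage: triage SERVICE" >&2
        return 2
    fi
    echo "== service: $1 =="
    journalctl -u "$1" -n 50 --no-pager
    echo "== kernel (current boot) =="
    journalctl -k -b -n 50 --no-pager
    echo "== auth =="
    # Debian/Ubuntu use auth.log, the RHEL family uses secure.
    for f in /var/log/auth.log /var/log/secure; do
        if [ -r "$f" ]; then
            tail -n 50 "$f"
        fi
    done
}

# Usage:  triage nginx | less
```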