Shell Scripting & Automation

Pipes, Redirection & Streams

18 min Lesson 6 of 28

Every Unix process normally starts with three open file descriptors: standard input (stdin, fd 0), standard output (stdout, fd 1), and standard error (stderr, fd 2). Knowing how to wire these streams together, and how to redirect them to files, devices, or other processes, is one of the most powerful skills in shell scripting. At scale, pipelines process terabytes of log data nightly, and a cron job that redirects stdout to a log but omits 2>&1 can leave its error messages out of that log for years without anyone noticing. This lesson makes you fluent in streams.

The Three Standard Streams

When a process writes a result, it goes to stdout. When it writes a warning or diagnostic, it goes to stderr. When it needs to read data, it reads from stdin. The shell lets you attach any of these to a file, a device, another command, or /dev/null.

[Figure: a process (grep, awk, curl, ...) reads from stdin (fd 0: keyboard, file, or pipe), writes results to stdout (fd 1: terminal, file, or pipe), and writes errors to stderr (fd 2: terminal or log file).]
Every process inherits three open file descriptors at startup: stdin, stdout, and stderr.
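To see that stdout and stderr really are separate channels, here is a minimal sketch. The helper name emit and the temporary files are illustrative assumptions, not part of the lesson's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes one line to each output stream.
emit() {
  echo "result line"            # goes to stdout (fd 1)
  echo "diagnostic line" >&2    # goes to stderr (fd 2)
}

tmpdir=$(mktemp -d)

# Send each stream to its own file.
emit > "$tmpdir/out.txt" 2> "$tmpdir/err.txt"

out_content=$(cat "$tmpdir/out.txt")
err_content=$(cat "$tmpdir/err.txt")

# Command substitution captures stdout only; stderr is discarded here.
captured=$(emit 2> /dev/null)

echo "stdout file: $out_content"
echo "stderr file: $err_content"
echo "captured:    $captured"

rm -rf "$tmpdir"
```

Each file ends up with exactly one line, which is why a script can keep data on stdout and diagnostics on stderr without the two getting mixed.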

Output Redirection

The > operator redirects stdout to a file, truncating it first. The >> operator appends. These are the building blocks of every log-writing script.

    # Overwrite (truncate) the file each run
    echo "Deployment started at $(date)" > /var/log/deploy.log

    # Append: safe for log accumulation across runs
    echo "Step 1 complete" >> /var/log/deploy.log

    # Redirect only stderr (fd 2) to a separate file; stdout still goes to the terminal
    make build 2> /var/log/build-errors.log

    # Redirect both stdout and stderr to the same file (most common in cron jobs)
    ./backup.sh > /var/log/backup.log 2>&1

    # Bash shorthand with the same meaning (not POSIX sh; the bash manual prefers &> over >&)
    ./backup.sh &> /var/log/backup.log
Order matters with 2>&1. Write it after the stdout redirect: cmd > file 2>&1. If you write cmd 2>&1 > file, stderr is duplicated to the original stdout (the terminal) before stdout is redirected to the file — so errors still appear on screen. This is a classic, career-embarrassing mistake in cron scripts.
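The ordering rule can be checked without a terminal. In this sketch, the helper both and the file names are hypothetical; an outer redirect stands in for the terminal so the leaked stderr line can be inspected.

```shell
#!/usr/bin/env bash
# Hypothetical helper that writes one line to each stream.
both() {
  echo "to-stdout"
  echo "to-stderr" >&2
}

tmpdir=$(mktemp -d)

# Correct order: stdout moves to the file, then stderr copies stdout's new target.
both > "$tmpdir/right.log" 2>&1

# Wrong order: stderr copies stdout's *current* target first, then only stdout
# moves to the file. The outer redirect plays the role of the terminal.
{ both 2>&1 > "$tmpdir/wrong.log"; } > "$tmpdir/terminal.txt"

right_count=$(grep -c '' "$tmpdir/right.log")   # number of lines in right.log
wrong_content=$(cat "$tmpdir/wrong.log")
leaked=$(cat "$tmpdir/terminal.txt")

echo "right.log lines: $right_count"
echo "wrong.log:       $wrong_content"
echo "leaked:          $leaked"

rm -rf "$tmpdir"
```

With the correct order the log holds both lines; with the wrong order the log holds only the stdout line and the stderr line escapes to wherever stdout pointed before the redirect.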

Discarding output entirely uses the null device:

    # Suppress stdout only (silently discard progress messages)
    ./noisy-tool.sh > /dev/null

    # Suppress ALL output: useful when only the exit code matters
    if ./health-check.sh > /dev/null 2>&1; then echo "healthy"; else echo "FAIL"; fi

Input Redirection

The < operator feeds a file into a command's stdin. A here-document (<<EOF) embeds multi-line input directly in the script without a temporary file. A here-string (<<<) passes a single string as stdin.

    # Feed a SQL file directly to the mysql client
    mysql -u root -p mydb < schema.sql

    # Here-document: send multi-line text to stdin
    # In a real script the closing delimiter (EOF) must sit alone on its line, with no leading spaces
    sendmail ops@example.com <<EOF
    Subject: Deploy complete

    Build #42 deployed to production at $(date).
    EOF

    # Here-string: pass a string (usually a variable) as stdin, avoiding the echo | cmd anti-pattern
    # log_excerpt is a hypothetical variable; for a file, plain `grep "ERROR" /var/log/app.log` is cleaner
    grep "ERROR" <<< "$log_excerpt"
    base64 --decode <<< "SGVsbG8gV29ybGQ="
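One detail worth seeing in action is how quoting the here-document delimiter changes expansion, and how a here-string feeds read in the current shell. This is a small sketch; the variable names are illustrative assumptions.

```shell
#!/usr/bin/env bash
name="world"   # illustrative variable

# Unquoted delimiter: $name and $(...) are expanded inside the body.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted delimiter: the body is passed through literally.
literal=$(cat <<'EOF'
hello $name
EOF
)

# Here-string: read runs in the current shell, so the variables it sets survive.
read -r first rest <<< "alpha beta gamma"

echo "expanded: $expanded"
echo "literal:  $literal"
echo "first=$first rest=$rest"
```

Quoting the delimiter is the usual choice when the embedded text is itself code (SQL, another script) that must not be touched by the shell.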

Pipes: Connecting Commands

A pipe (|) connects the stdout of one command directly to the stdin of the next — in memory, without a temporary file. The kernel creates an anonymous pipe buffer; both processes run concurrently. This is not sequential execution: producer | consumer means the consumer starts immediately and processes data as it arrives.

    # Classic pipeline: find the ten most frequent IPs in an nginx access log
    awk '{print $1}' /var/log/nginx/access.log \
      | sort \
      | uniq -c \
      | sort -rn \
      | head -10

    # Count ERROR lines in journald output for the last hour
    journalctl --since "1 hour ago" --no-pager \
      | grep -c "ERROR"

    # Real-time monitoring: tail a log and filter for critical events
    tail -F /var/log/app/production.log \
      | grep --line-buffered -E "CRITICAL|FATAL" \
      | while IFS= read -r line; do
          echo "$line"    # Could also send a Slack alert here
        done
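The claim that both sides of a pipe run concurrently can be demonstrated with a producer that never finishes on its own. A minimal sketch, using only the standard yes and head utilities:

```shell
#!/usr/bin/env bash
# `yes` prints "y" forever. If the shell ran the stages one after another,
# this pipeline would never finish. Because both stages run at once, head
# exits after three lines, the pipe closes, and yes stops (typically via SIGPIPE).
first_three=$(yes | head -n 3)

# Count how many lines actually came through.
line_count=$(printf '%s\n' "$first_three" | grep -c y)

echo "got $line_count lines"
```

The same mechanism is why `head -10` at the end of a long pipeline is cheap: upstream stages stop soon after the consumer goes away.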
Pipeline exit codes: by default, a pipeline's exit code is the exit code of the last command. If grep at the end returns 0 but awk in the middle failed, your script will not notice. Enable set -o pipefail (covered in Lesson 8) so a pipeline fails if any stage fails — this is mandatory in production scripts.
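The exit-code behavior described above can be observed directly. This sketch uses false and true as stand-ins for a failing middle stage and a succeeding last stage; PIPESTATUS is a bash-specific array.

```shell
#!/usr/bin/env bash
# Default behaviour: only the last stage's status is reported.
set +o pipefail
default_status=0
false | true || default_status=$?

# PIPESTATUS records the status of every stage of the most recent pipeline.
# It must be read immediately, before any other command overwrites it.
false | true
stages="${PIPESTATUS[*]}"

# With pipefail, the pipeline reports the last non-zero status among its stages.
set -o pipefail
pipefail_status=0
false | true || pipefail_status=$?
set +o pipefail

echo "default:     $default_status"
echo "PIPESTATUS:  $stages"
echo "pipefail:    $pipefail_status"
```

Without pipefail the failure of the first stage is invisible to the script; with it, the same pipeline reports status 1.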

tee: Splitting a Stream

tee reads from stdin and writes to both stdout and one or more files simultaneously. It is named after the T-junction in plumbing. Use it when you need to log output while still passing it downstream in a pipeline.

    # Log the output of a build while also showing it in the terminal
    make build 2>&1 | tee /var/log/build.log

    # Append mode: tee -a keeps previous log entries
    ./run-tests.sh 2>&1 | tee -a /var/log/test-runs.log

    # Real production pattern: run a script, log everything, and mail only the alerts
    ./deploy.sh 2>&1 \
      | tee "/var/log/deploy-$(date +%Y%m%d-%H%M%S).log" \
      | grep -E "ERROR|WARN" \
      | mail -s "Deploy alerts" ops@example.com
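Here is a self-contained sketch of the same idea that can be run anywhere. The sample lines and the log path are made up for illustration; printf stands in for a real build command.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
log="$tmpdir/build.log"   # hypothetical log path

# tee writes a full copy to the log and passes everything downstream,
# where grep keeps only the warning lines.
warnings=$(printf 'compile ok\nWARN: deprecated flag\nlink ok\n' \
  | tee "$log" \
  | grep 'WARN')

# tee -a appends instead of truncating the existing log.
echo "second run" | tee -a "$log" > /dev/null

log_lines=$(grep -c '' "$log")   # total number of lines now in the log

echo "warnings:  $warnings"
echo "log lines: $log_lines"

rm -rf "$tmpdir"
```

The log keeps all three original lines plus the appended one, while the downstream filter only ever sees what it asked for.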

Process Substitution

Process substitution lets you treat the output of a command as if it were a file. The syntax <(cmd) creates a named pipe (or /dev/fd/N) that another command can open and read. This is essential when a command requires a filename argument and does not read from stdin.

[Figure: cmd A (sort prod-servers.txt) and cmd B (sort staging-servers.txt) each feed a named pipe (/dev/fd/63 and /dev/fd/62); diff reads both as files and writes the delta lines to stdout.]
Process substitution wires two command outputs into diff as virtual file descriptors — no temp files needed.
    # Compare two sorted lists without creating temporary files
    diff <(sort prod-servers.txt) <(sort staging-servers.txt)

    # Compare live package list against a known-good baseline
    # (/^ii/ keeps installed-package rows and skips the dpkg -l header lines)
    diff <(dpkg -l | awk '/^ii/ {print $2}' | sort) <(sort /etc/expected-packages.txt)

    # Output redirection variant: >(cmd) writes into a command as if it were a file
    # Log stdout and stderr to separate files while still showing both streams
    ./run-migration.sh > >(tee /var/log/migration.log) 2> >(tee /var/log/migration-errors.log >&2)

    # Place two command outputs side by side with paste
    # (neither side is re-sorted, so each username stays on the same row as its UID)
    paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f3 /etc/passwd)
Process substitution vs. pipes: use a pipe when the consumer reads from stdin. Use process substitution when the consumer expects a filename argument — for example, diff, comm, join, or any tool that calls open(2) on its arguments. Mixing both techniques covers virtually every real-world data-wiring need.
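Two short cases illustrate the distinction. This is a sketch with made-up input; it assumes bash with its default settings (in particular, the lastpipe option is off).

```shell
#!/usr/bin/env bash
# comm expects two sorted *files*; process substitution supplies them.
# -13 suppresses columns 1 and 3, leaving lines found only in the second input.
only_in_second=$(comm -13 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n'))

# Piping into a while loop runs the loop in a subshell, so the counter is lost.
piped_count=0
printf 'x\ny\n' | while read -r _; do piped_count=$((piped_count + 1)); done

# Redirecting from a process substitution keeps the loop in the current shell.
kept_count=0
while read -r _; do kept_count=$((kept_count + 1)); done < <(printf 'x\ny\n')

echo "only in second: $only_in_second"
echo "piped_count:    $piped_count"
echo "kept_count:     $kept_count"
```

The second case is a common reason to reach for `< <(cmd)` even when the consumer could read stdin: variables set inside the loop remain visible afterwards.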

Putting It Together: A Real Log Analysis Pipeline

Here is a production-realistic script that combines most of the stream concepts in this lesson. It is the kind of script you would find in an SRE runbook:

    #!/usr/bin/env bash
    # analyze-errors.sh: daily error report from nginx logs
    # Usage: ./analyze-errors.sh [logfile] [previous-logfile]

    LOG="${1:-/var/log/nginx/access.log}"
    # Assumption: yesterday's entries live in the rotated log next to LOG
    # (the ".1" suffix that logrotate uses by default). The comparison below
    # needs raw log lines, not yesterday's report file.
    PREV_LOG="${2:-${LOG}.1}"
    REPORT_DIR="/var/reports/errors"
    DATE=$(date +%Y-%m-%d)
    REPORT="${REPORT_DIR}/${DATE}.txt"

    mkdir -p "$REPORT_DIR"

    {
      echo "=== Error Report: ${DATE} ==="
      echo ""

      echo "--- Top 10 IPs hitting 4xx/5xx ---"
      # awk keeps lines whose HTTP status (field 9) starts with 4 or 5 and prints the client IP
      awk '$9 ~ /^[45]/ {print $1}' "$LOG" \
        | sort \
        | uniq -c \
        | sort -rn \
        | head -10
      echo ""

      echo "--- Status code distribution ---"
      awk '{print $9}' "$LOG" \
        | grep -E '^[0-9]{3}$' \
        | sort \
        | uniq -c \
        | sort -rn
      echo ""

      echo "--- New error paths not seen yesterday ---"
      # Lines starting with ">" exist only in today's list of failing paths
      diff \
        <(awk '$9 ~ /^[45]/ {print $7}' "$PREV_LOG" 2>/dev/null | sort -u) \
        <(awk '$9 ~ /^[45]/ {print $7}' "$LOG" | sort -u) \
        | grep '^>' | awk '{print $2}'
    } 2>&1 | tee "$REPORT"

    echo "Report written to: $REPORT" >&2

Notice how { ... } 2>&1 | tee "$REPORT" wraps an entire block — all stdout and stderr from the block flow into tee, which writes to the report file while also printing to the terminal. The final echo sends to stderr (>&2) so it is not captured in the report itself.
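The block-level redirect can be tried in isolation. In this sketch the report path and messages are hypothetical, and an echo to stderr stands in for a command that fails inside the block.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
report="$tmpdir/report.txt"   # hypothetical report path

# One redirect on the closing brace applies to every command in the group.
{
  echo "section header"
  echo "simulated error" >&2   # stderr from inside the block
  echo "section footer"
} 2>&1 | tee "$report" > /dev/null

# Status message goes to stderr after the pipeline, so it never reaches the report.
echo "Report written to: $report" >&2

report_lines=$(grep -c '' "$report")
if grep -q 'Report written' "$report"; then
  status_in_report=yes
else
  status_in_report=no
fi

echo "report lines: $report_lines, status line in report: $status_in_report"

rm -rf "$tmpdir"
```

All three lines from the block land in the report, including the one written to stderr, while the final status message stays out of it.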

Industry practice: in large engineering organizations, cron jobs and CI scripts commonly redirect stdout and stderr to timestamped log files, then ship those logs to a centralized system (Splunk, Loki, Cloud Logging). The pipeline discipline you build now (keeping stdout for data, stderr for diagnostics, and capturing both correctly) maps directly onto how production observability pipelines are wired.