Shell Scripting & Automation

Pipes, Redirection & Streams

18 min Lesson 6 of 28

Every Unix process normally starts with three open file descriptors: standard input (stdin, fd 0), standard output (stdout, fd 1), and standard error (stderr, fd 2). Knowing how to wire these streams together, and how to redirect them to files, devices, or other processes, is one of the most powerful skills in shell scripting. Production pipelines push very large volumes of log data through exactly these mechanisms, and a single missing 2>&1 in a cron job can silently discard critical error messages for as long as nobody checks the log. This lesson makes you fluent in streams.

The Three Standard Streams

When a process writes a result, it goes to stdout. When it writes a warning or diagnostic, it goes to stderr. When it needs to read data, it reads from stdin. The shell lets you attach any of these to a file, a device, another command, or /dev/null.

Every process inherits three open file descriptors at startup: stdin, stdout, and stderr.
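To make the stdout/stderr split concrete, here is a minimal bash sketch. The report function and its messages are hypothetical, invented only for illustration; the point is that the two streams can be captured independently.

```bash
#!/usr/bin/env bash
# report is a hypothetical function: results go to stdout, diagnostics to stderr.
report() {
  echo "result: 42"               # fd 1 (stdout)
  echo "warning: slow disk" >&2   # fd 2 (stderr)
}

# Command substitution captures stdout only; stderr is discarded here.
out=$(report 2>/dev/null)

# Swap the roles: point stderr at the capture, then discard the original stdout.
err=$(report 2>&1 >/dev/null)

printf 'captured stdout: %s\n' "$out"
printf 'captured stderr: %s\n' "$err"
```

Because the diagnostic never mixes into the result, a downstream consumer of stdout sees only data.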

Output Redirection

The > operator redirects stdout to a file, truncating it first. The >> operator appends. These are the building blocks of every log-writing script.

# Overwrite (truncate) the file each run
echo "Deployment started at $(date)" > /var/log/deploy.log

# Append — safe for log accumulation across runs
echo "Step 1 complete" >> /var/log/deploy.log

# Redirect only stderr (fd 2) to a separate file — stdout still goes to the terminal
make build 2> /var/log/build-errors.log

# Redirect both stdout and stderr to the same file (most common in cron jobs)
./backup.sh > /var/log/backup.log 2>&1

# Bash-only shorthand for the line above: &> sends both stdout and stderr to the file
# Not POSIX: cron runs jobs with /bin/sh by default, so prefer > file 2>&1 there
./backup.sh &> /var/log/backup.log

Order matters with 2>&1. Write it after the stdout redirect: cmd > file 2>&1. If you write cmd 2>&1 > file, stderr is duplicated to the original stdout (the terminal) before stdout is redirected to the file — so errors still appear on screen. This is a classic, career-embarrassing mistake in cron scripts.
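The ordering rule is easy to verify for yourself. The sketch below uses a hypothetical emit function and a temporary directory, and lets command substitution stand in for the terminal so the leaked stderr line can be inspected.

```bash
#!/usr/bin/env bash
# emit is a hypothetical stand-in for any command that writes to both streams.
emit() {
  echo "to-stdout"
  echo "to-stderr" >&2
}

tmp=$(mktemp -d)

# Right order: stdout goes to the file, then stderr is pointed at the same place.
emit > "$tmp/right.log" 2>&1

# Wrong order: stderr is pointed at the ORIGINAL stdout first, then only stdout moves.
# Command substitution plays the role of the terminal here.
leaked=$(emit 2>&1 > "$tmp/wrong.log")

right_log=$(cat "$tmp/right.log")
wrong_log=$(cat "$tmp/wrong.log")
rm -rf "$tmp"

printf 'right.log:\n%s\n' "$right_log"
printf 'wrong.log:\n%s\n' "$wrong_log"
printf 'leaked to the "terminal": %s\n' "$leaked"
```

With the right order both lines land in the file; with the wrong order the file holds only the stdout line and the stderr line escapes.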

Discarding output entirely uses the null device:

# Suppress stdout only (silently discard progress messages)
./noisy-tool.sh > /dev/null

# Suppress ALL output, useful when only the exit code matters
./health-check.sh > /dev/null 2>&1 && echo "healthy" || echo "FAIL"

Input Redirection

The < operator feeds a file into a command's stdin. A here-document (<<EOF) embeds multi-line input directly in the script without a temporary file. A here-string (<<<) passes a single string as stdin.

# Feed a SQL file directly to the mysql client
mysql -u root -p mydb < schema.sql

# Here-document: send multi-line text to stdin
# The delimiter (EOF) must be alone on the closing line, no leading spaces
# A blank line separates the mail headers from the message body
sendmail ops@example.com <<EOF
Subject: Deploy complete

Build #42 deployed to production at $(date).
EOF

# Here-string: feed a short string or variable to stdin without the echo | cmd pattern
# (bash, ksh, and zsh only; a trailing newline is appended to the string)
base64 --decode <<< "SGVsbG8gV29ybGQ="

# For file contents, skip the here-string and pass the filename directly
grep "ERROR" /var/log/app.log
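One detail of here-documents that often surprises people is that quoting the delimiter switches off expansion. The following is a small sketch with a hypothetical variable; it also shows the newline that a here-string appends.

```bash
#!/usr/bin/env bash
name="world"   # hypothetical variable for illustration

# Unquoted delimiter: variables and $(...) are expanded inside the body.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted delimiter: the body is passed through literally.
literal=$(cat <<'EOF'
hello $name
EOF
)

# A here-string appends one newline, so "abc" arrives as 4 bytes.
bytes=$(( $(wc -c <<< "abc") ))

printf '%s\n' "$expanded"
printf '%s\n' "$literal"
printf 'here-string byte count: %d\n' "$bytes"
```

Use the quoted form whenever the body contains literal dollar signs, such as an embedded script or a SQL statement.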

Pipes: Connecting Commands

A pipe (|) connects the stdout of one command directly to the stdin of the next — in memory, without a temporary file. The kernel creates an anonymous pipe buffer; both processes run concurrently. This is not sequential execution: producer | consumer means the consumer starts immediately and processes data as it arrives.

# Classic pipeline: find the ten most frequent IPs in an nginx access log
awk '{print $1}' /var/log/nginx/access.log \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -10

# Count ERROR lines in journald output for the last hour
journalctl --since "1 hour ago" --no-pager \
  | grep -c "ERROR"

# Real-time monitoring: tail a log and filter for critical events
# -E gives portable alternation; IFS= and -r keep each line exactly as written
tail -F /var/log/app/production.log \
  | grep -E --line-buffered "CRITICAL|FATAL" \
  | while IFS= read -r line; do
      printf '%s\n' "$line"
      # Could also send a Slack alert here
    done
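A quick way to convince yourself that pipeline stages run concurrently, as described at the start of this section, is to pipe an endless producer into a consumer that stops early. This is only a sketch; the string "tick" is arbitrary.

```bash
#!/usr/bin/env bash
# yes would print "tick" forever on its own. The pipeline still finishes
# because head exits after three lines, and the next write by yes fails
# (typically the kernel delivers SIGPIPE), which ends the producer.
first_three=$(yes "tick" | head -n 3)

printf '%s\n' "$first_three"
```

If the shell ran the producer to completion before starting the consumer, this command would never return.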

Pipeline exit codes: by default, a pipeline's exit code is the exit code of the last command. If grep at the end returns 0 but awk in the middle failed, your script will not notice. Enable set -o pipefail (covered in Lesson 8) so a pipeline fails if any stage fails — this is mandatory in production scripts.
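The difference is easy to observe with the false and true builtins standing in for a failing and a succeeding stage. This sketch also uses PIPESTATUS, a bash array that holds the status of every stage of the most recent pipeline.

```bash
#!/usr/bin/env bash
# Default behaviour: only the last stage's status counts.
false | true
default_status=$?

# PIPESTATUS must be read immediately after the pipeline it describes.
false | true
stage_statuses="${PIPESTATUS[*]}"

# With pipefail, the pipeline returns the last non-zero status of any stage.
set -o pipefail
false | true
pipefail_status=$?
set +o pipefail

echo "default=$default_status stages=$stage_statuses pipefail=$pipefail_status"
# prints: default=0 stages=1 0 pipefail=1
```

PIPESTATUS is useful when you need to know which stage failed, not just that one did.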

tee: Splitting a Stream

tee reads from stdin and writes to both stdout and one or more files simultaneously. It is named after the T-junction in plumbing. Use it when you need to log output while still passing it downstream in a pipeline.

# Log the output of a build while also showing it in the terminal
make build 2>&1 | tee /var/log/build.log

# Append mode — tee -a keeps previous log entries
./run-tests.sh 2>&1 | tee -a /var/log/test-runs.log

# Real production pattern: run a script, log everything, and parse the log
./deploy.sh 2>&1 \
  | tee /var/log/deploy-$(date +%Y%m%d-%H%M%S).log \
  | grep -E "ERROR|WARN" \
  | mail -s "Deploy alerts" ops@example.com
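Here is a self-contained sketch of the same idea that is safe to run anywhere. It uses a temporary file instead of /var/log, and the input lines are made up for the demonstration.

```bash
#!/usr/bin/env bash
tmp=$(mktemp)   # temporary stand-in for a real log file

# tee saves each line to the file and also passes it on to the next stage.
upper=$(printf 'alpha\nbeta\n' | tee "$tmp" | tr 'a-z' 'A-Z')
saved=$(cat "$tmp")

# -a appends instead of truncating, so earlier lines survive.
printf 'gamma\n' | tee -a "$tmp" > /dev/null
line_count=$(grep -c '' "$tmp")
rm -f "$tmp"

printf 'downstream saw:\n%s\n' "$upper"
printf 'file held:\n%s\n' "$saved"
printf 'lines after append: %s\n' "$line_count"
```

The file keeps the original text while the downstream stage is free to transform its own copy.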

Process Substitution

Process substitution lets you treat the output of a command as if it were a file. The syntax <(cmd) creates a named pipe (or /dev/fd/N) that another command can open and read. This is essential when a command requires a filename argument and does not read from stdin.

Process substitution wires two command outputs into diff as virtual file descriptors — no temp files needed.

# Compare two sorted lists without creating temporary files
diff <(sort prod-servers.txt) <(sort staging-servers.txt)

# Compare live package list against a known-good baseline
# dpkg-query prints bare package names; dpkg -l would add header lines and status columns
diff <(dpkg-query -W -f='${Package}\n' | sort) <(sort /etc/expected-packages.txt)

# Output redirection variant: >(cmd) lets you write into a command as if it were a file
# Keep stdout and stderr in separate logs while still showing both on the terminal
./run-migration.sh > >(tee /var/log/migration.log) 2> >(tee /var/log/migration-errors.log >&2)

# Join two command outputs side-by-side with paste (each user name next to its UID)
# Do not sort the two inputs independently, or the rows no longer line up
paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f3 /etc/passwd)

Process substitution vs. pipes: use a pipe when the consumer reads from stdin. Use process substitution when the consumer expects a filename argument — for example, diff, comm, join, or any tool that calls open(2) on its arguments. Mixing both techniques covers virtually every real-world data-wiring need.
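To see what the consumer actually receives, you can echo a process substitution: bash replaces it with a path. The sketch below also runs comm, which takes two filenames, on made-up host lists. The exact path printed depends on the system.

```bash
#!/usr/bin/env bash
# The substitution expands to a path, commonly something like /dev/fd/63.
fd_path=$(echo <(true))
echo "the command receives a path such as: $fd_path"

# comm needs two filenames with sorted contents; -12 keeps lines common to both.
# The host names are invented for the demonstration.
common=$(comm -12 <(printf 'db1\nweb1\nweb2\n') <(printf 'web1\nweb2\nweb3\n'))

printf 'hosts in both lists:\n%s\n' "$common"
```

Since comm opens its arguments as files, a plain pipe could supply at most one of the two inputs; process substitution supplies both.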

Putting It Together: A Real Log Analysis Pipeline

Here is a production-realistic script that combines most of the stream concepts in this lesson: pipes, stderr redirection, process substitution, and tee. It is the kind of script you would find in an SRE runbook:

#!/usr/bin/env bash
# analyze-errors.sh — daily error report from nginx logs
# Usage: ./analyze-errors.sh [logfile]

LOG="${1:-/var/log/nginx/access.log}"
REPORT_DIR="/var/reports/errors"
DATE=$(date +%Y-%m-%d)
REPORT="${REPORT_DIR}/${DATE}.txt"

mkdir -p "$REPORT_DIR"

{
  echo "=== Error Report: ${DATE} ==="
  echo ""

  echo "--- Top 10 IPs hitting 4xx/5xx ---"
  # awk keeps lines whose HTTP status (field 9) starts with 4 or 5 and prints the client IP
  awk '$9 ~ /^[45]/ {print $1}' "$LOG" \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -10

  echo ""
  echo "--- Status code distribution ---"
  awk '{print $9}' "$LOG" \
    | grep -E '^[0-9]{3}$' \
    | sort \
    | uniq -c \
    | sort -rn

  echo ""
  echo "--- New error paths not seen yesterday ---"
  # Assumes logrotate keeps yesterday's access log as ${LOG}.1; adjust for your rotation scheme
  diff \
    <(awk '$9 ~ /^[45]/ {print $7}' "${LOG}.1" 2>/dev/null | sort -u) \
    <(awk '$9 ~ /^[45]/ {print $7}' "$LOG" | sort -u) \
    | grep '^>' | awk '{print $2}'

} 2>&1 | tee "$REPORT"

echo "Report written to: $REPORT" >&2

Notice how { ... } 2>&1 | tee "$REPORT" wraps an entire block — all stdout and stderr from the block flow into tee, which writes to the report file while also printing to the terminal. The final echo sends to stderr (>&2) so it is not captured in the report itself.
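The block-level redirection can be tried in isolation. This is a minimal sketch with made-up messages and a temporary file in place of the real report path; tee's terminal copy is discarded only to keep the demonstration quiet.

```bash
#!/usr/bin/env bash
report=$(mktemp)   # temporary stand-in for the real report path

{
  echo "data line"
  echo "diagnostic line" >&2
} 2>&1 | tee "$report" > /dev/null

# Written outside the block, so it never reaches the report.
echo "status note" >&2

report_body=$(cat "$report")
rm -f "$report"
printf '%s\n' "$report_body"
```

Both lines from inside the braces end up in the file, while the status note goes only to the terminal's stderr.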

Big-tech practice: in large production environments, cron jobs and CI scripts commonly redirect stdout and stderr to timestamped log files, then ship those logs to a centralized system (for example Splunk, Loki, or Cloud Logging). The pipeline discipline you build now (keeping stdout for data, stderr for diagnostics, and capturing both correctly) maps directly onto how production observability pipelines are wired.