Shell Scripting & Automation

Conditionals & Tests

Every real script makes decisions. A backup script must abort when the target disk is full. A deploy script must refuse to run on the wrong branch. A health-check script must page on-call when a service returns a non-200 status. All of these hinge on one construct: the conditional. In this lesson you will master Bash conditionals deeply enough to write production-grade decision logic.

How Bash Evaluates Conditions

Bash conditionals are built on exit codes, not boolean values. Every command returns an integer between 0 and 255. Exit code 0 means success (true); anything else means failure (false). The if keyword simply runs a command and branches on its exit code.

# The simplest possible if: branch on any command's exit code
if ping -c1 -W1 8.8.8.8 >/dev/null 2>&1; then
    echo "Network is up"
else
    echo "Network is down"
fi

# Check the last command's exit code manually
grep -q "ERROR" /var/log/app.log
if [ $? -ne 0 ]; then
    echo "No errors found"
fi

# Idiomatic form — test inline, no $? needed
if grep -q "ERROR" /var/log/app.log; then
    echo "Errors detected — alerting on-call"
fi

The golden rule: always test exit codes, not output. Parsing command output is fragile; testing whether a command succeeded is robust. When you write if command; then, you are testing the command's exit code directly: no subshell, no variable, no quoting hazard.
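As a minimal sketch of this rule, the following wraps a check in a function and branches on its exit code. The helper name log_has_errors and the temporary log file are assumptions made for illustration; they are not part of the lesson's scripts.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit 0) when the given log contains "ERROR".
log_has_errors() {
    grep -q "ERROR" "$1"
}

# Throwaway log file so the sketch is self-contained.
demo_log=$(mktemp)
printf '%s\n' "INFO start" "ERROR disk full" > "$demo_log"

# Branch directly on the function's exit code.
if log_has_errors "$demo_log"; then
    echo "errors detected"
else
    echo "log is clean"
fi

# If the numeric code is needed later, capture it immediately:
# any command that runs in between overwrites $?.
log_has_errors /nonexistent/path 2>/dev/null
status=$?
echo "exit code for a missing file: $status"

rm -f "$demo_log"
```

A function's exit code is the exit code of its last command, so log_has_errors needs no explicit return.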

[ ] versus [[ ]] — Know the Difference

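[ ] is the POSIX test command: a builtin in Bash and most other shells, and also shipped as an external binary (typically /usr/bin/[). [[ ]] is a Bash keyword, parsed by the shell itself. It is not available in a strictly POSIX /bin/sh, but in a script whose shebang is #!/usr/bin/env bash it is the safer default for string, pattern, and file tests.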

The practical differences matter in production:

Word splitting: [ $var = "yes" ] explodes if $var contains spaces or is empty. [[ $var == "yes" ]] never word-splits, even without quotes.
Pattern matching: [[ $branch == release/* ]] matches glob patterns natively. The pattern on the right-hand side must be left unquoted; quoting it forces a literal string comparison.
Regex matching: [[ $line =~ ^[0-9]+$ ]] supports full POSIX ERE. The regex must NOT be quoted.
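Logical operators: [[ $a == "x" && $b == "y" ]] uses && and || inside the brackets. With [ ] you must either chain separate tests ([ "$a" = x ] && [ "$b" = y ]) or use -a and -o, which POSIX marks obsolescent because they become ambiguous with certain operands.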

#!/usr/bin/env bash
set -euo pipefail

branch="release/2.4.1"
version="42"
filename="report 2024.csv"

# --- String tests with [[  ]] ---

# Equality and inequality
[[ "$branch" == "main" ]]        && echo "on main"
[[ "$branch" != "main" ]]        && echo "not on main"

# Glob matching — no quotes around the pattern
[[ "$branch" == release/* ]]     && echo "release branch detected"

# Regex matching — regex MUST NOT be quoted
[[ "$version" =~ ^[0-9]+$ ]]     && echo "version is numeric"

# Empty / non-empty string
[[ -z "$branch" ]]               && echo "branch is empty"
[[ -n "$branch" ]]               && echo "branch is set"

# Safe even with spaces — [[  ]] never word-splits
[[ "$filename" == *.csv ]]       && echo "CSV file detected"

# --- Logical combinators ---
[[ "$branch" == release/* && -n "$version" ]] && echo "release with version"
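Two details of [[ ]] are easy to miss: quoting the right-hand side of == turns a pattern into a literal string, and a successful =~ match stores its capture groups in the BASH_REMATCH array. The sketch below illustrates both; the helper name parse_release is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints "MAJOR MINOR PATCH" for branches such as
# release/2.4.1 and returns 1 for anything else.
parse_release() {
    local branch=$1
    # Keeping the regex in a variable sidesteps quoting problems;
    # the variable itself is expanded unquoted on the right of =~.
    local re='^release/([0-9]+)\.([0-9]+)\.([0-9]+)$'
    if [[ $branch =~ $re ]]; then
        echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[3]}"
    else
        return 1
    fi
}

parse_release "release/2.4.1"                 # prints: 2 4 1
parse_release "main" || echo "main is not a release branch"

branch="release/2.4.1"
# Unquoted right-hand side: treated as a glob pattern, so this matches.
[[ $branch == release/* ]]   && echo "glob match"
# Quoted right-hand side: compared literally, so this does not match.
[[ $branch == "release/*" ]] || echo "no literal match"
```

Storing the pattern in a variable is a common convention rather than a requirement; an inline unquoted regex behaves the same way.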

File Tests

DevOps scripts constantly interrogate the filesystem — checking whether a config file exists before reading it, whether a directory is writable before deploying into it, whether a socket is present before sending traffic. Bash provides a full suite of file-test operators:

CONFIG="/etc/app/config.yaml"
LOG_DIR="/var/log/app"
SOCKET="/run/gunicorn.sock"
BINARY="/usr/local/bin/deploy-tool"

# Existence tests
[[ -e "$CONFIG" ]]   && echo "exists (any file type)"
[[ -f "$CONFIG" ]]   && echo "is a regular file"
[[ -d "$LOG_DIR" ]]  && echo "is a directory"
[[ -L "$CONFIG" ]]   && echo "is a symlink"
[[ -S "$SOCKET" ]]   && echo "is a socket"

# Permission tests
[[ -r "$CONFIG" ]]   && echo "readable"
[[ -w "$LOG_DIR" ]]  && echo "writable"
[[ -x "$BINARY" ]]   && echo "executable"

# Size test
[[ -s "$CONFIG" ]]   && echo "non-empty file"

# Practical guard: abort if config is missing
if [[ ! -f "$CONFIG" ]]; then
    echo "ERROR: config not found at $CONFIG" >&2
    exit 1
fi

# Practical guard: create log dir if absent
if [[ ! -d "$LOG_DIR" ]]; then
    mkdir -p "$LOG_DIR"
    chmod 750 "$LOG_DIR"
fi
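The individual operators become more useful when a guard reports which test failed. Here is one possible sketch, using a hypothetical require_config function and a scratch directory in place of real system paths:

```shell
#!/usr/bin/env bash

# Hypothetical guard: returns 0 only if the path is a readable, non-empty
# regular file; otherwise prints the reason to stderr and returns 1.
require_config() {
    local path=$1
    if [[ ! -e $path ]]; then
        echo "ERROR: $path does not exist" >&2
        return 1
    elif [[ ! -f $path ]]; then
        echo "ERROR: $path is not a regular file" >&2
        return 1
    elif [[ ! -r $path ]]; then
        echo "ERROR: $path is not readable" >&2
        return 1
    elif [[ ! -s $path ]]; then
        echo "ERROR: $path is empty" >&2
        return 1
    fi
}

# Exercise the guard against a scratch directory.
scratch=$(mktemp -d)
printf 'port: 8080\n' > "$scratch/config.yaml"
: > "$scratch/empty.yaml"

require_config "$scratch/config.yaml" && echo "config OK"
require_config "$scratch/empty.yaml"  || echo "empty file rejected"
require_config "$scratch"             || echo "directory rejected"
require_config "$scratch/missing"     || echo "missing file rejected"

rm -rf "$scratch"
```

Ordering the tests from most general (-e) to most specific (-s) means the first message printed names the actual cause of the failure.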

Numeric Comparisons

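The string operators compare text, not values: == and != test equality as strings (so "08" and "8" are different), and < and > order strings lexicographically (so "9" sorts after "10"). For integers use the arithmetic operators -eq, -ne, -lt, -le, -gt, -ge. Alternatively, use an arithmetic context (( )), which evaluates its contents as an integer expression and returns exit code 0 when the result is non-zero (true) and 1 when it is zero (false).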

disk_usage=87
max_allowed=80
http_status=503
retry_count=3

# With [[ ]] and arithmetic flags
if [[ "$disk_usage" -gt "$max_allowed" ]]; then
    echo "ALERT: disk usage ${disk_usage}% exceeds threshold" >&2
fi

# With (( )) — arithmetic context, cleaner for math
if (( http_status >= 500 )); then
    echo "Server-side error: $http_status"
fi

if (( retry_count == 0 )); then
    echo "No retries left — failing deployment"
    exit 1
fi

# Compound: disk and inodes both OK
used_inodes=$(df -i /var | awk 'NR==2{print $5}' | tr -d '%')
if (( disk_usage < 90 && used_inodes < 90 )); then
    echo "Disk health OK"
fi

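Use (( )) for arithmetic conditions. It is cleaner, supports C-style operators (++, %, **), and avoids the visual noise of -gt/-lt. Be aware that (( 0 )) returns exit code 1 (false). That is useful when counting retries down to zero, but under set -e a bare statement such as (( count++ )) aborts the script whenever the expression evaluates to 0.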
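The difference between string and integer comparison, and the exit status of (( )), can be checked with a few lines. This is an illustrative sketch; the variable names retries_left and attempts are made up for the example.

```shell
#!/usr/bin/env bash

# String comparison: "9" sorts after "10" because '9' > '1', character by character.
[[ "9" > "10" ]]  && echo "as strings, 9 sorts after 10"

# Integer comparison: -gt and (( )) compare numeric values.
[[ 9 -gt 10 ]]    || echo "as integers, 9 is not greater than 10"
(( 9 > 10 ))      || echo "(( )) agrees"

# (( expr )) exits 0 when expr is non-zero and 1 when it is zero.
(( 5 ));  echo "exit status of (( 5 )): $?"
(( 0 ));  echo "exit status of (( 0 )): $?"

# Counting down: the loop ends when retries_left reaches 0.
retries_left=3
attempts=0
while (( retries_left > 0 )); do
    # Assignment with $(( )) always exits 0, whatever value it computes.
    attempts=$(( attempts + 1 ))
    retries_left=$(( retries_left - 1 ))
done
echo "made $attempts attempts"
```

The sketch uses the assignment form var=$(( ... )) for updates because its exit status does not depend on the computed value.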

The case Statement

When a single variable drives multiple branches, a chain of elif blocks is harder to read and maintain than a case statement. case supports glob patterns, alternation with |, and a catch-all *). It is the idiomatic way to dispatch on environment names, log levels, HTTP methods, or CLI subcommands.

#!/usr/bin/env bash
set -euo pipefail

ENV="${1:-}"         # first argument, empty if not provided
HTTP_METHOD="POST"

# --- Dispatch on environment name ---
case "$ENV" in
    production|prod)
        REPLICAS=10
        LOG_LEVEL="warn"
        ;;
    staging|stage)
        REPLICAS=2
        LOG_LEVEL="info"
        ;;
    development|dev|"")
        REPLICAS=1
        LOG_LEVEL="debug"
        ;;
    *)
        echo "ERROR: unknown environment '$ENV'" >&2
        echo "Usage: $0 {production|staging|development}" >&2
        exit 1
        ;;
esac

echo "Deploying with $REPLICAS replicas, log level=$LOG_LEVEL"

# --- Dispatch on HTTP method (useful in webhook handlers) ---
case "$HTTP_METHOD" in
    GET|HEAD)       echo "Read-only request" ;;
    POST|PUT|PATCH) echo "Mutating request — validate body" ;;
    DELETE)         echo "Destructive request — require confirmation" ;;
    *)              echo "Unknown method: $HTTP_METHOD"; exit 1 ;;
esac

Always include a *) catch-all. A case statement with no catch-all silently does nothing when the value does not match — a common source of bugs where a typo in the environment name causes a deploy to run with no configuration at all. Make the catch-all error loudly and exit non-zero.
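The dispatch examples above use only literal alternatives, but case patterns are globs, so one branch can cover a whole family of values. A small sketch, assuming a hypothetical target_for_branch helper and an invented branch-to-target mapping:

```shell
#!/usr/bin/env bash

# Hypothetical helper: maps a Git branch name to a deploy target.
# Patterns are tried top to bottom; the first match wins.
target_for_branch() {
    case "$1" in
        main|master)          echo "production" ;;
        release/*|hotfix/*)   echo "staging" ;;
        feature/*)            echo "development" ;;
        *)
            echo "ERROR: no deploy target for branch '$1'" >&2
            return 1
            ;;
    esac
}

target_for_branch "main"              # prints: production
target_for_branch "release/2.4.1"     # prints: staging
target_for_branch "feature/login"     # prints: development
target_for_branch "experiment" || echo "unknown branch rejected"
```

Because the first matching pattern wins, specific patterns belong above general ones, with *) last.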

Diagram: Conditional Decision Flow in a Deploy Script

A deploy script chains multiple conditional guards before reaching the environment dispatch.
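The flow described here can be sketched as a function that runs its guards in order and reaches the case dispatch only when all of them pass. The function name plan_deploy, the guard order, the argument list, and the 90% threshold are assumptions for illustration, not a reconstruction of the original figure.

```shell
#!/usr/bin/env bash

# Hypothetical pre-deploy check. Arguments: config path, current branch,
# disk usage in percent, environment name. Prints the replica count on success.
plan_deploy() {
    local config=$1 branch=$2 disk_usage=$3 env=$4

    # Guard 1: the config file must exist.
    if [[ ! -f $config ]]; then
        echo "ERROR: config not found at $config" >&2
        return 1
    fi

    # Guard 2: only main and release branches may deploy.
    if [[ $branch != main && $branch != release/* ]]; then
        echo "ERROR: refusing to deploy from branch '$branch'" >&2
        return 1
    fi

    # Guard 3: leave headroom on the target disk (threshold is an assumption).
    if (( disk_usage >= 90 )); then
        echo "ERROR: disk usage ${disk_usage}% is too high" >&2
        return 1
    fi

    # All guards passed: dispatch on the environment name.
    case "$env" in
        production|prod)  echo 10 ;;
        staging|stage)    echo 2 ;;
        development|dev)  echo 1 ;;
        *)
            echo "ERROR: unknown environment '$env'" >&2
            return 1
            ;;
    esac
}

config=$(mktemp)    # stand-in for a real config file
plan_deploy "$config" "release/2.4.1" 42 staging      # prints: 2
plan_deploy "$config" "feature/x" 42 staging || echo "blocked by branch guard"
plan_deploy "$config" "main" 95 prod         || echo "blocked by disk guard"
rm -f "$config"
```

Each guard returns early, so the happy path stays unindented and a new guard can be added without touching the others.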

Combining Conditions: Real Patterns

Production scripts rarely test a single condition. Here is a realistic health-check function that combines file tests, numeric comparisons, and string tests in the pattern you will see in large-scale infrastructure code:

#!/usr/bin/env bash
set -euo pipefail

APP_PID_FILE="/run/app/app.pid"
APP_URL="http://localhost:8080/healthz"
MAX_RESPONSE_MS=500

check_health() {
    local pid status elapsed

    # 1. PID file must exist and be non-empty
    if [[ ! -s "$APP_PID_FILE" ]]; then
        echo "CRITICAL: PID file missing or empty" >&2
        return 1
    fi

    pid=$(cat "$APP_PID_FILE")

    # 2. Process must actually be running
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "CRITICAL: process $pid is not running" >&2
        return 1
    fi

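    # 3. HTTP health endpoint must return 200
    # curl reports %{time_total} in seconds (e.g. 0.123456); convert to whole ms
    local elapsed_s
    read -r status elapsed_s <<< "$(curl -o /dev/null -s \
        -w "%{http_code} %{time_total}" \
        --max-time 2 "$APP_URL")"
    elapsed=$(awk -v t="$elapsed_s" 'BEGIN { printf "%d", t * 1000 }')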

    if [[ "$status" != "200" ]]; then
        echo "CRITICAL: healthz returned HTTP $status" >&2
        return 1
    fi

    # 4. Response must be fast enough
    if (( elapsed > MAX_RESPONSE_MS )); then
        echo "WARNING: healthz slow — ${elapsed}ms > ${MAX_RESPONSE_MS}ms" >&2
        return 1
    fi

    echo "OK: app healthy (HTTP $status, ${elapsed}ms)"
}

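# notify_oncall is assumed to be defined elsewhere (for example, a wrapper around your paging tool)
check_health || { notify_oncall "app health check failed"; exit 1; }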

Production pattern — guard functions, not scripts. Wrap each check in a function that returns a meaningful exit code. The caller decides whether to page on-call, retry, or hard-exit. This separation of concerns is how SRE teams write runbooks that can be tested independently.
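Because a guard function only reports success or failure, the caller is free to layer policy on top. One way this might look is a generic retry wrapper; retry_check and flaky_check below are hypothetical names, and flaky_check exists only to exercise the wrapper.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: run a check up to max_attempts times, pausing
# delay_seconds between failures. Returns 0 as soon as the check succeeds.
retry_check() {
    local max_attempts=$1 delay_seconds=$2
    shift 2
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max_attempts failed: $*" >&2
        if (( attempt < max_attempts )); then
            sleep "$delay_seconds"
        fi
    done
    return 1
}

# Stand-in check that fails twice and then succeeds.
calls=0
flaky_check() {
    calls=$(( calls + 1 ))
    (( calls >= 3 ))
}

if retry_check 5 0 flaky_check; then
    echo "healthy after $calls calls"
else
    echo "still failing: escalate to on-call here" >&2
fi
```

The wrapper knows nothing about what it is checking, so the same function can retry a health check, a lock acquisition, or a network probe.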

Quick Reference: Test Operators

String: -z (empty), -n (non-empty), ==, !=, <, > (lexicographic)
Numeric: -eq, -ne, -lt, -le, -gt, -ge, or (( )) with ==, !=, <, <=, >, >=
File: -e (exists), -f (regular file), -d (directory), -s (non-empty), -r/-w/-x (permissions), -L (symlink)
Regex: [[ $var =~ pattern ]] — unquoted POSIX ERE on the right
Negation: ! before any test: [[ ! -f "$file" ]]