Linux Fundamentals

Working with Files & Text

The terminal is a text-first environment. Configuration files, log files, scripts, and data pipelines all live as plain text on disk. A DevOps engineer who cannot fluently read, inspect, copy, move, and manipulate text files from the shell is blocked on nearly every operational task. This lesson teaches the canonical toolkit — cat, less, head, tail, cp, mv, rm, touch, and file — with the production mindset that separates professionals from beginners.

Reading File Contents

There are four primary tools for reading files, each suited to a different situation.

cat (concatenate) dumps the entire file to stdout. It is correct for small config files, quick inspections, and piping content into other tools. Avoid it on files larger than a few thousand lines — your terminal scrollback will fill up and you will lose context.

# Dump a config to stdout
cat /etc/nginx/nginx.conf

# Number every line — useful when debugging config errors
cat -n /etc/ssh/sshd_config

# Concatenate two files into a third (classic use case)
cat header.txt body.txt footer.txt > full_report.txt

# Squeeze multiple blank lines into one
cat -s nginx.conf
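
The cat options above can be tried safely in a scratch directory. This is a minimal sketch: the file names and contents are made up for illustration, and cat -s assumes GNU or BSD cat.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir="$(mktemp -d)"

printf 'REPORT\n' > "$workdir/header.txt"
printf 'line one\n\n\n\nline two\n' > "$workdir/body.txt"
printf 'END\n' > "$workdir/footer.txt"

# Files are written to the output in the order they are listed.
cat "$workdir/header.txt" "$workdir/body.txt" "$workdir/footer.txt" > "$workdir/full_report.txt"

# -n numbers every output line, including blank ones.
cat -n "$workdir/full_report.txt"

# -s collapses each run of blank lines into a single blank line.
cat -s "$workdir/full_report.txt" > "$workdir/squeezed.txt"

wc -l < "$workdir/full_report.txt"   # 7 lines before squeezing
wc -l < "$workdir/squeezed.txt"      # 5 lines after squeezing
```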

less is a pager: it shows one screenful at a time, lets you scroll up and down, and does not need to read the whole file before showing the first screen. It is the correct tool for any file you do not already know the size of. Key controls: j/k or arrow keys to scroll, Space/b for page down/up, /pattern to search forward, n/N to jump to the next/previous match, q to quit, Shift+F to follow new content (like tail -f).

# Page through a large log file
less /var/log/nginx/access.log

# Open with line numbers shown
less -N /etc/nginx/nginx.conf

# Search for ERROR as soon as the file opens (+/PATTERN)
less +/ERROR /var/log/app/app.log

# Follow a growing file (Shift+F inside less does the same)
less +F /var/log/syslog

Production habit: prefer less over cat for any log file or config you have not seen before. On a busy production server, cat /var/log/nginx/access.log can scroll thousands of lines past you in a second. less keeps you in control.
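
One way to turn this habit into muscle memory is a tiny wrapper that picks the viewer by line count. The sketch below is only an illustration: choose_viewer is a hypothetical helper, and the 200-line threshold is an arbitrary assumption.

```shell
# Hypothetical helper: print "cat" for short files and "less" for everything else.
# The 200-line threshold is an assumption; tune it to your terminal height.
choose_viewer() {
    lines="$(wc -l < "$1")"
    if [ "$lines" -le 200 ]; then
        echo cat
    else
        echo less
    fi
}

workdir="$(mktemp -d)"
seq 1 10 > "$workdir/small.conf"
seq 1 5000 > "$workdir/big.log"

choose_viewer "$workdir/small.conf"   # prints: cat
choose_viewer "$workdir/big.log"      # prints: less

# Typical interactive use (not run here because less needs a terminal):
#   "$(choose_viewer /var/log/nginx/access.log)" /var/log/nginx/access.log
```

Note that wc -l still reads the whole file once, so on very large logs it is usually simpler to go straight to less.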

head and tail read the beginning or end of a file. Both default to 10 lines; -n N overrides that.

head -n 20 config.yaml — see the first 20 lines (great for verifying a header or schema)
tail -n 50 /var/log/app.log — see the last 50 lines (recent events)
tail -f /var/log/nginx/error.log — follow mode; new lines stream to your terminal in real time as the file grows — the single most important log-watching technique in production
tail -F (capital F) follows by filename and handles log rotation, making it more robust than lowercase -f in long-running sessions

# Watch application errors in real time
tail -F /var/log/app/error.log

# Show the last 100 journal entries for the nginx systemd unit
journalctl -u nginx -n 100

# Head: confirm a CSV has the right columns before processing
head -n 5 data_export.csv

# Combine head and tail to extract a middle section (here, lines 81-100)
head -n 100 large_file.log | tail -n 20
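
To check which lines the head and tail combination actually returns, here is a small sketch on a generated file in which line N contains the number N. The sed variant is an alternative added for comparison, not something the lesson prescribes.

```shell
workdir="$(mktemp -d)"
seq 1 200 > "$workdir/large_file.log"   # line N contains the number N

# First 100 lines, then the last 20 of those: lines 81 through 100.
head -n 100 "$workdir/large_file.log" | tail -n 20 > "$workdir/head_tail.txt"

# Same range with sed: print lines 81-100, then quit so the rest is never read.
sed -n '81,100p;100q' "$workdir/large_file.log" > "$workdir/sed_range.txt"

cmp "$workdir/head_tail.txt" "$workdir/sed_range.txt" && echo "both give lines 81-100"
```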

Inspecting File Type and Metadata

file probes the actual content of a file (magic bytes, encoding, script shebang) and reports what it truly is — regardless of the file extension. This matters constantly in DevOps: you receive a binary with a .log extension, a gzipped file named backup, or a script with no extension at all.

file /usr/bin/bash
# /usr/bin/bash: ELF 64-bit LSB pie executable, x86-64

file /var/log/syslog
# /var/log/syslog: UTF-8 Unicode text

file backup.tar.gz
# backup.tar.gz: gzip compressed data, from Unix

file deploy.sh
# deploy.sh: Bourne-Again shell script, ASCII text executable

file unknown_binary
# unknown_binary: ELF 64-bit LSB shared object -- do NOT cat this

Never cat a binary file to your terminal. Terminals interpret escape sequences in the byte stream. A crafted binary can send sequences that remap your keyboard, change your terminal title, or in older terminals trigger command execution. Always run file on an unknown file before reading it.
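
A safe way to look at an unknown file is to inspect its first bytes as text-safe output rather than raw bytes. The sketch below assumes gzip and od are installed; notes.log is a made-up name for a gzip stream saved under a misleading extension.

```shell
workdir="$(mktemp -d)"

# A gzip stream saved under a misleading .log name.
printf 'hello\n' | gzip -c > "$workdir/notes.log"

# Ask file what it is, if the command is installed.
if command -v file >/dev/null 2>&1; then
    file "$workdir/notes.log"
fi

# Read the first two bytes as hex; gzip data always starts with 1f 8b.
magic="$(od -An -tx1 -N2 "$workdir/notes.log" | tr -d ' \n')"
echo "magic bytes: $magic"

# Peek at the first 32 bytes: od prints escaped characters, never raw control bytes.
od -c -N32 "$workdir/notes.log"
```

These leading "magic bytes" are part of what file examines, which is why the extension does not fool it.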

Copying, Moving, and Renaming Files

Core file operations: cp (copy), mv (move/rename), rm (delete), touch (create/update timestamp).

cp copies files or directories. The source is preserved.

cp file dest — copy file to destination
cp -r dir/ backup/ — copy a directory recursively
cp -p — preserve permissions, ownership, and timestamps (critical when copying config files)
cp -a — archive mode: recursive + preserve everything (the right default for backups)
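
The effect of -p is easiest to see on timestamps. The following is a sketch under stated assumptions: the file is a stand-in config in a scratch directory, and the 2019 and 2020 dates are arbitrary values chosen so the comparison is unambiguous.

```shell
workdir="$(mktemp -d)"
src="$workdir/nginx.conf"
ref="$workdir/reference"

printf 'worker_processes 2;\n' > "$src"
touch -t 201901010000 "$src"   # pretend the config was last edited in 2019
touch -t 202001010000 "$ref"   # reference point: 2020

cp    "$src" "$workdir/plain.bak"       # modification time becomes the time of the copy
cp -p "$src" "$workdir/preserved.bak"   # modification time stays at 2019

# find -newer prints a file only if it was modified after the reference file.
plain_newer="$(find "$workdir/plain.bak" -newer "$ref")"
preserved_newer="$(find "$workdir/preserved.bak" -newer "$ref")"

echo "plain copy newer than 2020:     ${plain_newer:-no}"
echo "preserved copy newer than 2020: ${preserved_newer:-no}"
```

Preserved timestamps let you tell later when the original was last edited, rather than when the backup was made.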

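mv moves or renames. The original path disappears and no second copy is left behind. On the same filesystem this is nearly instantaneous because only directory metadata changes; across filesystems mv copies the data and then deletes the source, so large moves take time and can be interrupted partway.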
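
You can observe the same-filesystem case by comparing inode numbers before and after a rename. This is a minimal sketch in a scratch directory; the file name mirrors the examples in this lesson but the content is made up.

```shell
workdir="$(mktemp -d)"
printf 'listen 8080;\n' > "$workdir/app.conf.disabled"

# ls -i prints the inode number in the first column.
before="$(ls -i "$workdir/app.conf.disabled" | awk '{print $1}')"
mv "$workdir/app.conf.disabled" "$workdir/app.conf"
after="$(ls -i "$workdir/app.conf" | awk '{print $1}')"

echo "inode before: $before, after: $after"
[ -e "$workdir/app.conf.disabled" ] || echo "old path is gone"
```

An unchanged inode number means the data was never copied; only the name pointing at it changed.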

rm deletes permanently: the shell has no recycle bin. Common flags: -r to remove directories recursively, -f to skip confirmation prompts and ignore nonexistent files. The combination rm -rf is extremely powerful and frequently the cause of production disasters.

The rm -rf trap. A misplaced space, or a variable that expands to an empty string in a script, can wipe far more than you intended, up to an entire filesystem. Production-safe habits: always double-check the path with ls first; use rm -ri (interactive) on unfamiliar directories; never run rm -rf /some/path/$VAR unless you know $VAR cannot be empty. Real production data has been lost to exactly this kind of mistake.
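
Two common guards against the empty-variable case are sketched below. The ${VAR:?message} expansion is standard shell syntax; safe_rm_tree is a hypothetical helper written for this example, not a standard tool, and the paths live in a scratch directory.

```shell
# Hypothetical helper: refuse to delete when the argument is empty, "/", or not a directory.
safe_rm_tree() {
    if [ -z "${1:-}" ] || [ "$1" = "/" ]; then
        echo "refusing: empty path or /" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "refusing: not a directory: $1" >&2
        return 1
    fi
    rm -rf -- "$1"
}

workdir="$(mktemp -d)"
mkdir -p "$workdir/old-deploy-artifacts"
touch "$workdir/old-deploy-artifacts/build.tar"

VAR=""   # simulates a variable that was never set correctly

# ${VAR:?message} aborts the command when VAR is empty or unset, so rm never runs.
# The subshell keeps that abort from ending the surrounding script.
( rm -rf -- "$workdir/${VAR:?VAR is empty}" ) 2>/dev/null \
    || echo "expansion guard stopped rm"

safe_rm_tree "$VAR" 2>/dev/null || echo "helper refused the empty path"
safe_rm_tree "$workdir/old-deploy-artifacts" && echo "removed the intended directory"
```

Without the guard, the first rm would have expanded to the scratch directory itself and deleted everything under it.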

touch creates an empty file if it does not exist, or updates the access and modification timestamps if it does. In DevOps it is commonly used to create lock files, trigger file-watching tools, or create placeholder files in Git repos.
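
Both behaviours of touch can be seen in a scratch directory. In this sketch the dates passed to touch -t are arbitrary, and the reference file exists only so that find -newer has something to compare against.

```shell
workdir="$(mktemp -d)"
flag="$workdir/deploy.lock"
ref="$workdir/reference"

touch "$flag"                            # file does not exist yet: created empty
[ -s "$flag" ] || echo "flag file exists and is empty"

touch -t 201901010000 "$flag"            # backdate the flag to 2019
touch -t 202001010000 "$ref"             # reference point: 2020
before="$(find "$flag" -newer "$ref")"   # empty: the flag is older than the reference

touch "$flag"                            # existing file: timestamps are set to now
after="$(find "$flag" -newer "$ref")"    # now contains the flag's path

echo "newer before touch: ${before:-no}"
echo "newer after touch:  ${after:-no}"
```

Because touch succeeds silently on a file that already exists, it cannot by itself tell a deploy script that another deploy is already holding the lock.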

# Backup a config before editing (always do this)
cp -p /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak

# Copy a directory of configs to a staging area
cp -a /etc/app/conf.d/ /tmp/conf-backup/

# Rename a file in place (same directory)
mv app.conf.disabled app.conf

# Move a log archive off the primary disk
mv /var/log/app/archive/ /mnt/cold-storage/logs/

# Create an empty flag file used by a deploy script
touch /var/run/app/deploy.lock

# Remove a single stale lock file
rm /var/run/app/deploy.lock

# Remove a directory tree — VERIFY the path first
ls /tmp/old-deploy-artifacts/
rm -rf /tmp/old-deploy-artifacts/

Text Streams and Combining Tools

The real power of these commands emerges when you combine them via pipes. A production engineer rarely reads a raw log file — they filter, search, and summarize it on the fly. The tools in this lesson are foundational building blocks.

# Count access-log lines containing " 500 " (approximate: also matches a response size of 500)
grep -c " 500 " /var/log/nginx/access.log

# Show the 20 most recent 500 errors among the last 10,000 requests
tail -n 10000 /var/log/nginx/access.log | grep " 500 " | tail -n 20

# Confirm a newly deployed config file looks right before reloading
head -n 40 /etc/nginx/sites-enabled/app.conf

# Compare original and edited config
diff /etc/nginx/nginx.conf.bak /etc/nginx/nginx.conf

# Find all text files in /etc/app/ (not just .conf; use file to verify the content)
# -exec ... + copes with spaces in names; "text" matches both ASCII and UTF-8 text
find /etc/app/ -type f -exec file {} + | grep "text"
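
Matching the text " 500 " is quick but can overcount, because the response size can also be 500. The sketch below compares it with a field-based count. It assumes the nginx "combined" log format, where the status code is the ninth whitespace-separated field; the four log lines are invented sample data.

```shell
workdir="$(mktemp -d)"
log="$workdir/access.log"

# Four made-up requests in combined log format (status is field 9, size is field 10).
cat > "$log" <<'EOF'
203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /api HTTP/1.1" 500 1234 "-" "curl/8.0"
203.0.113.6 - - [10/Oct/2024:13:55:37 +0000] "GET / HTTP/1.1" 200 500 "-" "curl/8.0"
203.0.113.7 - - [10/Oct/2024:13:55:38 +0000] "POST /login HTTP/1.1" 500 87 "-" "curl/8.0"
203.0.113.8 - - [10/Oct/2024:13:55:39 +0000] "GET /health HTTP/1.1" 200 15 "-" "curl/8.0"
EOF

# Plain text match: also counts the 200 response whose size happens to be 500.
grep_count="$(grep -c ' 500 ' "$log")"

# Field match: counts only lines whose status field equals 500.
awk_count="$(awk '$9 == 500 { n++ } END { print n + 0 }' "$log")"

echo "grep: $grep_count, awk: $awk_count"   # prints: grep: 3, awk: 2
```

If your log format differs, the field number will differ too, so check one line with head before trusting the count.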

Config backup is not optional. Before editing any production configuration file, always cp -p original original.bak. This single habit has saved countless engineers from having to restore from a full backup at 2 AM. Many teams also keep config under version control (Git) so every change is auditable — that is the professional standard covered in the Git tutorial.
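
The full backup, edit, review, and rollback cycle looks roughly like this. It is a sketch on a stand-in file in a scratch directory: the printf that overwrites the config simulates an edit you would normally make in an editor.

```shell
workdir="$(mktemp -d)"
conf="$workdir/app.conf"
printf 'worker_processes 2;\n' > "$conf"

# 1. Back up before touching anything.
cp -p "$conf" "$conf.bak"

# 2. Make the change (stand-in for an interactive edit).
printf 'worker_processes 4;\n' > "$conf"

# 3. Review exactly what changed. diff exits 1 when the files differ,
#    so that exit status is expected here and not treated as a failure.
diff -u "$conf.bak" "$conf" || true

# 4. Roll back if the change turns out to be wrong.
cp -p "$conf.bak" "$conf"
cmp -s "$conf.bak" "$conf" && echo "rollback restored the original"
```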

What You Now Know

You can read any file safely (less, head, tail -F), identify what a file actually is (file), copy and back up configs without destroying the original (cp -p), rename and move files (mv, which is atomic only within a single filesystem), and remove files deliberately and safely (rm). These primitives underpin every shell script, every automation pipeline, and every production incident response you will ever run.