Linux Fundamentals

Working with Files & Text

The terminal is a text-first environment. Configuration files, log files, scripts, and data pipelines all live as plain text on disk. A DevOps engineer who cannot fluently read, inspect, copy, move, and manipulate text files from the shell is blocked on nearly every operational task. This lesson teaches the canonical toolkit — cat, less, head, tail, cp, mv, rm, touch, and file — with the production mindset that separates professionals from beginners.

Reading File Contents

There are four primary tools for reading files, each suited to a different situation.

cat (concatenate) dumps the entire file to stdout. It is correct for small config files, quick inspections, and piping content into other tools. Avoid it on files larger than a few thousand lines — your terminal scrollback will fill up and you will lose context.

    # Dump a config to stdout
    cat /etc/nginx/nginx.conf

    # Number every line (useful when debugging config errors)
    cat -n /etc/ssh/sshd_config

    # Concatenate three files into a fourth (classic use case)
    cat header.txt body.txt footer.txt > full_report.txt

    # Squeeze multiple blank lines into one
    cat -s nginx.conf
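The cat options above can be tried safely on throwaway files. The following is a minimal sketch, assuming GNU or BSD cat (the -s flag is not part of POSIX); the fragment file names and contents are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Three hypothetical report fragments; body.txt contains a run of blank lines.
printf 'REPORT\n' > header.txt
printf 'line one\n\n\n\nline two\n' > body.txt
printf 'END\n' > footer.txt

# Concatenate the fragments, in order, into a single file.
cat header.txt body.txt footer.txt > full_report.txt

# -s squeezes each run of blank lines down to one; -n numbers every line.
squeezed=$(cat -s full_report.txt)
numbered=$(cat -n full_report.txt)

printf '%s\n' "$squeezed"
printf '%s\n' "$numbered"
```

The redirect (>) is what turns cat into a file-building tool: without it the concatenated text simply goes to the terminal.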

less is a pager: it shows one screenful at a time, lets you scroll up and down, and never loads the whole file into memory. It is the correct tool for any file you do not already know the size of. Key controls: j/k or arrow keys to scroll, Space/b for page down/up, /pattern to search forward, n/N to jump between matches, q to quit, F to follow new content (like tail -f).

    # Page through a large log file
    less /var/log/nginx/access.log

    # Open with line numbers shown
    less -N /etc/nginx/nginx.conf

    # Search for ERROR as soon as the file opens (+/PATTERN)
    less +/ERROR /var/log/app/app.log

    # Follow a growing file (Shift+F inside less does the same)
    less +F /var/log/syslog

Production habit: prefer less over cat for any log file or config you have not seen before. On a busy production server, cat /var/log/nginx/access.log can scroll thousands of lines past you in a second. less keeps you in control.
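One way to make the cat-versus-less decision mechanical is a small wrapper that checks the line count first. This is only a sketch: the function names choose_viewer and view, and the 40-line threshold, are assumptions made for illustration, not part of any standard tool.

```shell
# choose_viewer FILE: print "cat" for short files and "less" for long ones.
# The 40-line cutoff is an arbitrary assumption for this sketch.
choose_viewer() {
    lines=$(wc -l < "$1")
    if [ "$lines" -le 40 ]; then
        echo cat
    else
        echo less
    fi
}

# view FILE: open FILE with whichever tool choose_viewer picked.
view() {
    "$(choose_viewer "$1")" "$1"
}

# Demonstration on two generated files.
short_file=$(mktemp)
long_file=$(mktemp)
seq 1 10 > "$short_file"
seq 1 5000 > "$long_file"

echo "short file -> $(choose_viewer "$short_file")"
echo "long file  -> $(choose_viewer "$long_file")"
```

less also has a built-in option in the same spirit: less -F exits immediately when the whole file fits on one screen.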

head and tail read the beginning or end of a file. Both default to 10 lines; -n N overrides that.

  • head -n 20 config.yaml — see the first 20 lines (great for verifying a header or schema)
  • tail -n 50 /var/log/app.log — see the last 50 lines (recent events)
  • tail -f /var/log/nginx/error.log (follow mode): new lines stream to your terminal in real time as the file grows; this is the single most important log-watching technique in production
  • tail -F (capital F) follows by filename and handles log rotation, making it more robust than lowercase -f in long-running sessions

    # Watch application errors in real time
    tail -F /var/log/app/error.log

    # Show the last 100 lines of the systemd journal for nginx
    journalctl -u nginx -n 100

    # Head: confirm a CSV has the right columns before processing
    head -n 5 data_export.csv

    # Combine head and tail to extract a middle section (here lines 81-100)
    head -n 100 large_file.log | tail -n 20
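The head-then-tail trick is easier to trust once you see which lines it really returns. The sketch below runs it against a generated file in which every line contains its own line number; the file and the variable names are illustrative only.

```shell
# Build a 200-line sample where each line holds its own line number.
sample=$(mktemp)
seq 1 200 > "$sample"

# head keeps lines 1-100; tail then keeps the last 20 of those, i.e. 81-100.
middle=$(head -n 100 "$sample" | tail -n 20)

# General form: lines START..END are `head -n END | tail -n COUNT`,
# where COUNT = END - START + 1.
start=81
end=100
count=$((end - start + 1))
same=$(head -n "$end" "$sample" | tail -n "$count")

printf '%s\n' "$middle" | head -n 1    # first extracted line
printf '%s\n' "$middle" | tail -n 1    # last extracted line
```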

Inspecting File Type and Metadata

file probes the actual content of a file (magic bytes, encoding, script shebang) and reports what it truly is — regardless of the file extension. This matters constantly in DevOps: you receive a binary with a .log extension, a gzipped file named backup, or a script with no extension at all.

    file /usr/bin/bash
    # /usr/bin/bash: ELF 64-bit LSB pie executable, x86-64

    file /var/log/syslog
    # /var/log/syslog: UTF-8 Unicode text

    file backup.tar.gz
    # backup.tar.gz: gzip compressed data, from Unix

    file deploy.sh
    # deploy.sh: Bourne-Again shell script, ASCII text executable

    file unknown_binary
    # unknown_binary: ELF 64-bit LSB shared object
    # (do NOT cat this)

Never cat a binary file to your terminal. Terminals interpret escape sequences in the byte stream. A crafted binary can send sequences that remap your keyboard, reset your terminal title, or in older terminals trigger command execution. Always run file on an unknown file before reading it.
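The "run file first" habit can be wrapped in a small guard. This is a sketch under stated assumptions: is_text and safe_view are hypothetical helper names, and the check relies on the file command reporting the encoding as "binary" via its --mime-encoding option.

```shell
# is_text FILE: succeed only if `file` does not classify the content as binary.
is_text() {
    encoding=$(file -b --mime-encoding -- "$1")
    [ "$encoding" != "binary" ]
}

# safe_view FILE: hypothetical wrapper that pages text and refuses binary data.
safe_view() {
    if is_text "$1"; then
        less -- "$1"
    else
        echo "refusing to display binary file: $1" >&2
        return 1
    fi
}

# Demonstration on two throwaway files: one text, one raw bytes.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/notes.txt"
printf '\000\001\002\377' > "$tmp/blob"

is_text "$tmp/notes.txt" && echo "notes.txt: text"
is_text "$tmp/blob" || echo "blob: binary"
```

For a binary you still need to look inside, tools such as strings, xxd, or hexdump print a terminal-safe representation instead of the raw bytes.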

Copying, Moving, and Renaming Files

Figure: Core file operations. cp (copy; the source stays intact), mv (move/rename; the source is gone), rm (delete; no recycle bin), touch (create a file or update its mtime).

cp copies files or directories. The source is preserved.

  • cp file dest — copy file to destination
  • cp -r dir/ backup/ — copy a directory recursively
  • cp -p — preserve permissions, ownership, and timestamps (critical when copying config files)
  • cp -a — archive mode: recursive + preserve everything (the right default for backups)
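The difference between a plain cp and cp -p is easiest to see on timestamps. The sketch below backdates a throwaway file and compares the copies; it assumes GNU coreutils (stat -c), and the file names are made up. Ownership is not demonstrated, because preserving another user's ownership generally requires root.

```shell
work=$(mktemp -d)
src="$work/app.conf"
printf 'listen 8080\n' > "$src"

# Backdate the source so a fresh copy is easy to tell apart (POSIX -t format).
touch -t 202001011200 "$src"
chmod 640 "$src"

cp    "$src" "$work/plain.conf"      # mtime becomes the time of the copy
cp -p "$src" "$work/preserved.conf"  # mtime and mode are carried over

# GNU stat: %Y is the mtime in epoch seconds, %a the octal permission bits.
src_mtime=$(stat -c %Y "$src")
plain_mtime=$(stat -c %Y "$work/plain.conf")
kept_mtime=$(stat -c %Y "$work/preserved.conf")
kept_mode=$(stat -c %a "$work/preserved.conf")

echo "source: $src_mtime"
echo "cp:     $plain_mtime"
echo "cp -p:  $kept_mtime (mode $kept_mode)"
```

A preserved mtime matters in practice because tools such as make, rsync, and backup software use it to decide whether a file has changed.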

mv moves or renames. No copy is left behind: the original path disappears. On the same filesystem this is near-instantaneous (a rename is just a metadata update); across filesystems mv copies the data and then deletes the source.
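That a same-filesystem mv is only a metadata update can be observed through the inode number, which stays the same across the rename. This is a minimal sketch assuming GNU stat; same_fs is a hypothetical helper, and comparing device IDs is a simplification that may not hold for every setup (for example some bind mounts or subvolumes).

```shell
work=$(mktemp -d)
printf 'data\n' > "$work/old-name.conf"

# GNU stat: %i is the inode number, %d the device (filesystem) ID.
inode_before=$(stat -c %i "$work/old-name.conf")

mv "$work/old-name.conf" "$work/new-name.conf"

inode_after=$(stat -c %i "$work/new-name.conf")
echo "inode before: $inode_before, after: $inode_after"

# same_fs A B: succeed if both paths report the same device ID, in which
# case mv between them is a rename rather than copy-then-delete.
same_fs() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

same_fs "$work" "$work/new-name.conf" && echo "same filesystem: mv is a rename"
```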

rm deletes permanently: there is no recycle bin at the Linux command line. Common flags: -r to remove directories recursively, -f to never prompt for confirmation and to ignore files that do not exist. The combination rm -rf is extremely powerful and frequently the cause of production disasters.

The rm -rf trap. A misplaced space, or a variable that expands to an empty string in a script, can wipe an entire filesystem. Production-safe habits: always double-check the path with ls first; use rm -ri (interactive) on unfamiliar directories; never run rm -rf /some/path/$VAR unless you know $VAR cannot be empty. Real production data has been lost to exactly this mistake.
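In scripts, the empty-variable case can be ruled out with the shell's ${var:?message} expansion, which aborts instead of expanding to nothing. The sketch below is illustrative: remove_release and the directory names are hypothetical, and everything happens inside a temporary directory.

```shell
base=$(mktemp -d)

# remove_release NAME: hypothetical helper that deletes $base/NAME.
# ${1:?...} aborts with an error if NAME is unset or empty, so the
# path can never collapse to "$base/".
remove_release() {
    name=${1:?release name must not be empty}
    rm -rf -- "$base/$name"
}

mkdir -p "$base/v1" "$base/v2"

remove_release v1                  # removes only $base/v1

# Run the empty case in a subshell: a failed :? expansion exits a
# non-interactive shell, and the subshell contains that exit.
if ( remove_release "" ) 2>/dev/null; then
    echo "unexpected: empty name was accepted"
else
    echo "empty name rejected; $base/v2 untouched"
fi
```

Quoting the path and passing -- before it also protects against names that contain spaces or begin with a dash.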

touch creates an empty file if it does not exist, or updates the access and modification timestamps if it does. In DevOps it is commonly used to create lock files, trigger file-watching tools, or create placeholder files in Git repos.
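Both behaviors of touch can be checked on a throwaway file. This sketch assumes GNU stat for reading the mtime; the file name and its contents are made up.

```shell
work=$(mktemp -d)
flag="$work/deploy.lock"

# First touch: the file does not exist yet, so an empty one is created.
touch "$flag"
size=$(wc -c < "$flag")

# Give the file some content and backdate it (POSIX -t format), then touch
# it again with no options: the content stays, the mtime moves to "now".
printf 'pid 1234\n' > "$flag"
touch -t 202001011200 "$flag"
old_mtime=$(stat -c %Y "$flag")     # GNU stat; %Y = mtime in epoch seconds
touch "$flag"
new_mtime=$(stat -c %Y "$flag")

echo "size after first touch: $size bytes"
echo "mtime moved forward: $old_mtime -> $new_mtime"
cat "$flag"
```

Note that touch succeeds whether or not the file already exists, so on its own it cannot tell you that another process created the lock first.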

    # Back up a config before editing (always do this)
    cp -p /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak

    # Copy a directory of configs to a staging area
    cp -a /etc/app/conf.d/ /tmp/conf-backup/

    # Rename a file in place (same directory)
    mv app.conf.disabled app.conf

    # Move a log archive off the primary disk
    mv /var/log/app/archive/ /mnt/cold-storage/logs/

    # Create an empty flag file used by a deploy script
    touch /var/run/app/deploy.lock

    # Remove a single stale lock file
    rm /var/run/app/deploy.lock

    # Remove a directory tree: VERIFY the path first
    ls /tmp/old-deploy-artifacts/
    rm -rf /tmp/old-deploy-artifacts/

Text Streams and Combining Tools

The real power of these commands emerges when you combine them via pipes. A production engineer rarely reads a raw log file — they filter, search, and summarize it on the fly. The tools in this lesson are foundational building blocks.
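Filtering and summarizing a log on the fly can be rehearsed on a synthetic file before touching a real one. The sketch below assumes nginx's default "combined" log layout, in which the status code is the ninth whitespace-separated field; the log lines are invented, and awk, sort, and uniq are used here in addition to the tools covered in this lesson.

```shell
log=$(mktemp)

# Synthetic lines in the "combined" layout (an assumption about the server's
# log_format): field 9 is the status code, field 10 the response size.
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 500 "-" "curl/8.0"
10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "GET /api HTTP/1.1" 500 132 "-" "curl/8.0"
10.0.0.3 - - [01/Jan/2024:10:00:02 +0000] "POST /api HTTP/1.1" 500 98 "-" "curl/8.0"
10.0.0.4 - - [01/Jan/2024:10:00:03 +0000] "GET /img HTTP/1.1" 404 0 "-" "curl/8.0"
EOF

# Plain substring match: also counts the first line, whose *size* is 500.
loose=$(grep ' 500 ' "$log" | wc -l)

# Field-aware match: only lines whose status field equals 500.
strict=$(awk '$9 == 500' "$log" | wc -l)

# Summarize all status codes, most frequent first.
summary=$(awk '{print $9}' "$log" | sort | uniq -c | sort -rn)

echo "loose=$loose strict=$strict"
printf '%s\n' "$summary"
```

The gap between the two counts shows why a quick substring grep is fine for a first look but a field-aware filter is safer when the number feeds a report or an alert.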

    # Count HTTP 500 responses in the current access log
    grep " 500 " /var/log/nginx/access.log | wc -l

    # Show the 20 most recent 500 errors among the last 10,000 requests
    tail -n 10000 /var/log/nginx/access.log | grep " 500 " | tail -n 20

    # Confirm a newly deployed config file looks right before reloading
    head -n 40 /etc/nginx/sites-enabled/app.conf

    # Compare original and edited config
    diff /etc/nginx/nginx.conf.bak /etc/nginx/nginx.conf

    # List the plain-text files under /etc/app/ by content, not by extension
    # (-exec ... {} + copes with spaces in filenames, unlike a bare xargs)
    find /etc/app/ -type f -exec file {} + | grep "ASCII text"

Config backup is not optional. Before editing any production configuration file, always cp -p original original.bak. This single habit has saved countless engineers from having to restore from a full backup at 2 AM. Many teams also keep config under version control (Git) so every change is auditable; that is the professional standard covered in the Git tutorial.
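The backup habit can be made routine with a tiny helper that also keeps older backups from being overwritten. This is a sketch, not a standard tool: backup_config, the timestamp suffix, and the sample config are all assumptions made for illustration.

```shell
# backup_config FILE: hypothetical helper; copies FILE next to itself with a
# timestamp suffix, preserving mode and mtime, and prints the backup path.
backup_config() {
    backup="$1.$(date +%Y%m%d-%H%M%S).bak"
    cp -p -- "$1" "$backup" && printf '%s\n' "$backup"
}

work=$(mktemp -d)
conf="$work/app.conf"
printf 'workers 2\n' > "$conf"

bak=$(backup_config "$conf")

# Simulate an edit, then review exactly what changed before reloading.
printf 'workers 4\n' > "$conf"
if diff -u "$bak" "$conf"; then
    echo "no changes"
else
    echo "config differs from backup: $bak"
fi

# Rolling back is a copy in the other direction.
cp -p -- "$bak" "$conf"
```

diff exits with status 1 when the files differ, which is why it sits inside the if rather than being run as a bare command in a script that uses set -e.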

What You Now Know

You can read any file safely (less, head, tail -F), identify what a file actually is (file), copy and back up configs without destroying the original (cp -p), rename and move files (mv, which is atomic only within a single filesystem), and remove files deliberately and safely (rm). These primitives underpin every shell script, every automation pipeline, and every production incident response you will ever run.