Byte Streams vs Character Streams
Java's I/O library is built around two parallel hierarchies: byte streams and character streams. Picking the wrong one is one of the most common causes of garbled text, corrupt binary files, and encoding bugs that only appear on certain operating systems. This lesson explains what each hierarchy does, why both exist, and how to choose between them.
The Two Hierarchies at a Glance
Nearly every stream class in java.io descends from one of four abstract roots:
- InputStream / OutputStream: the byte stream hierarchy.
- Reader / Writer: the character stream hierarchy.
Byte streams deal in raw 8-bit bytes — the exact bits on disk or on the wire. Character streams layer Unicode text handling on top, automatically translating between bytes and Java's internal UTF-16 char representation using a Charset.
Byte Streams: InputStream and OutputStream
The central methods are dead simple:
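As a minimal sketch of the two core calls, the example below pushes a few bytes through `read()` and `write(int)` using in-memory streams. The class name `CoreByteMethods` and the helper `roundTrip` are illustrative, not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class CoreByteMethods {
    // Reads every byte from the source array and writes it to a new in-memory sink.
    static byte[] roundTrip(byte[] source) {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        // read() returns the next byte as an int in 0..255, or -1 at end of stream.
        while ((b = in.read()) != -1) {
            out.write(b); // write(int) stores only the low 8 bits of its argument
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xCA, (byte) 0xFE, 0x00, 0x7F};
        System.out.println(Arrays.equals(data, roundTrip(data))); // true
    }
}
```

The `int` return type exists so that -1 can signal end of stream without colliding with the valid byte value 0xFF, which comes back as 255.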
Because every byte is treated as an opaque value with no encoding interpretation, byte streams are the right tool for:
- Image, audio, and video files.
- ZIP/JAR archives, compiled class files, PDFs.
- Network sockets (TCP data is always bytes).
- Any situation where you need full control over the raw bit pattern.
A minimal example — copy a binary file byte-by-byte (buffering is covered in lesson 7; this shows the raw API):
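One way such an unbuffered copy might look is sketched below. The class `ByteCopy`, its `copy` method, and the temporary file names are assumptions made for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class ByteCopy {
    // Copies source to target one byte at a time and returns the number of bytes copied.
    static long copy(String source, String target) throws IOException {
        long count = 0;
        // try-with-resources closes both streams even if an exception is thrown.
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(target)) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for a real source and target.
        Path source = Files.createTempFile("source", ".bin");
        Path target = Files.createTempFile("target", ".bin");
        Files.write(source, new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47});
        System.out.println(copy(source.toString(), target.toString())); // 4
    }
}
```

Because nothing interprets the bytes, the copy is bit-for-bit identical whatever the file contains.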
read() returns an int holding one raw byte (0 to 255), or -1 at end of stream. For a UTF-8 file that contains the euro sign € (3 bytes: 0xE2 0x82 0xAC), you would get three separate integer values (226, 130, 172), not the character. Character streams handle multi-byte sequences for you.
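The euro-sign case can be checked with a short sketch. The class `EuroBytes` and the helper `rawReads` are hypothetical names, and an in-memory stream stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class EuroBytes {
    // Returns each value that read() produces for the UTF-8 encoding of the text.
    static int[] rawReads(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayInputStream in = new ByteArrayInputStream(utf8);
        int[] values = new int[utf8.length];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.read();
        }
        return values;
    }

    public static void main(String[] args) {
        // The string literal below is the single character U+20AC (euro sign).
        for (int v : rawReads("\u20AC")) {
            System.out.printf("0x%02X%n", v); // prints 0xE2, 0x82, 0xAC on separate lines
        }
    }
}
```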
Character Streams: Reader and Writer
The character stream hierarchy mirrors the byte hierarchy but operates on Java char values (UTF-16 code units):
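To illustrate what "operates on char values" means, the sketch below counts how many values `Reader.read()` yields for a given string. `CharUnits` and `countCodeUnits` are illustrative names, and `StringReader` is used only to keep the example free of files.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CharUnits {
    // Counts how many values Reader.read() yields before it returns -1.
    static int countCodeUnits(String text) throws IOException {
        int count = 0;
        try (Reader reader = new StringReader(text)) {
            // read() returns one UTF-16 code unit (0..65535) as an int, or -1 at end.
            while (reader.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countCodeUnits("\u20AC"));       // 1: the euro sign fits in one char
        System.out.println(countCodeUnits("\uD83D\uDE00")); // 2: U+1F600 needs a surrogate pair
    }
}
```

A `Reader` therefore delivers code units, not full code points: characters outside the Basic Multilingual Plane arrive as two consecutive `char` values.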
The critical bridging classes are InputStreamReader and OutputStreamWriter. They wrap a byte stream and a Charset, performing the byte↔char conversion:
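A small sketch of the bridge in both directions follows. The class `Bridge` and its `encode`/`decode` helpers are assumptions for illustration, and in-memory byte streams stand in for files or sockets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Bridge {
    // Encodes text to bytes by pushing chars through an OutputStreamWriter.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } // closing the writer flushes any pending encoded bytes
        return bytes.toByteArray();
    }

    // Decodes bytes back to text by pulling chars through an InputStreamReader.
    static String decode(byte[] data, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = encode("\u20AC", StandardCharsets.UTF_8);
        System.out.println(utf8.length);                                   // 3
        System.out.println(decode(utf8, StandardCharsets.UTF_8).length()); // 1
    }
}
```

The same three bytes decoded with a different `Charset` would yield different characters, which is why the charset passed to the bridge has to match the one used when the data was written.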
In practice you rarely use InputStreamReader and OutputStreamWriter directly — you wrap them in BufferedReader/BufferedWriter or use the convenience factories in Files (covered later). But understanding the bridge is essential for knowing where the encoding conversion actually happens.
Always Specify the Charset — Never Rely on the Platform Default
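If you call new FileReader("file.txt") without specifying a charset, Java uses the platform default encoding. Before Java 18 that default came from the operating system: UTF-8 on modern Linux/macOS but historically CP1252 on Windows. Since Java 18 (JEP 400) the default is UTF-8 on every platform, but code that may run on an older runtime cannot rely on that. A file written on one machine can be unreadable on another.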
Always pass an explicit Charset. Since Java 11, the FileReader and FileWriter constructors both accept a Charset argument:
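A sketch of the charset-aware constructors, assuming Java 11 or later; the class `ExplicitCharset`, the method `writeThenRead`, and the temporary file are hypothetical.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharset {
    // Writes text as UTF-8, then reads it back as UTF-8, independent of the platform default.
    static String writeThenRead(Path file, String text) throws IOException {
        try (FileWriter writer = new FileWriter(file.toFile(), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        StringBuilder sb = new StringBuilder();
        try (FileReader reader = new FileReader(file.toFile(), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("explicit", ".txt");
        String text = "price: 5 \u20AC";
        System.out.println(writeThenRead(file, text).equals(text)); // true
    }
}
```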
Concrete Classes You Will Actually Use
Byte stream implementations:
- FileInputStream / FileOutputStream: read/write files as raw bytes.
- ByteArrayInputStream / ByteArrayOutputStream: in-memory byte buffers (very useful in tests).
- DataInputStream / DataOutputStream: read/write Java primitives (int, double, etc.) in a portable binary format.
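As an illustration of how these classes compose, the sketch below writes an int and a double through a DataOutputStream into a ByteArrayOutputStream and reads them back. The class `PrimitivesRoundTrip` and the `pack`/`unpack` helpers are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class PrimitivesRoundTrip {
    // Serializes two primitives into a byte array: 4 bytes for the int, 8 for the double.
    static byte[] pack(int id, double price) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);
            out.writeDouble(price);
        }
        return bytes.toByteArray();
    }

    // Reads the values back in the same order they were written.
    static String unpack(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            double price = in.readDouble();
            return id + ":" + price;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] packed = pack(7, 2.5);
        System.out.println(packed.length);  // 12
        System.out.println(unpack(packed)); // 7:2.5
    }
}
```

The format carries no field names or type tags, so the reader has to call the `readXxx` methods in exactly the order the writer used.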
Character stream implementations:
- FileReader / FileWriter: thin wrappers around InputStreamReader / OutputStreamWriter for files.
- StringReader / StringWriter: in-memory character buffers backed by a String.
- PrintWriter: wraps any Writer and adds println / printf; the output equivalent of System.out.
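A brief sketch of PrintWriter layered over StringWriter, a combination that is handy for capturing formatted output in tests. The class `InMemoryText` and the method `format` are illustrative names.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

class InMemoryText {
    // Formats a key/value pair into an in-memory character buffer.
    static String format(String key, int value) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            // Locale.ROOT keeps the digit formatting independent of the machine's locale.
            out.printf(Locale.ROOT, "%s=%d", key, value);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("answer", 42)); // answer=42
    }
}
```

No Charset appears anywhere here because no bytes are involved: the data stays as chars from start to finish.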
The Decision Rule
Ask yourself one question: Is this data text that a human could read in a text editor?
- Yes (source code, CSV, JSON, log files, HTML) → use a character stream, always with an explicit Charset.
- No (images, compiled binaries, encrypted data, compressed archives) → use a byte stream.
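In modern Java, prefer Files.readString(path, charset) and Files.writeString(path, text, charset) (both Java 11+), or Files.newBufferedReader(path, charset), instead of assembling stream chains by hand. readString and writeString handle buffering, encoding, and closing in one call; newBufferedReader returns a buffered, charset-aware Reader that you still close yourself, typically with try-with-resources. Under the hood the same byte-to-character conversion takes place, so knowing the fundamentals means you can debug these methods when something goes wrong.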
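A short sketch of these convenience methods, assuming Java 11 or later; the class `ModernFiles`, its helper methods, and the temporary file are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ModernFiles {
    // Writes and re-reads a whole file in two calls, with an explicit charset each time.
    static String roundTrip(Path file, String text) throws IOException {
        Files.writeString(file, text, StandardCharsets.UTF_8);
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    // Streams a file line by line; the reader is closed by try-with-resources.
    static long countLines(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("modern", ".txt");
        String text = "line one\nline two\n";
        System.out.println(roundTrip(file, text).equals(text)); // true
        System.out.println(countLines(file));                   // 2
    }
}
```

`readString` loads the entire file into memory, so for large files the line-by-line reader is usually the better fit.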
Encoding Bugs in Practice
Here is a deliberately broken example to show what goes wrong when you mix byte and character streams carelessly:
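One common form of this mistake, sketched here with a hypothetical `EncodingBug` class, is reading UTF-8 text through a byte stream and casting each byte to a char. A decoding version that goes through InputStreamReader is shown next to it for comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class EncodingBug {
    // BROKEN on purpose: treats every byte as if it were one character.
    static String brokenDecode(byte[] utf8) {
        ByteArrayInputStream in = new ByteArrayInputStream(utf8);
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b); // wrong: multi-byte sequences are split apart
        }
        return sb.toString();
    }

    // Correct: lets InputStreamReader assemble multi-byte sequences into chars.
    static String correctDecode(byte[] utf8) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "caf" followed by U+00E9 (e with acute accent), which is two bytes in UTF-8.
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println(brokenDecode(utf8).length());  // 5
        System.out.println(correctDecode(utf8).length()); // 4
    }
}
```

The broken version works for pure ASCII input, which is why this kind of bug tends to survive testing and only surfaces when the first accented character or symbol shows up.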
Summary
Use byte streams (InputStream/OutputStream family) for any data that is inherently binary. Use character streams (Reader/Writer family) for all text data, and always specify the Charset explicitly; StandardCharsets.UTF_8 is the right default for new code. The InputStreamReader and OutputStreamWriter bridge classes are where encoding conversion happens: any character stream backed by bytes, such as a file or a socket, ultimately reads or writes through that conversion, while purely in-memory streams such as StringReader never touch bytes at all.