File I/O & NIO.2

Byte Streams vs Character Streams

15 min Lesson 6 of 13

Java's I/O library is built around two parallel hierarchies: byte streams and character streams. Picking the wrong one is one of the most common causes of garbled text, corrupt binary files, and encoding bugs that only appear on certain operating systems. This lesson explains what each hierarchy does, why both exist, and how to choose between them.

The Two Hierarchies at a Glance

Nearly every stream class in java.io descends from one of four abstract roots:

InputStream / OutputStream — the byte stream hierarchy.
Reader / Writer — the character stream hierarchy.

Byte streams deal in raw 8-bit bytes — the exact bits on disk or on the wire. Character streams layer Unicode text handling on top, automatically translating between bytes and Java's internal UTF-16 char representation using a Charset.

Byte Streams: InputStream and OutputStream

The central methods are dead simple:

// InputStream
int read()                          // read one byte (0–255), or -1 at EOF
int read(byte[] buf, int off, int len)
void close()

// OutputStream
void write(int b)                   // write low 8 bits of b
void write(byte[] buf, int off, int len)
void flush()
void close()

Because every byte is treated as an opaque value with no encoding interpretation, byte streams are the right tool for:

Image, audio, and video files.
ZIP/JAR archives, compiled class files, PDFs.
Network sockets (TCP data is always bytes).
Any situation where you need full control over the raw bit pattern.

A minimal example: copy a binary file in 8 KB chunks through a plain byte array (buffering is covered in lesson 7; this shows the raw API):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ByteCopy {
    public static void main(String[] args) throws IOException {
        try (var in  = new FileInputStream("photo.jpg");
             var out = new FileOutputStream("photo_copy.jpg")) {

            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
        }
        System.out.println("Copy complete.");
    }
}

Never decode text by hand from a byte stream. read() returns an int holding one raw byte, so for a UTF-8 file that contains the euro sign € (3 bytes: 0xE2 0x82 0xAC) you get three separate integer values, not the character. Character streams reassemble multi-byte sequences for you.
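To make the euro-sign case concrete, here is a minimal sketch that reads the same three bytes twice: once through a byte stream and once through a UTF-8 InputStreamReader. The class and method names (EuroSignDemo, readAsBytes, readAsChars) are hypothetical, and in-memory streams stand in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EuroSignDemo {

    // Collects every value returned by read() on a byte stream.
    static List<Integer> readAsBytes(byte[] data) {
        List<Integer> values = new ArrayList<>();
        var in = new ByteArrayInputStream(data);   // in-memory stand-in for a file
        int b;
        while ((b = in.read()) != -1) {
            values.add(b);
        }
        return values;
    }

    // Collects every value returned by read() on a UTF-8 decoding Reader.
    static List<Integer> readAsChars(byte[] data) {
        List<Integer> values = new ArrayList<>();
        try (var reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                values.add(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u20AC".getBytes(StandardCharsets.UTF_8);   // the euro sign
        System.out.println(readAsBytes(utf8));   // [226, 130, 172]
        System.out.println(readAsChars(utf8));   // [8364]
    }
}
```

The byte stream reports three unrelated numbers, while the reader reports the single code unit U+20AC (8364).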

Character Streams: Reader and Writer

The character stream hierarchy mirrors the byte hierarchy but operates on Java char values (UTF-16 code units):

// Reader
int read()                          // read one char (0–65535), or -1 at EOF
int read(char[] cbuf, int off, int len)
void close()

// Writer
void write(int c)                   // write one char
void write(char[] cbuf, int off, int len)
void write(String s)                // convenience method
void flush()
void close()

The critical bridging classes are InputStreamReader and OutputStreamWriter. They wrap a byte stream and a Charset, performing the byte↔char conversion:

import java.io.*;
import java.nio.charset.StandardCharsets;

public class CharStreamDemo {
    public static void main(String[] args) throws IOException {
        // Write text in UTF-8
        try (var writer = new OutputStreamWriter(
                new FileOutputStream("notes.txt"),
                StandardCharsets.UTF_8)) {
            writer.write("Hello, World!\n");
            writer.write("Price: €49.99\n");       // € is multi-byte in UTF-8
        }

        // Read it back
        try (var reader = new InputStreamReader(
                new FileInputStream("notes.txt"),
                StandardCharsets.UTF_8)) {

            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                System.out.print(new String(buf, 0, n));
            }
        }
    }
}

In practice you rarely use InputStreamReader and OutputStreamWriter directly — you wrap them in BufferedReader/BufferedWriter or use the convenience factories in Files (covered later). But understanding the bridge is essential for knowing where the encoding conversion actually happens.
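As an illustration of that wrapping, the sketch below puts a BufferedReader around the InputStreamReader bridge and reads line by line. The helper name readLines and the class BufferedBridgeDemo are hypothetical, and a ByteArrayInputStream is assumed as the byte source so the example runs without touching the file system.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BufferedBridgeDemo {

    // Hypothetical helper: decodes a byte stream as UTF-8 and returns its lines.
    static List<String> readLines(InputStream source) {
        List<String> lines = new ArrayList<>();
        // bytes -> InputStreamReader (decoding) -> BufferedReader (buffering, readLine)
        try (var reader = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] utf8 = "Hello, World!\nPrice: \u20AC49.99\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(utf8)));
    }
}
```

Closing the outermost reader also closes the streams it wraps, so one try-with-resources variable is enough here.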

Always Specify the Charset — Never Rely on the Platform Default

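If you call new FileReader("file.txt") without specifying a charset, Java uses the JVM's default charset. Before Java 18 that default came from the operating system: typically UTF-8 on modern Linux/macOS but a legacy code page such as CP1252 on Windows. Since Java 18 (JEP 400) the default is UTF-8 on every platform, but code that relies on it still misbehaves on older runtimes or when the default is overridden with -Dfile.encoding. A file written on one machine can be unreadable on another.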

FileReader / FileWriter without a charset are platform-dependent traps. Always pass an explicit Charset. Since Java 11 both classes have constructors that accept a Charset argument:

// BAD — uses the JVM's default charset; breaks across platforms
var legacyReader = new FileReader("data.txt");

// GOOD — explicit UTF-8 everywhere
var reader = new FileReader("data.txt", StandardCharsets.UTF_8);
var writer = new FileWriter("out.txt",  StandardCharsets.UTF_8);
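What a charset mismatch looks like can be sketched without any files. The example below encodes the euro sign as UTF-8 and decodes it twice; ISO-8859-1 is used here only as a stand-in for a legacy single-byte default, because it is guaranteed to exist on every JVM. The names DefaultCharsetDemo and decode are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {

    // Encodes text as UTF-8 (the "file on disk") and decodes it with readerCharset,
    // as a reader configured with that charset would.
    static String decode(String text, Charset readerCharset) {
        byte[] onDisk = text.getBytes(StandardCharsets.UTF_8);
        return new String(onDisk, readerCharset);
    }

    public static void main(String[] args) {
        // What a charset-less FileReader would use on this JVM.
        System.out.println("Default charset: " + Charset.defaultCharset());

        String right = decode("\u20AC", StandardCharsets.UTF_8);
        String wrong = decode("\u20AC", StandardCharsets.ISO_8859_1);
        System.out.println(right.length());   // 1: the euro sign
        System.out.println(wrong.length());   // 3: one bogus char per byte
    }
}
```

The mismatched decode does not throw; it silently produces three wrong characters, which is why these bugs tend to surface late.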

Concrete Classes You Will Actually Use

Byte stream implementations:

FileInputStream / FileOutputStream — read/write files as raw bytes.
ByteArrayInputStream / ByteArrayOutputStream — in-memory byte buffers (very useful in tests).
DataInputStream / DataOutputStream — read/write Java primitives (int, double, etc.) in a portable binary format.
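A short sketch of how these classes combine: DataOutputStream writes primitives into a ByteArrayOutputStream, and DataInputStream reads them back in the same order. The class DataStreamDemo and its encode/decode helpers are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataStreamDemo {

    // Serializes two primitives into an in-memory byte buffer.
    static byte[] encode(int count, double price) {
        var bytes = new ByteArrayOutputStream();
        try (var out = new DataOutputStream(bytes)) {
            out.writeInt(count);      // 4 bytes, big-endian
            out.writeDouble(price);   // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the values back in exactly the order they were written.
    static String decode(byte[] data) {
        try (var in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            double price = in.readDouble();
            return count + " @ " + price;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode(3, 49.99);
        System.out.println(data.length);    // 12
        System.out.println(decode(data));   // 3 @ 49.99
    }
}
```

The format carries no field names or type tags, so reader and writer must agree on the order and types of the values.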

Character stream implementations:

FileReader / FileWriter — thin wrappers around InputStreamReader / OutputStreamWriter for files.
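StringReader / StringWriter — in-memory character streams: StringReader reads from a String, StringWriter collects output in a buffer you retrieve with toString().
PrintWriter — wraps a Writer or OutputStream and adds println/printf; the character-stream counterpart of PrintStream, the class behind System.out.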
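For example, PrintWriter and StringWriter can be combined to build formatted text entirely in memory, which is convenient in tests. This is a minimal sketch; InMemoryTextDemo and format are hypothetical names, and Locale.ROOT is passed only to keep the decimal separator predictable.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

class InMemoryTextDemo {

    // Formats a line of text into an in-memory buffer and returns it.
    static String format(String item, double price) {
        var buffer = new StringWriter();
        try (var out = new PrintWriter(buffer)) {
            // printf on a Writer: no bytes or charsets are involved yet
            out.printf(Locale.ROOT, "%s: %.2f", item, price);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("Coffee", 3.5));   // Coffee: 3.50
    }
}
```

No charset appears anywhere because the characters never leave memory; encoding only matters once text is turned into bytes.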

The Decision Rule

Ask yourself one question: Is this data text that a human could read in a text editor?

Yes (source code, CSV, JSON, log files, HTML) → use a character stream, always with an explicit Charset.
No (images, compiled binaries, encrypted data, compressed archives) → use a byte stream.

NIO.2 shortcuts. In modern code you usually reach for Files.readString(path, charset), Files.writeString(path, text, charset), or Files.newBufferedReader(path, charset) instead of assembling stream chains by hand. readString and writeString (both added in Java 11) open the file, convert between bytes and text, and close it in one call; newBufferedReader returns a buffered Reader that you still close yourself, normally with try-with-resources. Under the hood they perform the same Charset-driven conversion between bytes and chars, so knowing the fundamentals means you can debug them when something goes wrong.
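A minimal sketch of the one-call style, assuming Java 11 or later. It writes a string to a temporary file and reads it back with an explicit charset; FilesShortcutDemo and roundTrip are hypothetical names, and the temporary file is deleted afterwards.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FilesShortcutDemo {

    // Writes text to a temporary file as UTF-8 and reads it back.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("notes", ".txt");
            try {
                Files.writeString(file, text, StandardCharsets.UTF_8);
                return Files.readString(file, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);   // clean up the temp file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("Price: \u20AC49.99\n"));
    }
}
```

Both calls load the whole content into memory, so for very large files a streaming approach such as Files.newBufferedReader is likely the better fit.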

Encoding Bugs in Practice

Here is a deliberately broken example to show what goes wrong when you mix byte and character streams carelessly:

// WRONG: reading a UTF-8 file as raw bytes and casting to char
try (var in = new FileInputStream("arabic.txt")) {
    int b;
    while ((b = in.read()) != -1) {
        System.out.print((char) b);   // corrupts every non-ASCII character
    }
}

// RIGHT: let InputStreamReader handle the encoding
try (var reader = new InputStreamReader(
        new FileInputStream("arabic.txt"),
        StandardCharsets.UTF_8)) {
    int c;
    while ((c = reader.read()) != -1) {
        System.out.print((char) c);   // correct Unicode output
    }
}

Summary

Use byte streams (InputStream/OutputStream family) for any data that is inherently binary. Use character streams (Reader/Writer family) for all text data, and always specify the Charset explicitly; StandardCharsets.UTF_8 is the right default for new code. The InputStreamReader and OutputStreamWriter bridge classes are where encoding conversion happens: a character stream backed by a file or socket ultimately reads or writes bytes through them or through an equivalent charset decoder, while purely in-memory streams such as StringReader never touch bytes at all.