JDBC & Databases

Batch Processing

15 min Lesson 8 of 13

Batch Processing

Every JDBC call you make travels across a network socket to the database server, is parsed, planned, executed, and then the result is sent back. When you need to insert or update thousands of rows, paying that round-trip cost per row is catastrophically slow. Batch processing solves this: you queue several SQL statements on the client side and send them to the server in a single network message, letting the database execute them as a group.

The Problem with Row-by-Row Execution

Consider inserting 10,000 audit records one at a time:

// Anti-pattern: one round-trip per insert
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO audit_log (event, ts) VALUES (?, ?)")) {
    for (AuditEvent e : events) {            // 10,000 iterations
        ps.setString(1, e.event());
        ps.setTimestamp(2, Timestamp.from(e.ts()));
        ps.executeUpdate();                  // network round-trip every time
    }
}

On a local database this might take a few seconds. Over a real network (even within the same data centre) it can take minutes. Batching compresses those 10,000 trips into a handful.

addBatch and executeBatch

JDBC exposes two methods on both Statement and PreparedStatement:

addBatch() — stages the current parameter set (or a full SQL string on Statement) without sending anything yet.
executeBatch() — flushes all staged commands to the server in one call and returns an int[] of update counts, one per command.

import java.sql.*;
import java.time.Instant;
import java.util.List;

public class BatchInsertDemo {

    record AuditEvent(String event, Instant ts) {}

    public static void insertAuditLogs(Connection conn, List<AuditEvent> events)
            throws SQLException {

        final String sql = "INSERT INTO audit_log (event, ts) VALUES (?, ?)";

        // Turn off auto-commit so the whole batch is one transaction
        conn.setAutoCommit(false);

        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (AuditEvent e : events) {
                ps.setString(1, e.event());
                ps.setTimestamp(2, Timestamp.from(e.ts()));
                ps.addBatch();                  // stage — no network call yet
            }

            int[] counts = ps.executeBatch();   // one round-trip for all rows
            conn.commit();

            System.out.println("Rows affected: " + sumCounts(counts));

        } catch (BatchUpdateException ex) {
            conn.rollback();
            // ex.getUpdateCounts() tells you which commands succeeded before failure
            throw ex;
        } catch (SQLException ex) {
            conn.rollback();
            throw ex;
        } finally {
            conn.setAutoCommit(true);
        }
    }

    private static int sumCounts(int[] counts) {
        int total = 0;
        for (int c : counts) {
            if (c >= 0) total += c;           // SUCCESS_NO_INFO == -2, ignore it
        }
        return total;
    }
}

Always disable auto-commit before a batch. With auto-commit on, the driver may commit after each addBatch call (behaviour is driver-dependent), and a half-applied batch leaves data in an inconsistent state. Wrapping the whole batch in a single explicit transaction is the only safe approach.

Chunking Large Batches

Staging hundreds of thousands of rows before a single executeBatch() consumes a lot of client-side memory and produces an enormous single database transaction. The standard pattern is to flush at a fixed chunk size:

public static void insertInChunks(Connection conn, List<AuditEvent> events,
                                   int chunkSize) throws SQLException {
    final String sql = "INSERT INTO audit_log (event, ts) VALUES (?, ?)";
    conn.setAutoCommit(false);

    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        int i = 0;
        for (AuditEvent e : events) {
            ps.setString(1, e.event());
            ps.setTimestamp(2, Timestamp.from(e.ts()));
            ps.addBatch();

            if (++i % chunkSize == 0) {
                ps.executeBatch();          // flush this chunk
                conn.commit();
                ps.clearBatch();            // optional — most drivers clear automatically
            }
        }

        // flush any remainder
        ps.executeBatch();
        conn.commit();

    } catch (SQLException ex) {
        conn.rollback();
        throw ex;
    } finally {
        conn.setAutoCommit(true);
    }
}

Choosing a chunk size: 500–1,000 rows per flush is a common starting point. Too small and you negate the batching benefit; too large and you risk memory pressure and long lock durations. Profile against your actual data volume and network latency to find the sweet spot.

Interpreting the int[] Return Value

executeBatch() returns one int per staged command. The value can be:

A positive number — the exact update count for that command.
Statement.SUCCESS_NO_INFO (-2) — the command succeeded but the driver cannot report how many rows changed (common with MySQL/MariaDB in batch mode).
Statement.EXECUTE_FAILED (-3) — the command failed (only possible when the driver does not throw BatchUpdateException eagerly).

int[] counts = ps.executeBatch();
for (int idx = 0; idx < counts.length; idx++) {
    switch (counts[idx]) {
        case Statement.SUCCESS_NO_INFO ->
            System.out.printf("Command %d: succeeded (count unknown)%n", idx);
        case Statement.EXECUTE_FAILED ->
            System.out.printf("Command %d: FAILED%n", idx);
        default ->
            System.out.printf("Command %d: %d row(s) affected%n", idx, counts[idx]);
    }
}

BatchUpdateException — Partial Failure

If one command in the batch fails, the driver throws BatchUpdateException (a subclass of SQLException). Crucially, some commands may have already executed before the failure. Call ex.getUpdateCounts() to see which ones succeeded:

} catch (BatchUpdateException ex) {
    conn.rollback();                  // undo whatever already ran
    int[] partial = ex.getUpdateCounts();
    for (int idx = 0; idx < partial.length; idx++) {
        if (partial[idx] == Statement.EXECUTE_FAILED) {
            System.err.println("Failed at index " + idx);
        }
    }
    throw ex;
}

Driver behaviour varies. Some drivers (e.g. PostgreSQL JDBC) abort the entire batch at the first failure and roll back implicitly. Others (e.g. MySQL Connector/J with continueBatchOnError=true) keep going and mark failed entries with EXECUTE_FAILED. Always read your driver documentation and wrap every batch in an explicit transaction so the rollback path is unambiguous regardless of driver behaviour.

Batching with Statement (Dynamic SQL)

You can also batch arbitrary SQL strings on a plain Statement. This is rarely the right choice — it bypasses parameterisation and opens the door to SQL injection — but it is useful for one-off DDL scripts or data migration jobs where the SQL is generated from trusted internal sources:

try (Statement st = conn.createStatement()) {
    conn.setAutoCommit(false);

    st.addBatch("UPDATE products SET archived = true WHERE created_at < '2020-01-01'");
    st.addBatch("DELETE FROM sessions WHERE expires_at < NOW()");
    st.addBatch("UPDATE counters SET value = 0 WHERE name = 'daily_hits'");

    int[] counts = st.executeBatch();
    conn.commit();
}

Prefer PreparedStatement for batching whenever the SQL shape is fixed and only the data changes. It avoids SQL injection, lets the database reuse the execution plan across all rows in the batch, and is typically faster than re-parsing a statement string for every row.

Performance in Practice

The improvement from batching is dramatic. Here are typical order-of-magnitude benchmarks when inserting 50,000 rows over a local JDBC connection:

Row-by-row executeUpdate(): ~12 seconds
Batched with chunk size 500: ~0.8 seconds
Batched with chunk size 2,000: ~0.5 seconds

The actual numbers depend on network latency, database load, row size, and the driver. The relative improvement is consistent: batching almost always yields a 10x–50x speedup for bulk write workloads.

Summary

Use addBatch() to stage parameter sets without sending them, then flush the accumulated work with executeBatch(). Wrap the whole operation in a single explicit transaction and always handle BatchUpdateException with a rollback. For large datasets, chunk the batch into fixed-size groups to control memory use and lock duration. Prefer PreparedStatement over Statement batching to keep the SQL parameterised and let the database reuse its execution plan.