Batch Processing
Batch Processing
Every JDBC call you make travels across a network socket to the database server, is parsed, planned, executed, and then the result is sent back. When you need to insert or update thousands of rows, paying that round-trip cost per row is catastrophically slow. Batch processing solves this: you queue several SQL statements on the client side and send them to the server in a single network message, letting the database execute them as a group.
The Problem with Row-by-Row Execution
Consider inserting 10,000 audit records one at a time:
On a local database this might take a few seconds. Over a real network (even within the same data centre) it can take minutes. Batching compresses those 10,000 trips into a handful.
addBatch and executeBatch
JDBC exposes two methods on both Statement and PreparedStatement:
addBatch()— stages the current parameter set (or a full SQL string onStatement) without sending anything yet.executeBatch()— flushes all staged commands to the server in one call and returns anint[]of update counts, one per command.
addBatch call (behaviour is driver-dependent), and a half-applied batch leaves data in an inconsistent state. Wrapping the whole batch in a single explicit transaction is the only safe approach.
Chunking Large Batches
Staging hundreds of thousands of rows before a single executeBatch() consumes a lot of client-side memory and produces an enormous single database transaction. The standard pattern is to flush at a fixed chunk size:
Interpreting the int[] Return Value
executeBatch() returns one int per staged command. The value can be:
- A positive number — the exact update count for that command.
Statement.SUCCESS_NO_INFO(-2) — the command succeeded but the driver cannot report how many rows changed (common with MySQL/MariaDB in batch mode).Statement.EXECUTE_FAILED(-3) — the command failed (only possible when the driver does not throwBatchUpdateExceptioneagerly).
BatchUpdateException — Partial Failure
If one command in the batch fails, the driver throws BatchUpdateException (a subclass of SQLException). Crucially, some commands may have already executed before the failure. Call ex.getUpdateCounts() to see which ones succeeded:
continueBatchOnError=true) keep going and mark failed entries with EXECUTE_FAILED. Always read your driver documentation and wrap every batch in an explicit transaction so the rollback path is unambiguous regardless of driver behaviour.
Batching with Statement (Dynamic SQL)
You can also batch arbitrary SQL strings on a plain Statement. This is rarely the right choice — it bypasses parameterisation and opens the door to SQL injection — but it is useful for one-off DDL scripts or data migration jobs where the SQL is generated from trusted internal sources:
PreparedStatement for batching whenever the SQL shape is fixed and only the data changes. It avoids SQL injection, lets the database reuse the execution plan across all rows in the batch, and is typically faster than re-parsing a statement string for every row.
Performance in Practice
The improvement from batching is dramatic. Here are typical order-of-magnitude benchmarks when inserting 50,000 rows over a local JDBC connection:
- Row-by-row
executeUpdate(): ~12 seconds - Batched with chunk size 500: ~0.8 seconds
- Batched with chunk size 2,000: ~0.5 seconds
The actual numbers depend on network latency, database load, row size, and the driver. The relative improvement is consistent: batching almost always yields a 10x–50x speedup for bulk write workloads.
Summary
Use addBatch() to stage parameter sets without sending them, then flush the accumulated work with executeBatch(). Wrap the whole operation in a single explicit transaction and always handle BatchUpdateException with a rollback. For large datasets, chunk the batch into fixed-size groups to control memory use and lock duration. Prefer PreparedStatement over Statement batching to keep the SQL parameterised and let the database reuse its execution plan.