Spring Data JPA

Introduction to Spring Data JPA

18 min Lesson 1 of 13


Every application that persists data must answer the same tedious questions: how do I open a connection, map a result set to an object, handle transactions, and close everything safely? Before Spring Data JPA, Java developers either wrote that plumbing by hand with JDBC (as you did in the JDBC tutorial) or managed an EntityManager directly with JPA. Spring Data JPA eliminates virtually all of that boilerplate. You declare what you want to query; the framework writes the how.

The Problem Spring Data JPA Solves

Consider a plain JPA application without Spring Data. To find an Order by customer ID you write something like this:

    // Plain JPA, no Spring Data
    EntityManagerFactory emf = Persistence.createEntityManagerFactory("myPU");
    EntityManager em = emf.createEntityManager();
    List<Order> orders;
    try {
        TypedQuery<Order> q = em.createQuery(
                "SELECT o FROM Order o WHERE o.customerId = :cid", Order.class);
        q.setParameter("cid", customerId);
        orders = q.getResultList();
    } finally {
        em.close(); // release the persistence context even if the query throws
    }

Now imagine doing this for every entity and every query variant in your service layer: find by status, find by date range, count by customer, page through results. The result is hundreds of lines of structurally identical code. Spring Data JPA collapses that to a single interface method declaration.

Where Spring Data JPA Sits in the Stack

Spring Data JPA is not a replacement for JPA or Hibernate — it is a thin abstraction layer on top of them. The full runtime stack looks like this:

  • Your code: calls a Spring Data repository interface.
  • Spring Data JPA: generates the repository implementation at startup using JDK dynamic proxies.
  • JPA (jakarta.persistence API): the standard specification that defines EntityManager, @Entity, JPQL, etc.
  • Hibernate 6: the JPA provider that translates entity operations and JPQL into SQL.
  • JDBC driver and HikariCP: the driver that talks to the database and the connection pool that hands out its connections, at the bottom of the stack.

Spring Data JPA vs Hibernate: Hibernate is configured as the JPA provider (Spring Boot does this by default), but you rarely touch it directly when using Spring Data. Spring manages the EntityManager lifecycle for you, so most of the Hibernate-specific knowledge you need is limited to mapping annotations and query hints.

Adding Spring Data JPA to a Spring Boot 3 Project

With Spring Boot, a single starter pulls in Spring Data JPA, Hibernate 6, and HikariCP:

    <!-- pom.xml -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>

    <!-- a JDBC driver, e.g. PostgreSQL -->
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <scope>runtime</scope>
    </dependency>

Then provide the data-source configuration in application.properties:

    spring.datasource.url=jdbc:postgresql://localhost:5432/shop
    spring.datasource.username=${DB_USER}
    spring.datasource.password=${DB_PASS}

    # Hibernate DDL: validate | update | create | create-drop | none
    spring.jpa.hibernate.ddl-auto=validate

    # Log the SQL that Hibernate actually sends (useful during development)
    spring.jpa.show-sql=true
    spring.jpa.properties.hibernate.format_sql=true

Use ddl-auto=validate in production. It tells Hibernate to verify that the schema matches your entities without modifying the schema or touching any data. Reserve create-drop for integration tests only: it drops the schema on application shutdown, which is catastrophic if pointed at the wrong database.

Your First Repository — Zero Boilerplate

Here is a complete working example of an entity and its repository:

    // Product.java
    package com.example.shop.domain;

    import jakarta.persistence.*;
    import java.math.BigDecimal;

    @Entity
    @Table(name = "products")
    public class Product {

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        private Long id;

        @Column(nullable = false, length = 200)
        private String name;

        private BigDecimal price;

        protected Product() {} // required by JPA

        public Product(String name, BigDecimal price) {
            this.name = name;
            this.price = price;
        }

        // getters omitted for brevity
    }

    // ProductRepository.java
    package com.example.shop.repository;

    import com.example.shop.domain.Product;
    import org.springframework.data.jpa.repository.JpaRepository;

    public interface ProductRepository extends JpaRepository<Product, Long> {
        // no method bodies needed: Spring Data generates them at startup
    }

That is it. By extending JpaRepository<Product, Long> you inherit a full set of ready-to-use CRUD methods, including save, findById, findAll, and deleteById, plus paging and sorting variants such as findAll(Pageable). Spring Boot's auto-configuration detects the interface when it scans the packages beneath your main application class and registers a fully functional implementation in the application context automatically.

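    // ProductService.java
    import com.example.shop.domain.Product;
    import com.example.shop.repository.ProductRepository;
    import java.math.BigDecimal;
    import java.util.List;
    import java.util.Optional;
    import lombok.RequiredArgsConstructor; // requires Lombok; generates the constructor for final fields
    import org.springframework.stereotype.Service;

    @Service
    @RequiredArgsConstructor
    public class ProductService {

        private final ProductRepository products; // injected by Spring through the constructor

        public Product create(String name, BigDecimal price) {
            return products.save(new Product(name, price));
        }

        public Optional<Product> findById(Long id) {
            return products.findById(id); // returns Optional, never null
        }

        public List<Product> all() {
            return products.findAll();
        }
    }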

What Happens at Runtime

When the Spring Boot application starts, Spring Data scans for interfaces that extend the Repository marker interface. For each one it creates a JDK dynamic proxy backed by SimpleJpaRepository, a concrete class that delegates every call to an injected EntityManager. The proxy is registered as a Spring bean, so it participates in transaction management and dependency injection like any other component.
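The proxy mechanism itself is plain JDK and can be sketched without Spring. The example below is only an illustration under stated assumptions: NoteRepository and InMemoryNoteStore are hypothetical names, the store plays the role that SimpleJpaRepository plays in Spring Data, and it keeps data in a list rather than using an EntityManager. Spring Data's real proxy does considerably more (query lookup, transactions, exception translation).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class RepositoryProxySketch {

    // Hypothetical repository interface; only the interface is declared, as with Spring Data.
    public interface NoteRepository {
        String save(String note);
        Optional<String> findById(int id);
        List<String> findAll();
    }

    // Hypothetical stand-in for SimpleJpaRepository. It stores notes in a list
    // instead of delegating to an EntityManager.
    public static class InMemoryNoteStore {
        private final List<String> rows = new ArrayList<>();

        public String save(String note) {
            rows.add(note);
            return note;
        }

        public Optional<String> findById(int id) {
            return id >= 0 && id < rows.size() ? Optional.of(rows.get(id)) : Optional.empty();
        }

        public List<String> findAll() {
            return List.copyOf(rows);
        }
    }

    // Creates a JDK dynamic proxy that implements NoteRepository by forwarding each
    // call to the target method with the same name and parameter types.
    public static NoteRepository createProxy(InMemoryNoteStore target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Method impl = target.getClass().getMethod(method.getName(), method.getParameterTypes());
            try {
                return impl.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return (NoteRepository) Proxy.newProxyInstance(
                NoteRepository.class.getClassLoader(),
                new Class<?>[] {NoteRepository.class},
                handler);
    }

    public static void main(String[] args) {
        NoteRepository notes = createProxy(new InMemoryNoteStore());
        notes.save("first");
        notes.save("second");
        System.out.println(notes.findAll());                      // [first, second]
        System.out.println(notes.findById(5).isPresent());        // false
        System.out.println(Proxy.isProxyClass(notes.getClass())); // true
    }
}
```

Note that no class in the sketch declares "implements NoteRepository"; the implementation exists only at runtime, which is the same reason you never write a class for ProductRepository.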

The N+1 query problem is real from day one. Calling findAll() on an entity that has a lazily loaded collection (e.g. @OneToMany) fires one SELECT to load the parent rows; as soon as your code touches that collection on each result, Hibernate fires one additional SELECT per parent row to load the children. With 1,000 parent rows that is 1,001 queries. Later lessons cover fetch joins, @EntityGraph, and projections to mitigate this, but you need to be aware of it the moment you write your first findAll().
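The arithmetic can be reproduced without a database. The sketch below is a plain-Java simulation, not Hibernate: CountingDb and Customer are hypothetical classes, and a Supplier stands in for the lazy-loading collection wrapper. It assumes each parent's collection is loaded on first access, which is the situation that produces the extra queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class NPlusOneSketch {

    // Hypothetical fake database that only counts the SELECT statements it is asked to run.
    public static class CountingDb {
        private int selects = 0;

        public List<Customer> selectAllCustomers(int rowCount) {
            selects++; // one query loads every parent row
            List<Customer> result = new ArrayList<>();
            for (int id = 1; id <= rowCount; id++) {
                final int customerId = id;
                result.add(new Customer(customerId, () -> selectOrdersFor(customerId)));
            }
            return result;
        }

        private List<String> selectOrdersFor(int customerId) {
            selects++; // one extra query for each parent whose collection is touched
            return List.of("order-for-" + customerId);
        }

        public int selectCount() {
            return selects;
        }
    }

    // Parent row with a lazily loaded collection: the loader runs on first access only.
    public static class Customer {
        private final int id;
        private final Supplier<List<String>> loader;
        private List<String> orders; // stays null until getOrders() is first called

        public Customer(int id, Supplier<List<String>> loader) {
            this.id = id;
            this.loader = loader;
        }

        public int getId() {
            return id;
        }

        public List<String> getOrders() {
            if (orders == null) {
                orders = loader.get();
            }
            return orders;
        }
    }

    // Loads all parents, then touches every child collection: 1 + N queries.
    public static int queriesWhenTouchingEveryCollection(int rowCount) {
        CountingDb db = new CountingDb();
        for (Customer customer : db.selectAllCustomers(rowCount)) {
            customer.getOrders().size();
        }
        return db.selectCount();
    }

    // Loads all parents and never touches the collections: a single query.
    public static int queriesWhenIgnoringCollections(int rowCount) {
        CountingDb db = new CountingDb();
        db.selectAllCustomers(rowCount);
        return db.selectCount();
    }

    public static void main(String[] args) {
        System.out.println(queriesWhenTouchingEveryCollection(1000)); // 1001
        System.out.println(queriesWhenIgnoringCollections(1000));     // 1
    }
}
```

The second number is why the problem often goes unnoticed at first: the extra queries appear only once a loop, a serializer, or a template starts reading the child collection.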

Performance Trade-offs at a Glance

Spring Data JPA is productive, but it is not magic. Here is an honest view of the trade-offs compared to hand-written JDBC:

  • Less code, more safety: Derived query methods are validated against the entity model when the application starts rather than when they are first called, so a typo in a property name fails fast.
  • Object-relational mapping overhead: Hibernate tracks entity state (dirty checking), manages the first-level cache, and translates between object graphs and relational rows. For bulk inserts or analytical queries, plain JDBC or native SQL (covered in Lesson 6) is often faster.
  • Abstraction cost: Understanding what SQL Hibernate actually generates is essential to avoid slow queries. Always enable show-sql during development and review the output with a query profiler before deploying.
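
The fail-fast check in the first bullet can be pictured with a small sketch. This is a deliberate simplification in plain Java, not Spring Data's actual parser: it assumes method names of the form findBy followed by a single property, and it looks the property up with reflection. Order, GoodOrderRepository, and BadOrderRepository are hypothetical names, and the error message is made up for illustration.

```java
import java.lang.reflect.Method;
import java.util.List;

public class DerivedQueryCheckSketch {

    // Hypothetical entity with a single queryable property.
    public static class Order {
        private Long customerId;
    }

    // Hypothetical repository whose method name matches a property of Order.
    public interface GoodOrderRepository {
        List<Order> findByCustomerId(Long customerId);
    }

    // Same idea, but with a typo in the property name ("Customr").
    public interface BadOrderRepository {
        List<Order> findByCustomrId(Long customerId);
    }

    // Checks every findBy<Property> method against the entity's declared fields and
    // throws as soon as a property cannot be found, before any query is executed.
    public static void validate(Class<?> repository, Class<?> entity) {
        String prefix = "findBy";
        for (Method method : repository.getMethods()) {
            String name = method.getName();
            if (!name.startsWith(prefix) || name.length() == prefix.length()) {
                continue; // not a derived finder in this simplified model
            }
            String suffix = name.substring(prefix.length());
            String property = Character.toLowerCase(suffix.charAt(0)) + suffix.substring(1);
            try {
                entity.getDeclaredField(property);
            } catch (NoSuchFieldException e) {
                throw new IllegalStateException(
                        "No property '" + property + "' on " + entity.getSimpleName()
                                + " for method " + name, e);
            }
        }
    }

    public static void main(String[] args) {
        validate(GoodOrderRepository.class, Order.class); // passes silently
        try {
            validate(BadOrderRepository.class, Order.class);
        } catch (IllegalStateException e) {
            System.out.println("Rejected at startup: " + e.getMessage());
        }
    }
}
```

In a real application the equivalent failure surfaces while the application context is being created, so a misspelled finder prevents startup instead of failing on the first request that uses it.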

Summary

Spring Data JPA sits between your business logic and the JPA/Hibernate stack. It generates repository implementations at startup, exposes a clean interface-based API, and participates fully in Spring's transaction and dependency-injection machinery. The result is dramatically less boilerplate — but effective use requires understanding what SQL is generated under the hood. In the next lesson you will map domain classes to database tables with @Entity and the jakarta.persistence annotations.