JPQL, Criteria API & Queries

JPQL Aggregations & Grouping

18 min Lesson 3 of 13

JPQL Aggregations & Grouping

Real applications rarely display raw rows. They need totals, averages, and ranked summaries — "how many orders per customer", "what is the average product rating", "which category has the highest revenue". JPQL exposes the same aggregate functions SQL does, but applies them to your entity model rather than raw tables. This lesson covers every aggregate function, GROUP BY, HAVING, and the performance trade-offs that matter in production.

The Five Aggregate Functions

JPQL supports exactly the same set SQL does:

COUNT(x) — number of non-null values; use COUNT(DISTINCT x) to deduplicate.
SUM(x) — arithmetic total of a numeric path.
AVG(x) — arithmetic mean, always returns Double.
MIN(x) / MAX(x) — smallest and largest value; works on numbers, strings, and dates.

Without GROUP BY these functions collapse the entire result set into a single row — a scalar aggregate.

// Scalar aggregate — total revenue across ALL orders
@Repository
public class OrderStatsRepository {

    @PersistenceContext
    private EntityManager em;

    public Double totalRevenue() {
        return em.createQuery(
                "SELECT SUM(o.totalAmount) FROM Order o WHERE o.status = 'PAID'",
                Double.class)
            .getSingleResult();
    }

    public Long countActiveCustomers() {
        return em.createQuery(
                "SELECT COUNT(DISTINCT o.customer.id) FROM Order o WHERE o.status = 'PAID'",
                Long.class)
            .getSingleResult();
    }
}

Return types matter. COUNT always returns Long. SUM on an int/long field returns Long; on BigDecimal it returns BigDecimal. AVG always returns Double. Getting the type wrong causes a ClassCastException at runtime, not compile time.

GROUP BY — Aggregates Per Group

GROUP BY splits the result set into buckets before applying the aggregate. Each bucket produces exactly one output row. The rule is the same as SQL: every path in SELECT that is not wrapped in an aggregate function must appear in GROUP BY.

// How many orders did each customer place?
// Returns Object[] { customerId, Long count }
List<Object[]> rows = em.createQuery(
        "SELECT o.customer.id, COUNT(o) " +
        "FROM Order o " +
        "GROUP BY o.customer.id",
        Object[].class)
    .getResultList();

for (Object[] row : rows) {
    Long customerId = (Long) row[0];
    Long orderCount = (Long) row[1];
    System.out.println(customerId + " placed " + orderCount + " orders");
}

Grouping by a relationship attribute (like o.customer.id) avoids a join, but grouping by the entity itself (e.g., o.customer) triggers an implicit join that Hibernate resolves to the foreign-key column — either form works, but the explicit path to the PK is clearer and marginally faster.

HAVING — Filtering After Grouping

WHERE filters individual rows before grouping. HAVING filters the groups after aggregation. Use HAVING whenever the condition references an aggregate function.

// Customers who have placed MORE than 5 orders
List<Object[]> vipCustomers = em.createQuery(
        "SELECT o.customer.id, COUNT(o) AS orderCount " +
        "FROM Order o " +
        "GROUP BY o.customer.id " +
        "HAVING COUNT(o) > 5 " +
        "ORDER BY orderCount DESC",
        Object[].class)
    .getResultList();

Push work into WHERE, not HAVING. Put every condition that does not need an aggregate into WHERE. The database applies WHERE before grouping, reducing the number of rows that get aggregated. HAVING runs after grouping, so a mistake like moving a non-aggregate filter into HAVING forces the database to aggregate all rows first.

Combining Multiple Aggregates

You can select several aggregates in the same query. A common pattern is the "summary row" that shows totals, averages, and counts for each group simultaneously.

// Revenue summary per product category
// Returns Object[] { category, totalRevenue, avgRevenue, orderCount }
List<Object[]> summary = em.createQuery(
        "SELECT p.category, " +
        "       SUM(oi.quantity * oi.unitPrice), " +
        "       AVG(oi.unitPrice), " +
        "       COUNT(oi) " +
        "FROM OrderItem oi " +
        "JOIN oi.product p " +
        "GROUP BY p.category " +
        "HAVING SUM(oi.quantity * oi.unitPrice) > 1000.0 " +
        "ORDER BY SUM(oi.quantity * oi.unitPrice) DESC",
        Object[].class)
    .getResultList();

Using DTO Projections with Aggregates

Casting Object[] arrays is error-prone. JPQL supports a NEW constructor expression that packages each row into a typed DTO directly in the query, avoiding manual casting entirely.

// DTO record — Java 16+
public record CategorySummary(
    String category,
    Double totalRevenue,
    Double avgPrice,
    Long itemCount
) {}

// Query with constructor expression
List<CategorySummary> results = em.createQuery(
        "SELECT NEW com.example.dto.CategorySummary(" +
        "    p.category, " +
        "    SUM(oi.quantity * oi.unitPrice), " +
        "    AVG(oi.unitPrice), " +
        "    COUNT(oi)" +
        ") " +
        "FROM OrderItem oi JOIN oi.product p " +
        "GROUP BY p.category",
        CategorySummary.class)
    .getResultList();

The fully-qualified class name is required in the query string. The matching constructor must exist. This pattern is preferred for any aggregate result exposed beyond the repository layer — it gives callers a stable, named type instead of a positional array.

COUNT(e) vs COUNT(e.id) vs COUNT(*)

JPQL does not support COUNT(*). Use COUNT(e) (the entity alias) as the idiomatic equivalent; Hibernate translates it to COUNT(id) automatically. COUNT(e.id) is equivalent and slightly more explicit. Use COUNT(DISTINCT e.someField) when you need to count unique values of a non-PK attribute.

NULL Handling in Aggregates

Like SQL, JPQL aggregates ignore NULL values silently. SUM over an all-null column returns NULL (not zero), and COUNT(field) skips nulls while COUNT(entity) counts every row regardless. Forgetting this causes silent wrong answers in dashboards.

SUM can return null. If no rows match the WHERE clause, SUM (and AVG) return null, not zero. Always use Optional or a null check when storing the result, or use COALESCE(SUM(x), 0) in the query to guarantee a non-null return value.

Performance Considerations

Aggregate queries execute entirely inside the database engine — no entities are loaded into the JPA first-level cache, no lazy associations are triggered, and no hydration overhead occurs. This makes them dramatically faster than loading entities and summing in Java code. A few production guidelines:

Index the GROUP BY columns. If you group by customer_id or category frequently, those columns should be indexed. Check the query plan with EXPLAIN.
Avoid grouping by non-key entity paths if you can use the FK column — grouping by o.customer may generate a join where o.customer.id does not.
Use HAVING sparingly on large tables — the database must build all groups before filtering. A covering index or a pre-filtered subquery can help when the HAVING filter is selective.
Paginate aggregate results — even grouped queries can return thousands of rows in large systems. Apply setMaxResults / setFirstResult as you would for any result list.

Summary

JPQL's five aggregate functions — COUNT, SUM, AVG, MIN, MAX — work exactly as in SQL but operate on entity paths. GROUP BY partitions the result into buckets; HAVING filters those buckets on aggregate conditions. Prefer typed DTO constructor expressions over raw Object[], guard against null from SUM/AVG, and let the database do the arithmetic — it is always faster than loading entities into memory and aggregating in Java.