JPQL Aggregations & Grouping
JPQL Aggregations & Grouping
Real applications rarely display raw rows. They need totals, averages, and ranked summaries — "how many orders per customer", "what is the average product rating", "which category has the highest revenue". JPQL exposes the same aggregate functions SQL does, but applies them to your entity model rather than raw tables. This lesson covers every aggregate function, GROUP BY, HAVING, and the performance trade-offs that matter in production.
The Five Aggregate Functions
JPQL supports exactly the same set SQL does:
COUNT(x)— number of non-null values; useCOUNT(DISTINCT x)to deduplicate.SUM(x)— arithmetic total of a numeric path.AVG(x)— arithmetic mean, always returnsDouble.MIN(x)/MAX(x)— smallest and largest value; works on numbers, strings, and dates.
Without GROUP BY these functions collapse the entire result set into a single row — a scalar aggregate.
COUNT always returns Long. SUM on an int/long field returns Long; on BigDecimal it returns BigDecimal. AVG always returns Double. Getting the type wrong causes a ClassCastException at runtime, not compile time.
GROUP BY — Aggregates Per Group
GROUP BY splits the result set into buckets before applying the aggregate. Each bucket produces exactly one output row. The rule is the same as SQL: every path in SELECT that is not wrapped in an aggregate function must appear in GROUP BY.
Grouping by a relationship attribute (like o.customer.id) avoids a join, but grouping by the entity itself (e.g., o.customer) triggers an implicit join that Hibernate resolves to the foreign-key column — either form works, but the explicit path to the PK is clearer and marginally faster.
HAVING — Filtering After Grouping
WHERE filters individual rows before grouping. HAVING filters the groups after aggregation. Use HAVING whenever the condition references an aggregate function.
WHERE. The database applies WHERE before grouping, reducing the number of rows that get aggregated. HAVING runs after grouping, so a mistake like moving a non-aggregate filter into HAVING forces the database to aggregate all rows first.
Combining Multiple Aggregates
You can select several aggregates in the same query. A common pattern is the "summary row" that shows totals, averages, and counts for each group simultaneously.
Using DTO Projections with Aggregates
Casting Object[] arrays is error-prone. JPQL supports a NEW constructor expression that packages each row into a typed DTO directly in the query, avoiding manual casting entirely.
The fully-qualified class name is required in the query string. The matching constructor must exist. This pattern is preferred for any aggregate result exposed beyond the repository layer — it gives callers a stable, named type instead of a positional array.
COUNT(e) vs COUNT(e.id) vs COUNT(*)
JPQL does not support COUNT(*). Use COUNT(e) (the entity alias) as the idiomatic equivalent; Hibernate translates it to COUNT(id) automatically. COUNT(e.id) is equivalent and slightly more explicit. Use COUNT(DISTINCT e.someField) when you need to count unique values of a non-PK attribute.
NULL Handling in Aggregates
Like SQL, JPQL aggregates ignore NULL values silently. SUM over an all-null column returns NULL (not zero), and COUNT(field) skips nulls while COUNT(entity) counts every row regardless. Forgetting this causes silent wrong answers in dashboards.
SUM (and AVG) return null, not zero. Always use Optional or a null check when storing the result, or use COALESCE(SUM(x), 0) in the query to guarantee a non-null return value.
Performance Considerations
Aggregate queries execute entirely inside the database engine — no entities are loaded into the JPA first-level cache, no lazy associations are triggered, and no hydration overhead occurs. This makes them dramatically faster than loading entities and summing in Java code. A few production guidelines:
- Index the GROUP BY columns. If you group by
customer_idorcategoryfrequently, those columns should be indexed. Check the query plan withEXPLAIN. - Avoid grouping by non-key entity paths if you can use the FK column — grouping by
o.customermay generate a join whereo.customer.iddoes not. - Use
HAVINGsparingly on large tables — the database must build all groups before filtering. A covering index or a pre-filtered subquery can help when the HAVING filter is selective. - Paginate aggregate results — even grouped queries can return thousands of rows in large systems. Apply
setMaxResults/setFirstResultas you would for any result list.
Summary
JPQL's five aggregate functions — COUNT, SUM, AVG, MIN, MAX — work exactly as in SQL but operate on entity paths. GROUP BY partitions the result into buckets; HAVING filters those buckets on aggregate conditions. Prefer typed DTO constructor expressions over raw Object[], guard against null from SUM/AVG, and let the database do the arithmetic — it is always faster than loading entities into memory and aggregating in Java.