The Streams API

Collectors in Depth

15 min Lesson 7 of 13

By the time you reach this lesson you already know how to call collect(Collectors.toList()) to materialise a stream into a list. That is just the surface. The java.util.stream.Collectors utility class ships with over twenty factory methods, and four of them — groupingBy, partitioningBy, joining, and counting — cover the majority of real-world aggregation work. Mastering them lets you replace loops that span a dozen lines with a single, readable expression.

counting — the simplest aggregation

Collectors.counting() is a downstream collector that counts the elements flowing into it. On its own it is not very interesting — you would just call stream.count() — but it becomes powerful when composed inside another collector.

    import java.util.List;
    import java.util.stream.Collectors;

    List<String> words = List.of("apple", "fig", "banana", "avocado", "blueberry", "date");

    long total = words.stream().collect(Collectors.counting());
    System.out.println(total); // 6

You will see counting() again when we look at groupingBy.

groupingBy — splitting a stream into buckets

Collectors.groupingBy(classifier) groups stream elements of type T into a Map<K, List<T>>: every element for which the classifier returns the same key lands in the same bucket.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    record Employee(String name, String department, double salary) {}

    List<Employee> employees = List.of(
        new Employee("Alice", "Engineering", 95_000),
        new Employee("Bob", "Engineering", 88_000),
        new Employee("Carol", "Marketing", 72_000),
        new Employee("Dave", "Marketing", 68_000),
        new Employee("Eve", "HR", 61_000)
    );

    Map<String, List<Employee>> byDept = employees.stream()
        .collect(Collectors.groupingBy(Employee::department));

    byDept.forEach((dept, list) ->
        System.out.println(dept + ": " + list.stream()
            .map(Employee::name)
            .toList()));
    // Possible output (the returned map makes no ordering guarantee,
    // so the departments may print in a different order):
    // Engineering: [Alice, Bob]
    // Marketing: [Carol, Dave]
    // HR: [Eve]

The real power arrives when you add a downstream collector as a second argument. Instead of collecting the bucket members into a list, you can aggregate them further:

    // Count employees per department
    Map<String, Long> countByDept = employees.stream()
        .collect(Collectors.groupingBy(
            Employee::department,
            Collectors.counting()
        ));
    // {Engineering=2, Marketing=2, HR=1} (entry order may vary)

    // Average salary per department
    Map<String, Double> avgSalaryByDept = employees.stream()
        .collect(Collectors.groupingBy(
            Employee::department,
            Collectors.averagingDouble(Employee::salary)
        ));
    // {Engineering=91500.0, Marketing=70000.0, HR=61000.0} (entry order may vary)

Multi-level grouping: the downstream collector can itself be another groupingBy. You can create Map<String, Map<String, Long>> structures (department, then seniority level, for example) with no imperative loops at all.
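To make the multi-level idea concrete, here is a minimal sketch. The Employee record in this lesson has no seniority field, so the level() helper below is a hypothetical rule (salary of 80,000 or more counts as "Senior") invented purely for illustration; the class name is likewise an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MultiLevelGroupingSketch {

    // Same shape as the Employee record used elsewhere in this lesson.
    record Employee(String name, String department, double salary) {}

    // Hypothetical seniority rule, derived from salary for illustration only.
    static String level(Employee e) {
        return e.salary() >= 80_000 ? "Senior" : "Junior";
    }

    static List<Employee> sampleEmployees() {
        return List.of(
            new Employee("Alice", "Engineering", 95_000),
            new Employee("Bob", "Engineering", 88_000),
            new Employee("Carol", "Marketing", 72_000),
            new Employee("Dave", "Marketing", 68_000),
            new Employee("Eve", "HR", 61_000));
    }

    // Outer groupingBy keys by department; the inner groupingBy keys each
    // department's employees by level and counts them.
    static Map<String, Map<String, Long>> countByDeptAndLevel(List<Employee> employees) {
        return employees.stream()
            .collect(Collectors.groupingBy(
                Employee::department,
                Collectors.groupingBy(
                    MultiLevelGroupingSketch::level,
                    Collectors.counting())));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> result = countByDeptAndLevel(sampleEmployees());
        // Entry order is not guaranteed, so departments may print in any order.
        result.forEach((dept, levels) -> System.out.println(dept + " -> " + levels));
    }
}
```

Note that an inner map only contains the levels that actually occur in that department: with this data, Engineering has a "Senior" entry but no "Junior" entry at all.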

partitioningBy — a boolean split

Collectors.partitioningBy(predicate) is a specialised form of groupingBy where the key is always a boolean. The result is a Map<Boolean, List<T>> with exactly two entries: true and false.

    Map<Boolean, List<Employee>> highEarners = employees.stream()
        .collect(Collectors.partitioningBy(
            e -> e.salary() >= 80_000
        ));

    System.out.println("High earners: " + highEarners.get(true)
        .stream()
        .map(Employee::name)
        .toList());
    // High earners: [Alice, Bob]

    System.out.println("Others: " + highEarners.get(false)
        .stream()
        .map(Employee::name)
        .toList());
    // Others: [Carol, Dave, Eve]

Like groupingBy, it accepts a downstream collector as a second argument:

    Map<Boolean, Long> counts = employees.stream()
        .collect(Collectors.partitioningBy(
            e -> e.salary() >= 80_000,
            Collectors.counting()
        ));
    // {false=3, true=2}

When to choose partitioningBy over groupingBy: whenever your classifier is inherently binary (active/inactive, pass/fail, above-threshold/below-threshold). partitioningBy makes the intent crystal-clear and always guarantees both keys exist in the result map (even if one bucket is empty), whereas groupingBy only includes keys that actually appear in the data.
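The difference in key handling is easiest to see when one side of the split is empty. The sketch below applies the same predicate through both collectors with a threshold nobody reaches; the class name, method names, and the 200,000 threshold are assumptions chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionVsGroupSketch {

    // Same shape as the Employee record used elsewhere in this lesson.
    record Employee(String name, String department, double salary) {}

    static List<Employee> sampleEmployees() {
        return List.of(
            new Employee("Alice", "Engineering", 95_000),
            new Employee("Bob", "Engineering", 88_000),
            new Employee("Eve", "HR", 61_000));
    }

    // partitioningBy: the result always has a true entry and a false entry.
    static Map<Boolean, List<Employee>> partition(List<Employee> employees, double threshold) {
        return employees.stream()
            .collect(Collectors.partitioningBy(e -> e.salary() >= threshold));
    }

    // groupingBy with a boolean classifier: only keys that occur are present.
    static Map<Boolean, List<Employee>> group(List<Employee> employees, double threshold) {
        return employees.stream()
            .collect(Collectors.groupingBy(e -> e.salary() >= threshold));
    }

    public static void main(String[] args) {
        double threshold = 200_000; // nobody in the sample data earns this much

        Map<Boolean, List<Employee>> partitioned = partition(sampleEmployees(), threshold);
        Map<Boolean, List<Employee>> grouped = group(sampleEmployees(), threshold);

        System.out.println(partitioned.get(true)); // []
        System.out.println(grouped.get(true));     // null
    }
}
```

With groupingBy, callers have to guard against the missing key (for example with getOrDefault); with partitioningBy, get(true) and get(false) are both safe to use directly.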

joining — assembling strings from a stream

Collectors.joining() concatenates a stream of CharSequence elements (most commonly String) into a single string. Three overloads are available:

  • joining() — plain concatenation, no separator.
  • joining(delimiter) — elements separated by delimiter.
  • joining(delimiter, prefix, suffix) — wraps the result too.
    List<String> tags = List.of("java", "streams", "collectors", "functional");

    // plain
    String plain = tags.stream().collect(Collectors.joining());
    System.out.println(plain);
    // javastreamscollectorsfunctional

    // comma-separated
    String csv = tags.stream().collect(Collectors.joining(", "));
    System.out.println(csv);
    // java, streams, collectors, functional

    // SQL-style IN clause
    String inClause = tags.stream()
        .collect(Collectors.joining("', '", "('", "')"));
    System.out.println(inClause);
    // ('java', 'streams', 'collectors', 'functional')

joining only accepts streams whose elements are CharSequence (String, StringBuilder, and so on). If your stream holds any other type of object, you must call .map(Object::toString) (or a more specific mapper) before collecting. Forgetting this causes a compile-time type error.
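As a small sketch of that mapping step, the example below joins a list of Integer values and a list of records. The class, the Tag record, and the method names are hypothetical and exist only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class JoiningNonStringsSketch {

    // Hypothetical record used only for this illustration.
    record Tag(int id, String label) {}

    // Integers are not text, so convert each one before joining.
    static String joinIds(List<Integer> ids) {
        return ids.stream()
            .map(Object::toString)
            .collect(Collectors.joining(", ", "[", "]"));
    }

    // A more specific mapper: pick the one field that should appear in the output.
    static String joinLabels(List<Tag> tags) {
        return tags.stream()
            .map(Tag::label)
            .collect(Collectors.joining(" | "));
    }

    public static void main(String[] args) {
        System.out.println(joinIds(List.of(1, 2, 3)));
        // [1, 2, 3]

        System.out.println(joinLabels(List.of(new Tag(1, "java"), new Tag(2, "streams"))));
        // java | streams
    }
}
```

Removing the map call from either method makes the collect call fail to compile, because the stream's element type is no longer compatible with the collector returned by joining.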

Composing collectors — a realistic example

Real code often chains all of these together. Suppose you need a report that shows, per department, the comma-separated list of employee names:

    Map<String, String> namesByDept = employees.stream()
        .collect(Collectors.groupingBy(
            Employee::department,
            Collectors.mapping(
                Employee::name,
                Collectors.joining(", ")
            )
        ));

    namesByDept.forEach((dept, names) ->
        System.out.println(dept + " -> " + names));
    // Engineering -> Alice, Bob
    // Marketing -> Carol, Dave
    // HR -> Eve

Here Collectors.mapping() is used as a downstream adapter: it first maps each Employee to its name (a String), then feeds those strings into joining. This three-level composition replaces what would otherwise be a nested loop with a map of lists, a second loop, and a StringBuilder.
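For comparison, here is one way the loop-based version described above might look, next to the collector version. This is a sketch rather than a canonical translation; the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ImperativeEquivalentSketch {

    // Same shape as the Employee record used elsewhere in this lesson.
    record Employee(String name, String department, double salary) {}

    static List<Employee> sampleEmployees() {
        return List.of(
            new Employee("Alice", "Engineering", 95_000),
            new Employee("Bob", "Engineering", 88_000),
            new Employee("Carol", "Marketing", 72_000),
            new Employee("Dave", "Marketing", 68_000),
            new Employee("Eve", "HR", 61_000));
    }

    static Map<String, String> namesByDeptWithLoops(List<Employee> employees) {
        // First loop: build a map of department -> list of names.
        Map<String, List<String>> buckets = new HashMap<>();
        for (Employee e : employees) {
            buckets.computeIfAbsent(e.department(), k -> new ArrayList<>()).add(e.name());
        }
        // Second (nested) loop: join each list with a StringBuilder.
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : buckets.entrySet()) {
            List<String> names = entry.getValue();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < names.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(names.get(i));
            }
            result.put(entry.getKey(), sb.toString());
        }
        return result;
    }

    static Map<String, String> namesByDeptWithCollectors(List<Employee> employees) {
        return employees.stream()
            .collect(Collectors.groupingBy(
                Employee::department,
                Collectors.mapping(Employee::name, Collectors.joining(", "))));
    }

    public static void main(String[] args) {
        Map<String, String> viaLoops = namesByDeptWithLoops(sampleEmployees());
        Map<String, String> viaCollectors = namesByDeptWithCollectors(sampleEmployees());
        System.out.println(viaLoops.equals(viaCollectors)); // true
    }
}
```

Both methods produce the same map contents for a sequential stream; the collector version simply states the grouping, the mapping, and the joining in one expression.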

Summary

The four collectors you learned in this lesson unlock the core of data-aggregation work in Java:

  • counting(): counts elements; most useful as a downstream collector.
  • groupingBy(classifier): buckets elements by key; compose a downstream collector to aggregate each bucket.
  • partitioningBy(predicate): binary split; always produces both keys; clearer intent than a boolean groupingBy.
  • joining(delimiter, prefix, suffix): assembles text from a stream; requires CharSequence elements such as String, so map other types first.

In the next lesson we will look at numeric streams — IntStream, LongStream, and DoubleStream — and the specialised numeric collectors that complement them.