sorted, distinct, limit & skip
You have already seen stateless intermediate operations — filter and map process each element independently without looking at any other element. This lesson focuses on the four stateful intermediate operations and introduces the concept of short-circuiting, both of which matter for performance and correctness.
Stateful vs Stateless operations
A stateless operation needs only the current element. A stateful operation must accumulate or inspect multiple elements before it can produce output. The Streams API gives you four stateful intermediates:
- sorted(): orders all elements
- distinct(): removes duplicates
- limit(n): keeps only the first n elements
- skip(n): discards the first n elements
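The difference becomes visible if you log each element as it enters and leaves a pipeline. The sketch below assumes a sequential stream. The class and method names (StatefulDemo, trace) are made up for illustration. With a stateless map stage, each element travels all the way through before the next one starts. With sorted, every element is consumed before the first one comes out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class StatefulDemo {
    // Records when each element passes the "in" (after the source) and
    // "out" (terminal operation) points of a sequential pipeline.
    // useSorted == false: only a stateless map stage sits in between.
    // useSorted == true:  the stateful sorted() stage sits in between.
    static List<String> trace(boolean useSorted) {
        List<String> log = new ArrayList<>();
        Stream<Integer> s = Stream.of(3, 1, 2)
                .peek(n -> log.add("in " + n));
        s = useSorted ? s.sorted() : s.map(n -> n);
        s.forEach(n -> log.add("out " + n));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // [in 3, out 3, in 1, out 1, in 2, out 2]
        System.out.println(trace(true));  // [in 3, in 1, in 2, out 1, out 2, out 3]
    }
}
```

The stateless run interleaves "in" and "out". The sorted run has to buffer all three elements before it can release the smallest one.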
Tip: place limit / skip as early as the logic allows, so fewer elements reach the heavy operations downstream.
sorted
sorted() with no arguments sorts by natural order (elements must implement Comparable). Pass a Comparator to sort by any key.
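A sketch of both forms follows. It sorts a made-up list of fruit names three ways: by natural (alphabetical) order, by a custom key (length, with ties broken alphabetically), and in reverse natural order. Comparator.comparingInt, thenComparing, naturalOrder and reverseOrder are standard java.util.Comparator methods. The class name is only for illustration, and List.of assumes Java 9 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortedExamples {
    // Natural order: String implements Comparable, so no Comparator is needed.
    static List<String> natural(List<String> words) {
        return words.stream().sorted().collect(Collectors.toList());
    }

    // Custom key: shortest first, ties broken alphabetically.
    static List<String> byLength(List<String> words) {
        return words.stream()
                .sorted(Comparator.comparingInt(String::length)
                        .thenComparing(Comparator.naturalOrder()))
                .collect(Collectors.toList());
    }

    // Reverse of the natural order.
    static List<String> descending(List<String> words) {
        return words.stream()
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fruit = List.of("pear", "fig", "apple", "kiwi", "banana");
        System.out.println(natural(fruit));    // [apple, banana, fig, kiwi, pear]
        System.out.println(byLength(fruit));   // [fig, kiwi, pear, apple, banana]
        System.out.println(descending(fruit)); // [pear, kiwi, fig, banana, apple]
    }
}
```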
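Because sorted() must buffer every element before it can emit the first one, it never finishes on an infinite stream. Bound such a stream with limit before sorted.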
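The sketch below bounds an infinite stream before sorting it. The countdown source is a made-up example built with Stream.iterate. Because sorted() must see every element before it emits one, the limit has to come first.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LimitThenSort {
    // An infinite countdown: 10, 9, 8, 7, ...
    static Stream<Integer> countdown() {
        return Stream.iterate(10, n -> n - 1);
    }

    static List<Integer> firstFiveSorted() {
        // limit(5) makes the stream finite, so sorted() buffers only five elements.
        // The reverse order, countdown().sorted().limit(5), would never produce
        // a result: sorted() would keep buffering until memory runs out.
        return countdown()
                .limit(5)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstFiveSorted()); // [6, 7, 8, 9, 10]
    }
}
```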
distinct
distinct() removes duplicate elements using equals and hashCode. On ordered streams it is stable: the first occurrence of each element is kept, in encounter order.
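Here is a minimal sketch of the first-occurrence-wins behaviour on a small, made-up list of integers. It assumes Java 9+ for List.of.

```java
import java.util.List;
import java.util.stream.Collectors;

class DistinctDemo {
    // Keeps the first occurrence of each value, in encounter order.
    static List<Integer> dedupe(List<Integer> values) {
        return values.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(dedupe(List.of(3, 1, 3, 2, 1))); // [3, 1, 2]
    }
}
```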
This is especially useful when flattening collections where the same value can appear multiple times across different sub-lists.
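The flattening case might look like this. The "tags per article" data and the uniqueTags helper are hypothetical. flatMap(List::stream) merges the sub-lists into one stream, and distinct then drops values that repeat across them.

```java
import java.util.List;
import java.util.stream.Collectors;

class FlattenDistinct {
    // Flattens nested lists and drops values that repeat across sub-lists.
    static List<String> uniqueTags(List<List<String>> tagsPerArticle) {
        return tagsPerArticle.stream()
                .flatMap(List::stream) // one stream of all tags
                .distinct()            // first occurrence wins
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> tags = List.of(
                List.of("java", "streams"),
                List.of("streams", "lambdas"),
                List.of("java"));
        System.out.println(uniqueTags(tags)); // [java, streams, lambdas]
    }
}
```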
When you call distinct() on your own classes, make sure they have a correct equals / hashCode implementation; otherwise two logically identical objects will both pass through.
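The sketch below contrasts two made-up types. PlainTag inherits identity-based equals/hashCode from Object, and Tag is a record whose equals/hashCode are generated from its components. It assumes Java 16 or later for records.

```java
import java.util.stream.Stream;

class EqualityDemo {
    // A plain class: equals/hashCode come from Object, so equality means identity.
    static final class PlainTag {
        final String name;
        PlainTag(String name) { this.name = name; }
    }

    // A record: equals/hashCode are generated from the component values.
    record Tag(String name) {}

    static long distinctPlain() {
        return Stream.of(new PlainTag("java"), new PlainTag("java")).distinct().count();
    }

    static long distinctRecord() {
        return Stream.of(new Tag("java"), new Tag("java")).distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(distinctPlain());  // 2: two different identities
        System.out.println(distinctRecord()); // 1: equal by value
    }
}
```

A hand-written class works the same way as the record once it overrides both equals and hashCode consistently.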
limit, skip and short-circuiting
limit(n) keeps at most the first n elements and then stops pulling from the source. skip(n) discards the first n elements and passes the rest downstream. Together they enable pagination of a stream.
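A minimal pagination sketch follows. The page helper, its zero-based pageIndex, and the sample data are assumptions for illustration. skip drops all earlier pages and limit keeps one page. The cast to long avoids int overflow, since skip takes a long. Negative arguments would make skip or limit throw IllegalArgumentException.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class Paging {
    // Hypothetical helper: returns page pageIndex (0-based) of size pageSize.
    static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
        return items.stream()
                .skip((long) pageIndex * pageSize) // drop all earlier pages
                .limit(pageSize)                   // keep one page
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> items = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        System.out.println(page(items, 0, 3)); // [1, 2, 3]
        System.out.println(page(items, 1, 3)); // [4, 5, 6]
        System.out.println(page(items, 3, 3)); // [10]
        System.out.println(page(items, 4, 3)); // []
    }
}
```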
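The word short-circuit means the pipeline does not need to process every source element. Once limit has emitted its quota, no further elements are pulled from the source. This is the same idea as && skipping its right operand once the left one is false. skip, by contrast, is not short-circuiting: it discards a prefix but still lets every remaining element through.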
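To see limit stop an infinite source, the sketch below counts how many elements are actually pulled, using peek. The counter field and method names are illustrative only. This applies to a sequential stream: to find the first three even numbers, only 1 through 6 are ever generated.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ShortCircuitDemo {
    // Number of elements taken from the source during the last run.
    static final AtomicInteger pulled = new AtomicInteger();

    // First three even numbers from the infinite sequence 1, 2, 3, ...
    static List<Integer> firstThreeEvens() {
        pulled.set(0);
        return Stream.iterate(1, n -> n + 1)
                .peek(n -> pulled.incrementAndGet()) // counts every element pulled
                .filter(n -> n % 2 == 0)
                .limit(3)                            // quota reached after 6 -> stop
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstThreeEvens()); // [2, 4, 6]
        System.out.println(pulled.get());      // 6
    }
}
```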
Short-circuiting also applies to terminal operations: findFirst(), findAny(), anyMatch(), noneMatch(), and allMatch() can all stop the pipeline early. You will see them in a later lesson on Optional with Streams.
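As a brief preview, the sketch below runs two short-circuiting terminal operations on an infinite stream. Both calls return only because a matching element exists. On an infinite stream, anyMatch with no match, or allMatch with no counterexample, would run forever. The method names are illustrative.

```java
import java.util.Optional;
import java.util.stream.Stream;

class TerminalShortCircuit {
    // First positive integer whose square exceeds 50; the source stops there.
    static Optional<Integer> firstSquareOver50() {
        return Stream.iterate(1, n -> n + 1)
                .filter(n -> n * n > 50)
                .findFirst();
    }

    // anyMatch returns true as soon as one element matches.
    static boolean containsMultipleOf7() {
        return Stream.iterate(1, n -> n + 1).anyMatch(n -> n % 7 == 0);
    }

    public static void main(String[] args) {
        System.out.println(firstSquareOver50());   // Optional[8]
        System.out.println(containsMultipleOf7()); // true
    }
}
```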
Combining all four in a real pipeline
A realistic scenario: from a list of log lines, find the first five unique error messages in alphabetical order, skipping the very first one (for example, as part of a pagination scheme).
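One way to write this pipeline is sketched below. It assumes a hypothetical log format of "LEVEL message", where error lines start with "ERROR ". The sample lines are made up. Stateless filter and map run first to shrink the data, then distinct and sorted do the buffering work, and skip and limit select the page.

```java
import java.util.List;
import java.util.stream.Collectors;

class ErrorReport {
    static final String PREFIX = "ERROR "; // assumed log format: "LEVEL message"

    static List<String> errorPage(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.startsWith(PREFIX))      // stateless: errors only
                .map(line -> line.substring(PREFIX.length())) // stateless: keep the message
                .distinct()                                   // stateful: drop repeats
                .sorted()                                     // stateful: alphabetical
                .skip(1)                                      // drop the first message
                .limit(5)                                     // keep at most five
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "ERROR Disk full", "INFO Started", "ERROR Timeout",
                "ERROR Disk full", "WARN Slow query", "ERROR Auth failed",
                "ERROR Connection reset", "ERROR Out of memory", "ERROR Timeout",
                "ERROR Bad request", "ERROR Null pointer");
        System.out.println(errorPage(log));
        // [Bad request, Connection reset, Disk full, Null pointer, Out of memory]
    }
}
```

Running distinct before sorted means the sort only has to order unique messages. Swapping the two stages gives the same result but sorts more elements.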
Performance tips
- Put filter before sorted and distinct to reduce the number of elements the stateful operations have to buffer.
- Put limit as early as the logic allows; every element cut before a heavy operation saves work.
- Avoid sorted on large parallel streams unless truly necessary; it introduces a merge step that can negate the parallelism benefit.
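The first tip can be checked with a small experiment. The sketch counts how often the comparator runs when filter comes after sorted, and how often it runs when filter comes before. The counting comparator, the scrambled input (0 to 999 permuted by multiplying by 7919 modulo 1000), and the filter predicate are made up for illustration. Exact counts depend on the JDK's sort implementation, so only the relative difference is meaningful.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class FilterPlacement {
    // Natural integer order that also counts how many comparisons are made.
    static Comparator<Integer> counting(AtomicLong calls) {
        return (a, b) -> {
            calls.incrementAndGet();
            return Integer.compare(a, b);
        };
    }

    // Deterministic, unsorted input: a permutation of 0..999.
    static List<Integer> input() {
        return IntStream.range(0, 1000)
                .map(i -> (i * 7919) % 1000)
                .boxed()
                .collect(Collectors.toList());
    }

    static List<Integer> sortThenFilter(AtomicLong calls) {
        return input().stream()
                .sorted(counting(calls))   // sorts all 1000 elements
                .filter(n -> n % 10 == 0)
                .collect(Collectors.toList());
    }

    static List<Integer> filterThenSort(AtomicLong calls) {
        return input().stream()
                .filter(n -> n % 10 == 0)  // only 100 elements survive
                .sorted(counting(calls))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AtomicLong late = new AtomicLong();
        AtomicLong early = new AtomicLong();
        System.out.println(sortThenFilter(late).equals(filterThenSort(early))); // true
        System.out.println("comparisons, sort first:   " + late.get());
        System.out.println("comparisons, filter first: " + early.get());
    }
}
```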
Summary
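sorted and distinct are the heavyweight stateful operations: sorted must buffer every element before emitting any, and distinct must remember every element it has already seen. limit is short-circuiting: it stops pulling from the source once its quota is met, which is what makes infinite streams practical. skip simply discards a prefix and is usually paired with limit. Placing expensive stateful operations late and short-circuit operations early is a simple rule that keeps pipelines efficient.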