HashSet & Set Semantics
HashSet & Set Semantics
A Set is a collection that never contains duplicate elements. This single guarantee shapes every decision about how a Set works internally and how you use it. HashSet is the most common Set implementation in Java — it delivers constant-time add, contains, and remove by using a hash table under the hood.
The Set Contract
The java.util.Set interface extends Collection and adds one rule: no two equal elements may coexist. "Equal" means whatever equals() returns true for. Because of this, add() returns false instead of throwing when you try to insert a duplicate — the element is simply ignored.
HashSet does not preserve insertion order or any sorted order. If you need predictable order, use LinkedHashSet (insertion order) or TreeSet (sorted order) — both covered in upcoming lessons.
How HashSet Works Internally
Under the hood, HashSet<E> is backed by a HashMap<E, Object>. When you call add(element), Java does two things:
- Calls
element.hashCode()to find the right "bucket" in the hash table. - Calls
element.equals(other)against any existing element in that bucket to check for a duplicate.
This two-step process is why both hashCode and equals must be correct for your objects — neither alone is enough.
The equals/hashCode Contract
Java's contract (documented in Object) states:
- If
a.equals(b)istrue, thena.hashCode() == b.hashCode()must also betrue. - The reverse is not required: two objects may share a hash code without being equal (that is a hash collision, and it is normal).
Set never compares them with equals, so both are stored — you end up with "duplicates" the Set cannot see.
Here is a class that demonstrates a correct implementation:
With this class, a HashSet<Point> correctly treats two Point(1, 2) instances as identical:
What Happens With the Default Object equals/hashCode
Object.equals compares references (same memory address), and Object.hashCode is based on the object's identity. If you do not override them, every instance is considered unique regardless of its field values:
Common Set Operations
Because Set models mathematical sets, the standard bulk operations map directly to set theory:
Set.of(...) for small immutable sets. Since Java 9 you can write Set.of(1, 2, 3) to get a compact, unmodifiable set. If you need a mutable set pre-populated, pass it to a HashSet constructor as shown above.
Choosing the Right Set Implementation
- HashSet — O(1) average for add/contains/remove; no order guarantee. Best for most use cases.
- LinkedHashSet — same performance as
HashSetbut preserves insertion order. - TreeSet — O(log n) operations; elements are always sorted. Covered in the next lesson.
Summary
A Set enforces uniqueness by calling equals and hashCode on every element. HashSet implements this with a hash table, giving O(1) average-case performance. The critical rule: if you store custom objects in any Set (or as keys in any Map), you must override both equals and hashCode consistently — breaking the contract produces silent data bugs that are very hard to diagnose.