The Collections Framework

HashSet & Set Semantics

15 min Lesson 4 of 14

HashSet & Set Semantics

A Set is a collection that never contains duplicate elements. This single guarantee shapes every decision about how a Set works internally and how you use it. HashSet is the most common Set implementation in Java — it delivers constant-time add, contains, and remove by using a hash table under the hood.

The Set Contract

The java.util.Set interface extends Collection and adds one rule: no two equal elements may coexist. "Equal" means whatever equals() returns true for. Because of this, add() returns false instead of throwing when you try to insert a duplicate — the element is simply ignored.

import java.util.HashSet; import java.util.Set; Set<String> tags = new HashSet<>(); tags.add("java"); tags.add("backend"); tags.add("java"); // duplicate — silently ignored System.out.println(tags.size()); // 2 System.out.println(tags.contains("java")); // true
Iteration order is not guaranteed. HashSet does not preserve insertion order or any sorted order. If you need predictable order, use LinkedHashSet (insertion order) or TreeSet (sorted order) — both covered in upcoming lessons.

How HashSet Works Internally

Under the hood, HashSet<E> is backed by a HashMap<E, Object>. When you call add(element), Java does two things:

  1. Calls element.hashCode() to find the right "bucket" in the hash table.
  2. Calls element.equals(other) against any existing element in that bucket to check for a duplicate.

This two-step process is why both hashCode and equals must be correct for your objects — neither alone is enough.

The equals/hashCode Contract

Java's contract (documented in Object) states:

  • If a.equals(b) is true, then a.hashCode() == b.hashCode() must also be true.
  • The reverse is not required: two objects may share a hash code without being equal (that is a hash collision, and it is normal).
Breaking the contract silently corrupts a HashSet. If two logically equal objects return different hash codes, they land in different buckets. The Set never compares them with equals, so both are stored — you end up with "duplicates" the Set cannot see.

Here is a class that demonstrates a correct implementation:

import java.util.Objects; public final class Point { private final int x; private final int y; public Point(int x, int y) { this.x = x; this.y = y; } @Override public boolean equals(Object o) { if (this == o) return true; if (!(o instanceof Point other)) return false; return x == other.x && y == other.y; } @Override public int hashCode() { return Objects.hash(x, y); // consistent with equals } }

With this class, a HashSet<Point> correctly treats two Point(1, 2) instances as identical:

Set<Point> points = new HashSet<>(); points.add(new Point(1, 2)); points.add(new Point(1, 2)); // same value, different object System.out.println(points.size()); // 1 — correctly de-duplicated

What Happens With the Default Object equals/hashCode

Object.equals compares references (same memory address), and Object.hashCode is based on the object's identity. If you do not override them, every instance is considered unique regardless of its field values:

// Point WITHOUT overriding equals/hashCode Set<Point> broken = new HashSet<>(); broken.add(new Point(1, 2)); broken.add(new Point(1, 2)); // different object reference System.out.println(broken.size()); // 2 — NOT de-duplicated!

Common Set Operations

Because Set models mathematical sets, the standard bulk operations map directly to set theory:

Set<Integer> a = new HashSet<>(Set.of(1, 2, 3, 4)); Set<Integer> b = new HashSet<>(Set.of(3, 4, 5, 6)); // Union (a ∪ b) Set<Integer> union = new HashSet<>(a); union.addAll(b); // {1, 2, 3, 4, 5, 6} // Intersection (a ∩ b) Set<Integer> intersection = new HashSet<>(a); intersection.retainAll(b); // {3, 4} // Difference (a − b) Set<Integer> difference = new HashSet<>(a); difference.removeAll(b); // {1, 2} System.out.println(union); // order not guaranteed System.out.println(intersection); System.out.println(difference);
Prefer Set.of(...) for small immutable sets. Since Java 9 you can write Set.of(1, 2, 3) to get a compact, unmodifiable set. If you need a mutable set pre-populated, pass it to a HashSet constructor as shown above.

Choosing the Right Set Implementation

  • HashSet — O(1) average for add/contains/remove; no order guarantee. Best for most use cases.
  • LinkedHashSet — same performance as HashSet but preserves insertion order.
  • TreeSet — O(log n) operations; elements are always sorted. Covered in the next lesson.

Summary

A Set enforces uniqueness by calling equals and hashCode on every element. HashSet implements this with a hash table, giving O(1) average-case performance. The critical rule: if you store custom objects in any Set (or as keys in any Map), you must override both equals and hashCode consistently — breaking the contract produces silent data bugs that are very hard to diagnose.