Garbage Collection Fundamentals
Java's garbage collector (GC) is one of the defining features of the JVM. It automatically reclaims memory occupied by objects that are no longer reachable by your program, eliminating entire classes of bugs — dangling pointers, double-frees — that plague C and C++ developers. Understanding how the GC makes its decisions is essential for writing high-throughput, low-latency Java applications.
Reachability: What the GC Actually Tracks
The GC does not track every object independently. Instead it starts from a set of well-defined starting points called GC roots and traverses every reference those roots can reach, directly or transitively. Any object reachable from a GC root is live and must not be collected. Any object that cannot be reached from any GC root is unreachable and is eligible for collection.
The JVM recognises several kinds of GC roots:
- Local variables and parameters on the call stack of any active thread.
- Static fields of loaded classes: they live as long as the class is loaded.
- Active threads themselves (a Thread object is always a root while running).
- JNI references: objects held by native code via the JNI.
- Monitor locks: objects currently held by a synchronized block.
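This is why Java programs can still leak memory: a single forgotten reference (an entry in a static Map, an unclosed resource) keeps an entire object graph alive.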
Consider the following example:
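A minimal sketch of both cases, using hypothetical names (ReachabilityDemo, RETAINED, handleRequest) that do not come from any library: a local reference that dies with its stack frame, and a static collection that keeps its contents reachable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; an illustration of reachability, not a benchmark.
class ReachabilityDemo {
    // A static field is a GC root: everything reachable from this list stays live.
    static final List<byte[]> RETAINED = new ArrayList<>();

    static void handleRequest(boolean retain) {
        // 'buffer' is a local variable, so it acts as a root only while this frame is active.
        byte[] buffer = new byte[1024];
        if (retain) {
            // The array is now also reachable from a static root and outlives the method call.
            RETAINED.add(buffer);
        }
        // If it was not added, the array becomes unreachable once the method returns.
    }

    public static void main(String[] args) {
        handleRequest(false); // array is eligible for collection after the call returns
        handleRequest(true);  // array stays live until it is removed from RETAINED
        System.out.println("retained arrays: " + RETAINED.size());
        RETAINED.clear();     // dropping the reference makes the array unreachable again
    }
}
```

Note that "eligible for collection" says nothing about when the memory is actually reclaimed; that depends on when the collector next runs.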
Generational Garbage Collection
Tracing every object on every GC cycle is prohibitively expensive for large heaps. The JVM exploits a powerful empirical observation called the generational hypothesis:
Most objects die young.
Profiling of real Java applications consistently shows that the vast majority of objects — temporary results, request-scoped data, iterator objects — become unreachable within milliseconds of allocation. A tiny minority of objects (caches, connection pools, class-level state) survive for the lifetime of the application. Generational GC exploits this by dividing the heap into regions and collecting the young region far more frequently than the old region.
The Heap Layout
The JVM heap is split into two main regions:
- Young Generation (Eden + Survivor spaces S0 and S1): All new objects are allocated here. The Young GC (also called minor GC) runs frequently and collects only this region — because most objects here are already dead, it is very fast.
- Old Generation (Tenured space): Objects that survive enough minor GCs are promoted to the Old Generation. Collecting this region (a major GC; a full GC collects the entire heap, young and old together) happens infrequently and is far more expensive.
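To see how this layout appears in a running JVM, the standard java.lang.management API can list the heap memory pools. The sketch below is only a way to inspect them; the class name HeapPools is hypothetical, and the pool names printed depend on the collector in use (for example, G1 reports pools such as "G1 Eden Space" and "G1 Old Gen", while other collectors use different names or a single pool).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the management API calls are standard JDK.
class HeapPools {
    // Returns the names of all memory pools that belong to the Java heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies with the JVM and the selected collector.
        heapPoolNames().forEach(System.out::println);
    }
}
```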
The Young Generation in Detail
The Young Generation uses a copy-collection (semi-space) algorithm:
- Allocation: new objects go into Eden using a simple bump-pointer allocator (extremely fast).
- Minor GC triggers: when Eden fills up, a minor GC starts. All application threads are stopped (a stop-the-world pause).
- Live tracing: the GC traces from GC roots and identifies live objects in Eden and the currently active Survivor space.
- Copy: live objects are copied into the other (empty) Survivor space. Dead objects are simply abandoned; there is no per-object deallocation.
- Promotion: objects that have survived a configurable number of minor GCs (the tenuring threshold; HotSpot caps it with -XX:MaxTenuringThreshold, which defaults to 15 for most collectors) are promoted to the Old Generation.
- Swap: the Survivor spaces swap roles; the just-emptied space becomes the target of the next copy.
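The steps above can be mimicked with a toy model. This is a deliberately simplified sketch, not how HotSpot is implemented: the class YoungGenSim and all its members are hypothetical, liveness is supplied by a caller-provided predicate instead of tracing from real GC roots, space limits are ignored, and an object is promoted as soon as its age reaches the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy model of a copying young generation; all names are hypothetical.
class YoungGenSim {
    static final class Obj {
        final String name;
        int age; // number of minor GCs this object has survived
        Obj(String name) { this.name = name; }
    }

    final int tenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>(); // currently active Survivor space
    List<Obj> toSurvivor = new ArrayList<>();   // empty space that receives copies
    final List<Obj> old = new ArrayList<>();

    YoungGenSim(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    // Allocation: every new object starts in Eden.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    // Minor GC: 'isLive' stands in for tracing from GC roots.
    void minorGc(Predicate<Obj> isLive) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!isLive.test(o)) continue;   // dead objects are simply not copied
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);                  // promotion to the Old Generation
            } else {
                toSurvivor.add(o);           // copy into the empty Survivor space
            }
        }
        eden.clear();
        fromSurvivor.clear();
        // Swap roles: the freshly filled space becomes the active one.
        List<Obj> tmp = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }

    public static void main(String[] args) {
        YoungGenSim sim = new YoungGenSim(2);
        Obj cache = sim.allocate("cache");   // stays live across collections
        sim.allocate("temp");                // dies before the first minor GC
        Set<Obj> live = Set.of(cache);
        sim.minorGc(live::contains);
        sim.minorGc(live::contains);
        System.out.println("old=" + sim.old.size() + " survivor=" + sim.fromSurvivor.size());
    }
}
```

In this model the short-lived object costs nothing at collection time because it is never visited, which is the property that makes minor GCs cheap when most objects die young.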
Promotion and the Old Generation
Objects graduate to the Old Generation in two ways:
- They exceed the tenuring threshold.
- The Survivor space is too full to accommodate them (premature promotion).
Premature promotion is a common performance problem: if your application creates large numbers of medium-lived objects, Survivor spaces fill quickly, objects get promoted early, and the Old Generation fills faster than necessary — triggering expensive major GC cycles.
Stop-the-World Pauses
Much GC work happens during a stop-the-world (STW) pause: all application threads are halted while the GC runs, then resumed. Minor GC STW pauses are typically a few milliseconds. Major GC STW pauses can be hundreds of milliseconds or longer for large heaps with older collectors. Concurrent collectors such as G1 and ZGC reduce these pauses by doing part of their work while the application keeps running.
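One way to get a rough picture of how often collections run is the standard GarbageCollectorMXBean API. The sketch below uses a hypothetical class name (GcStats); the bean names it prints depend on the collector, and getCollectionTime() reports approximate accumulated collection time, which is not necessarily pure pause time for collectors that work concurrently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; the management API calls are standard JDK.
class GcStats {
    // Sums the collection counts reported by all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, ~%d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```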
Reference Strength and Collectibility
Java has four reference strengths that influence GC eligibility:
- Strong reference: a normal Java reference; the object is never collected while it is strongly reachable from a GC root.
- SoftReference<T>: cleared at the collector's discretion when memory is tight, and guaranteed to be cleared before the JVM throws OutOfMemoryError. Useful for memory-sensitive caches.
- WeakReference<T>: cleared once the referent is no longer strongly or softly reachable, typically at the next GC cycle that notices this. Used by WeakHashMap.
- PhantomReference<T>: enqueued after the referent has been finalized and is about to be reclaimed; its get() always returns null. Used for post-mortem cleanup (prefer Cleaner in modern Java).
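The difference between strong and weak reachability can be observed with a small experiment. This is a sketch under stated assumptions: the class and method names (WeakRefDemo, isHeldWhileStronglyReachable, isClearedEventually) are hypothetical, and System.gc() is only a request, so the second method may return false on a JVM that ignores it.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; WeakReference and System.gc() are standard JDK.
class WeakRefDemo {
    // While a strong reference exists, a GC must not clear the weak reference.
    static boolean isHeldWhileStronglyReachable() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();
        // 'strong' is still used here, so the referent stayed strongly reachable.
        return weak.get() == strong;
    }

    // With no strong reference left, the referent is usually cleared after a GC.
    static boolean isClearedEventually() throws InterruptedException {
        WeakReference<Object> weak = new WeakReference<>(new Object());
        for (int i = 0; i < 10 && weak.get() != null; i++) {
            System.gc();      // a hint only; the JVM may ignore it
            Thread.sleep(10);
        }
        return weak.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("held while strongly reachable: " + isHeldWhileStronglyReachable());
        System.out.println("cleared without strong reference: " + isClearedEventually());
    }
}
```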
Practical Takeaways
Understanding generational GC and reachability lets you reason about performance without guessing:
- Objects that die before the next minor GC are essentially "free" — they never touch the Old Generation.
- Long-lived objects (singletons, caches) should be truly long-lived — avoid patterns that repeatedly promote and then discard objects from the Old Generation.
- Static collections are GC roots; anything added to them lives until the collection is cleared or the entry removed.
- Use WeakReference or SoftReference for caches whose entries should not prevent GC.
- Monitor -Xlog:gc* output to see actual minor vs. major GC frequency and pause times before tuning flags.
The next lesson extends this foundation to the specific GC algorithms available in the JVM — Serial, Parallel, G1, and ZGC — and shows how to choose and tune them for different workload profiles.