Client-Side Load Balancing

In a traditional monolith you connect to a single database host or a single downstream service. In a microservices environment the same logical service runs as multiple identical instances — three pods of order-service, five pods of inventory-service, and so on. Something must decide which instance gets each request. That decision can happen in two places: at a dedicated load-balancer process sitting in between (server-side load balancing), or inside the calling service itself (client-side load balancing). This lesson is about the latter.

Server-Side vs Client-Side Load Balancing

Server-side load balancing means every outbound call goes to a fixed virtual IP or DNS name. An external device (an HAProxy, an AWS ALB, an Nginx upstream block) owns the instance list and the algorithm. The client knows only one address.

Client-side load balancing moves that responsibility into the caller. The caller:

1. Fetches the live instance list from the service registry (Eureka, in our stack).
2. Applies a selection algorithm locally to pick one instance.
3. Sends the HTTP request directly to that instance's host and port.

The advantage is fewer network hops and full control over the routing algorithm. The trade-off is that every client must be smart enough to maintain a local cache of instances, handle stale entries, and retry on failure.
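
The three steps above can be sketched in plain Java without any Spring classes. This is only an illustration of the idea, not Spring Cloud's implementation; the Instance record, the RoundRobinChooser class, and the addresses in the usage note are hypothetical, and the instance list is assumed to have been fetched from the registry already.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for one registry entry (host and port of an instance). */
record Instance(String host, int port) {}

/** Minimal round-robin chooser; an illustration, not Spring Cloud's own class. */
final class RoundRobinChooser {

    private final AtomicInteger position = new AtomicInteger();

    /** Steps 1 and 2: take the locally known list and pick the next entry in rotation. */
    Instance choose(List<Instance> instances) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances available");
        }
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }

    /** Step 3: build the direct URL for the chosen instance. */
    static String urlFor(Instance instance, String path) {
        return "http://" + instance.host() + ":" + instance.port() + path;
    }
}
```

With three instances in the list, four consecutive calls to choose return the first, second, third, and then the first instance again.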

Spring Cloud LoadBalancer has been the only client-side load balancer shipped with Spring Cloud since the 2020.0 release train, which removed Netflix Ribbon support. Ribbon, its predecessor, had already been in maintenance mode before that; do not start new projects with Ribbon.

Adding the Dependency

Client-side load balancing is bundled in the Spring Cloud starter for the service discovery client. If you are already using the Eureka client starter, you have everything you need:

<!-- pom.xml: only the Eureka client starter is needed; LoadBalancer is included.
     The version is assumed to come from the spring-cloud-dependencies BOM. -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>

The spring-cloud-starter-loadbalancer artifact is a transitive dependency of the Eureka client starter and is pulled in automatically. You do not need to add it separately unless you are using a different discovery mechanism.

How Spring Cloud LoadBalancer Works

When you build an HTTP client using WebClient (reactive) or RestClient / RestTemplate (servlet), you annotate the @Bean that creates the builder (or, for RestTemplate, the RestTemplate itself) with @LoadBalanced. Spring Cloud then adds a load-balancing interceptor to the client (an exchange filter function in the WebClient case). At call time that interceptor:

1. Takes the host part of the URL (e.g. order-service in http://order-service/...) as the service ID to look up.
2. Queries the local ServiceInstanceListSupplier cache, which is periodically refreshed from Eureka.
3. Runs the configured algorithm (round-robin by default) to select an instance.
4. Rewrites the URL to the real host:port of the chosen instance and proceeds.
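
The lookup and rewrite steps can be illustrated with java.net.URI alone. This is a sketch of the idea rather than the framework's actual code; UriRewriter and the instance address in the usage note are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative helper; Spring Cloud ships its own, more complete, URI handling. */
final class UriRewriter {

    /** The service ID is simply the host part of the logical URL. */
    static String serviceId(URI logical) {
        return logical.getHost();
    }

    /** Swaps host and port for the chosen instance, keeping scheme, path, and query. */
    static URI rewrite(URI logical, String host, int port) {
        try {
            return new URI(logical.getScheme(), logical.getUserInfo(), host, port,
                    logical.getPath(), logical.getQuery(), logical.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot rewrite " + logical, e);
        }
    }
}
```

For the logical URL http://order-service/api/orders/42?expand=items and a chosen instance at 10.0.0.5:8080, rewrite yields http://10.0.0.5:8080/api/orders/42?expand=items. Note that java.net.URI reports a null host for names containing an underscore, which is one reason to prefer hyphenated service names.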

Using a Load-Balanced WebClient

The reactive WebClient is the natural HTTP client for reactive (WebFlux) Spring Boot 3 services. Configure a load-balanced builder bean once in a @Configuration class:

import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class WebClientConfig {

    @Bean
    @LoadBalanced
    public WebClient.Builder loadBalancedWebClientBuilder() {
        return WebClient.builder();
    }
}

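Inject the builder into any service bean that needs to call another service. Use the logical service name as it is registered in Eureka (normally the callee's spring.application.name). Eureka itself matches names case-insensitively, but keeping the exact spelling avoids surprises with other registries and with per-service configuration keys: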

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class OrderService {

    private final WebClient webClient;

    @Autowired
    public OrderService(WebClient.Builder builder) {
        // "inventory-service" is resolved at runtime via Eureka + LoadBalancer
        this.webClient = builder.baseUrl("http://inventory-service").build();
    }

    public Mono<Integer> getStock(Long productId) {
        return webClient.get()
                .uri("/api/inventory/{id}/stock", productId)
                .retrieve()
                .bodyToMono(Integer.class);
    }
}

Keep the base URL to the service name only. Do not hardcode a port in the URL (http://inventory-service:8082). LoadBalancer takes both host and port from the instance it selects, so a port written into the URL does not control where the request goes and only misleads readers.

Using a Load-Balanced RestClient (Spring Boot 3.2+)

RestClient is the synchronous alternative introduced in Spring Framework 6.1. With Spring Cloud 2023.0 or later, the same @LoadBalanced pattern applies to a RestClient.Builder bean:

import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestClient;

@Configuration
public class RestClientConfig {

    @Bean
    @LoadBalanced
    public RestClient.Builder loadBalancedRestClientBuilder() {
        return RestClient.builder();
    }
}

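// CatalogService.java (separate file). Product is assumed to be the response DTO, not shown here.
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestClient;

@Service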
public class CatalogService {

    private final RestClient restClient;

    public CatalogService(RestClient.Builder builder) {
        this.restClient = builder.baseUrl("http://catalog-service").build();
    }

    public Product findById(Long id) {
        return restClient.get()
                .uri("/api/products/{id}", id)
                .retrieve()
                .body(Product.class);
    }
}

Changing the Load-Balancing Algorithm

The default algorithm is round-robin: requests are distributed evenly across all healthy instances in rotation. Spring Cloud LoadBalancer also ships a random algorithm. You switch per service by providing a custom configuration class and wiring it via @LoadBalancerClient:

import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.loadbalancer.core.RandomLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ReactorLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import org.springframework.cloud.loadbalancer.support.LoadBalancerClientFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.core.env.Environment;

// NOTE: this class is deliberately NOT annotated with @Configuration.
// If you do annotate it, keep it outside every component-scanned package;
// otherwise it applies to all services, not just the one named in @LoadBalancerClient.
public class RandomLBConfig {

    @Bean
    public ReactorLoadBalancer<ServiceInstance> randomLoadBalancer(
            Environment env, LoadBalancerClientFactory factory) {
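        // The factory publishes the current service ID under this property name
        String name = env.getProperty(LoadBalancerClientFactory.PROPERTY_NAME);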
        return new RandomLoadBalancer(
                factory.getLazyProvider(name, ServiceInstanceListSupplier.class), name);
    }
}

Then reference that config class on the application entry point or any @Configuration class:

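import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.loadbalancer.annotation.LoadBalancerClient;

@SpringBootApplication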
@LoadBalancerClient(name = "inventory-service", configuration = RandomLBConfig.class)
public class OrderApplication {
    public static void main(String[] args) {
        SpringApplication.run(OrderApplication.class, args);
    }
}

All calls to inventory-service now use random selection; calls to every other service still use round-robin.
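
The effect of a per-service override can be sketched in plain Java: one service gets a custom strategy while every other service falls back to round-robin. PerServiceStrategies is a hypothetical class written for this illustration, and instances are represented as plain "host:port" strings; it does not mirror how Spring Cloud wires child contexts internally.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Illustrative registry of selection strategies keyed by service ID. */
final class PerServiceStrategies {

    private final Map<String, Function<List<String>, String>> overrides = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    /** Registers a custom strategy for exactly one service. */
    void override(String serviceId, Function<List<String>, String> strategy) {
        overrides.put(serviceId, strategy);
    }

    /** Uses the override if present, otherwise a per-service round-robin counter. */
    String choose(String serviceId, List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances for " + serviceId);
        }
        Function<List<String>, String> strategy = overrides.get(serviceId);
        if (strategy != null) {
            return strategy.apply(instances);
        }
        AtomicInteger counter = counters.computeIfAbsent(serviceId, id -> new AtomicInteger());
        return instances.get(Math.floorMod(counter.getAndIncrement(), instances.size()));
    }

    /** Random selection; the Random is passed in so tests can seed it. */
    static Function<List<String>, String> random(Random rng) {
        return list -> list.get(rng.nextInt(list.size()));
    }
}
```

Calling override("inventory-service", PerServiceStrategies.random(new Random())) changes selection for that one service only; lookups for any other service ID keep rotating through their instances in order.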

Instance Cache and Health Filtering

Spring Cloud LoadBalancer caches the instance list in a ServiceInstanceListSupplier pipeline. By default a cached list expires after 35 seconds and is then loaded again from the registry client. You can tune this in application.yml:

spring:
  cloud:
    loadbalancer:
      cache:
        ttl: 30s          # how long to keep the cached instance list (default 35s)
        capacity: 256     # initial capacity of the cache (default 256)
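
To see what a TTL means for callers, here is a minimal time-based cache in plain Java. It is a sketch under stated assumptions, not the cache Spring Cloud uses: TtlInstanceCache is hypothetical, the registry is modelled as a Supplier, and the clock is injected so the behaviour can be checked without waiting.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative TTL cache for one service's instance list. */
final class TtlInstanceCache {

    private final Supplier<List<String>> registry;   // hypothetical registry lookup
    private final LongSupplier clockMillis;          // injectable clock, in milliseconds
    private final long ttlMillis;

    private List<String> cached;                     // null until the first load
    private long loadedAt;

    TtlInstanceCache(Supplier<List<String>> registry, Duration ttl, LongSupplier clockMillis) {
        this.registry = registry;
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
    }

    /** Returns the cached list, reloading it only once the TTL has elapsed. */
    synchronized List<String> get() {
        long now = clockMillis.getAsLong();
        if (cached == null || now - loadedAt >= ttlMillis) {
            cached = List.copyOf(registry.get());
            loadedAt = now;
        }
        return cached;
    }
}
```

With a 30-second TTL, an instance that disappears from the registry one second after a load can still be handed out for up to 29 more seconds. A shorter TTL narrows that window at the cost of more lookups.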

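The supplier pipeline also supports a health-check filter. It does not read the status stored in Eureka; it actively polls each instance's health endpoint (/actuator/health by default) and drops instances that fail. The filter is switched on with the configurations property, and the health-check settings only tune it. With a registry such as Eureka, which by default already hands out only instances it considers UP, this is usually unnecessary; it is most useful with a static instance list:

spring:
  cloud:
    loadbalancer:
      configurations: health-check   # switches the health-check supplier on
      health-check:
        initial-delay: 0
        interval: 25s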

Stale cache is a real failure mode. If an instance crashes and Eureka has not yet propagated the removal, your load balancer will still route some requests to the dead instance. Always combine client-side load balancing with a retry policy (covered in the Resilience lesson) so a single failed attempt is transparently retried against a different instance.
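
The retry idea can be sketched independently of any resilience library. The helper below is hypothetical and deliberately simple: it walks the instance list in order, treats any RuntimeException as a failed attempt, and has no backoff. It assumes the request is idempotent, since a retried call may reach the server twice.

```java
import java.util.List;
import java.util.function.Function;

/** Illustrative failover helper; a real setup would use a resilience library. */
final class RetryOnNextInstance {

    /** Tries up to maxAttempts instances in list order, moving on when a call throws. */
    static <T> T call(List<String> instances, int maxAttempts, Function<String, T> request) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances available");
        }
        RuntimeException last = null;
        int attempts = Math.min(maxAttempts, instances.size());
        for (int i = 0; i < attempts; i++) {
            try {
                return request.apply(instances.get(i));
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next instance
            }
        }
        throw new IllegalStateException("all " + attempts + " attempts failed", last);
    }
}
```

Given a list whose first entry is a crashed instance and whose second is healthy, a call with maxAttempts set to 2 returns the healthy instance's response, while maxAttempts set to 1 surfaces the failure to the caller.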

Security Consideration: mTLS Across Instances

Client-side load balancing means your service opens a direct TCP connection to each instance. In a production cluster, those connections should be encrypted and mutually authenticated — otherwise a compromised internal service could impersonate any instance. The common patterns are:

Service mesh (Istio / Linkerd): mTLS is handled transparently at the sidecar proxy layer; your application code sees plain HTTP.
Spring Boot TLS client config: Configure a truststore and keystore on the WebClient / RestClient for direct mTLS without a mesh.
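JWT propagation: Pass the caller's bearer token downstream on every service-to-service call so downstream services can enforce authorization independently. This addresses who may do what rather than transport security, so use it alongside mTLS, not instead of it.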

Summary

Client-side load balancing with Spring Cloud LoadBalancer gives each service a local, registry-backed router. You annotate a WebClient.Builder or RestClient.Builder bean with @LoadBalanced, address downstream services by their Eureka service name, and the framework resolves and rotates across real instances at call time. Round-robin is the default; random selection and custom algorithms are wired per service through @LoadBalancerClient. Tune the instance cache TTL, add health checks where the discovery source does not already filter unhealthy instances, and pair with a retry mechanism to handle the inevitable stale-entry and crashed-instance scenarios that arise in production.