Distributed Tracing & OpenTelemetry

Project: Trace a Microservices Request

18 min Lesson 10 of 28

This final lesson synthesizes everything in the tutorial into a complete, production-representative workflow. You will instrument a three-service e-commerce flow — API gateway, order service, and inventory service — with OpenTelemetry, route spans through the OTel Collector, store them in Jaeger, and work through a realistic latency investigation using the trace waterfall. Every step here reflects what engineering teams at Stripe, DoorDash, and Shopify actually do when they trace production traffic.

The Demo Architecture

The scenario: a POST /orders request enters an api-gateway (Node.js), which calls an order-service (Python/FastAPI), which calls an inventory-service (Go) to reserve stock. The inventory service also writes to PostgreSQL. All three services emit OTLP spans to a shared OTel Collector, which exports to Jaeger. This is a minimal but realistic topology — the same instrumentation patterns scale to 200-service meshes.

[Figure: Project architecture. Client sends HTTP POST to api-gateway (Node.js, OTel SDK), which calls order-service (Python, OTel SDK), which calls inventory-svc (Go, OTel SDK, Postgres). Each service sends OTLP/gRPC to the OTel Collector (receive → batch → export), which exports to Jaeger (UI + storage).]
Project topology: three instrumented services emit OTLP spans to a shared Collector, which exports to Jaeger for visualization and analysis.

Step 1 — Spin Up the Infrastructure

Use Docker Compose to bring up Jaeger and the OTel Collector side by side. In production you would deploy these as Kubernetes workloads (Deployment + Service), but Compose is ideal for local iteration. Save this as docker-compose.yml in your project root.

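    # docker-compose.yml
    version: "3.9"
    services:
      jaeger:
        image: jaegertracing/all-in-one:1.55
        ports:
          - "16686:16686"   # Jaeger UI
        environment:
          - COLLECTOR_OTLP_ENABLED=true
        # Jaeger's OTLP gRPC port (4317) is not published on the host: the
        # Collector reaches it as jaeger:4317 on the Compose network, and
        # publishing it would clash with the Collector's own 4317 mapping.

      otel-collector:
        image: otel/opentelemetry-collector-contrib:0.95.0
        volumes:
          # the contrib image reads its config from /etc/otelcol-contrib/
          - ./otel-collector-config.yml:/etc/otelcol-contrib/config.yaml
        ports:
          - "4317:4317"     # OTLP gRPC (services send spans here)
          - "4318:4318"     # OTLP HTTP
          - "55679:55679"   # zPages health/debug UI
        depends_on:
          - jaeger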

Now create otel-collector-config.yml. This mirrors a minimal production Collector config — a batch processor to reduce Jaeger write pressure, and a memory limiter to prevent OOM under burst traffic.

    # otel-collector-config.yml
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
          http:
            endpoint: "0.0.0.0:4318"

    processors:
      batch:
        timeout: 5s
        send_batch_size: 512
      memory_limiter:
        check_interval: 1s
        limit_mib: 400
        spike_limit_mib: 100

    exporters:
      otlp/jaeger:
        endpoint: "jaeger:4317"
        tls:
          insecure: true
      logging:
        loglevel: warn   # set to debug briefly when troubleshooting dropped spans

    extensions:
      zpages:
        endpoint: "0.0.0.0:55679"   # serves /debug/tracez, queried in Step 4

    service:
      extensions: [zpages]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/jaeger, logging]

Key idea: processor ordering matters. Always put memory_limiter before batch in the pipeline. When memory usage crosses the limit, the limiter refuses incoming data at the front of the pipeline, before it can accumulate in the batcher's buffer. With the order reversed, the batcher keeps filling under burst load and the limiter can only reject large, already-assembled batches all at once, which causes latency spikes in the Collector itself.
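
To make the batch settings (timeout, send_batch_size) concrete, here is a toy model of a size-or-timeout batcher in plain Python. It is a sketch only: the SpanBatcher class, its tick() method, and the injectable clock are hypothetical names chosen for illustration, and the real Collector processor is more involved.

```python
import time
from typing import Callable, List, Optional


class SpanBatcher:
    """Toy batcher: flush when the batch is full or when the timeout elapses."""

    def __init__(
        self,
        send_batch_size: int,
        timeout_s: float,
        export: Callable[[List[str]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.send_batch_size = send_batch_size
        self.timeout_s = timeout_s
        self.export = export
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.buffer: List[str] = []
        self.first_span_at: Optional[float] = None

    def add(self, span: str) -> None:
        """Buffer one span; flush immediately once the size limit is reached."""
        if not self.buffer:
            self.first_span_at = self.clock()
        self.buffer.append(span)
        if len(self.buffer) >= self.send_batch_size:
            self.flush()

    def tick(self) -> None:
        """Called periodically; flushes a partial batch once the timeout has elapsed."""
        if self.buffer and self.clock() - self.first_span_at >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered spans to the exporter and start a fresh batch."""
        if self.buffer:
            self.export(self.buffer)
            self.buffer = []
            self.first_span_at = None
```

With send_batch_size=512 and timeout_s=5, a burst of 512 spans is exported at once, while a trickle of a few spans waits at most five seconds. That is the trade-off the config tunes: fewer, larger writes to Jaeger in exchange for bounded extra delay.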

Step 2 — Instrument the Order Service (Python)

The order service is a FastAPI application. Install the OTel SDK packages and enable the instrumentation libraries for FastAPI and httpx (the SQLAlchemy instrumentor is listed in the requirements for when the service gains its own database access). Use instrumentation libraries for the framework layers and manual spans for your own business logic. This is the production pattern: let auto-instrumentation cover the boilerplate, and reserve manual spans for the operations that matter to your domain.

    # requirements.txt additions
    opentelemetry-api==1.23.0
    opentelemetry-sdk==1.23.0
    opentelemetry-instrumentation-fastapi==0.44b0
    opentelemetry-instrumentation-httpx==0.44b0
    opentelemetry-instrumentation-sqlalchemy==0.44b0
    opentelemetry-exporter-otlp-proto-grpc==1.23.0

    # --- order_service/tracing.py ---
    import os

    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource, SERVICE_NAME
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor


    def configure_tracing():
        # Resource attributes identify this service in every span it emits
        resource = Resource.create({
            SERVICE_NAME: "order-service",
            "service.version": os.getenv("APP_VERSION", "unknown"),
            "deployment.environment": os.getenv("ENV", "local"),
        })
        provider = TracerProvider(resource=resource)
        exporter = OTLPSpanExporter(
            endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317"),
            insecure=True,
        )
        # Export in the background, in batches, so request handling is not blocked
        provider.add_span_processor(BatchSpanProcessor(
            exporter,
            max_queue_size=2048,
            max_export_batch_size=512,
            schedule_delay_millis=5000,
        ))
        trace.set_tracer_provider(provider)

    # --- order_service/main.py ---
    import uuid

    import httpx
    from fastapi import FastAPI, HTTPException
    from opentelemetry import trace
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
    from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

    from .tracing import configure_tracing

    configure_tracing()
    app = FastAPI()
    FastAPIInstrumentor.instrument_app(app)
    HTTPXClientInstrumentor().instrument()

    tracer = trace.get_tracer("order-service")


    @app.post("/orders")
    async def create_order(payload: dict):
        # Manual span around the business-level validation step
        with tracer.start_as_current_span("validate-order") as span:
            span.set_attribute("order.item_count", len(payload.get("items", [])))
            span.set_attribute("order.customer_tier", payload.get("tier", "standard"))
            if not payload.get("items"):
                span.set_status(trace.StatusCode.ERROR, "empty order")
                span.record_exception(ValueError("Order has no items"))
                raise HTTPException(400, "Order must contain at least one item")

        # The httpx call is auto-instrumented: the W3C traceparent header is injected automatically
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                "http://inventory-service:8001/reserve",
                json={"items": payload["items"]},
                timeout=5.0,
            )
            resp.raise_for_status()

        return {"order_id": "ord_" + str(uuid.uuid4())[:8], "status": "confirmed"}

Pro practice — resource attributes are your identity in Jaeger: Always set service.name, service.version, and deployment.environment in the Resource. Jaeger uses service.name as the primary index key for the service list. Without service.version, you cannot distinguish which deployment introduced a latency regression when you have three canary pods running different builds simultaneously.
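
The order-service code relies on the httpx instrumentation to inject a W3C traceparent header into the call to the inventory service. As a rough illustration of what travels on the wire, the sketch below parses and re-issues such a header using only the standard library. The TraceParent type and both helper functions are hypothetical names for this example; the OTel SDKs do this for you, and this version handles only the version-00 layout.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C Trace Context spec
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return TraceParent(version, trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceParent) -> str:
    """Build the header for an outgoing call: same trace ID, fresh span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{new_span_id}-{flags}"
```

The trace ID is what stays constant across api-gateway, order-service, and inventory-service; each hop only replaces the span ID. That is why all spans of one request land in a single Jaeger waterfall.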

Step 3 — Instrument the Inventory Service (Go)

The Go service uses the OTel Go SDK. Wrap the database calls with manual spans: the standard database/sql package has no built-in OpenTelemetry instrumentation, and the available wrappers are third-party libraries, so explicit span creation is the reliable approach. Note how the span attributes follow the OpenTelemetry semantic conventions for database operations (db.system, db.name, db.statement).

    // inventory-service/tracing.go
    package main

    import (
        "context"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/attribute"
        "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
        "go.opentelemetry.io/otel/propagation"
        "go.opentelemetry.io/otel/sdk/resource"
        sdktrace "go.opentelemetry.io/otel/sdk/trace"
        semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
    )

    func InitTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
        exporter, err := otlptracegrpc.New(ctx,
            otlptracegrpc.WithEndpoint("otel-collector:4317"),
            otlptracegrpc.WithInsecure(),
        )
        if err != nil {
            return nil, err
        }

        res, err := resource.New(ctx,
            resource.WithAttributes(
                semconv.ServiceName("inventory-service"),
                semconv.ServiceVersion("1.4.2"),
                attribute.String("deployment.environment", "local"),
            ),
        )
        if err != nil {
            return nil, err
        }

        tp := sdktrace.NewTracerProvider(
            sdktrace.WithBatcher(exporter),
            sdktrace.WithResource(res),
            // 100% locally; use a lower ratio (e.g. 0.1 for 10% head-based sampling) in prod
            sdktrace.WithSampler(sdktrace.TraceIDRatioBased(1.0)),
        )
        otel.SetTracerProvider(tp)
        // Register the W3C Trace Context propagator. The HTTP handler must still
        // extract the incoming traceparent header (for example with otelhttp.NewHandler)
        // so that db.reserve-stock joins the caller's trace.
        otel.SetTextMapPropagator(propagation.TraceContext{})
        return tp, nil
    }

    // inventory-service/handler.go
    package main

    import (
        "context"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/attribute"
        "go.opentelemetry.io/otel/codes"
    )

    // db (a *sql.DB) and Item (with SKU and Qty fields) are defined elsewhere in the service.
    func reserveStock(ctx context.Context, items []Item) error {
        tracer := otel.Tracer("inventory-service")
        ctx, span := tracer.Start(ctx, "db.reserve-stock")
        defer span.End()

        span.SetAttributes(
            attribute.String("db.system", "postgresql"),
            attribute.String("db.name", "inventory"),
            attribute.Int("db.items_count", len(items)),
            // NOTE: omit db.statement in prod if it contains PII or secrets
            attribute.String("db.statement", "UPDATE stock SET reserved = reserved + $1 WHERE sku = $2"),
        )

        for _, item := range items {
            // ExecContext returns (sql.Result, error); only the error matters here
            if _, err := db.ExecContext(ctx,
                "UPDATE stock SET reserved = reserved + $1 WHERE sku = $2",
                item.Qty, item.SKU,
            ); err != nil {
                span.SetStatus(codes.Error, err.Error())
                span.RecordError(err)
                return err
            }
        }

        span.SetStatus(codes.Ok, "")
        return nil
    }

Step 4 — Generate Load and Find a Trace in Jaeger

Start the stack, send a test request, and navigate to the Jaeger UI at http://localhost:16686.

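    # Start the Collector and Jaeger
    docker compose up -d

    # Start services (adjust for your local setup).
    # The hostnames used in the code (otel-collector, inventory-service) resolve only
    # on the Compose network; when running on the host, substitute localhost.
    # Run uvicorn from the directory that contains the order_service package,
    # because main.py uses a relative import.
    uvicorn order_service.main:app --port 8000 &
    (cd inventory-service && go run .) &

    # Send a request that creates an order
    curl -X POST http://localhost:8000/orders \
      -H "Content-Type: application/json" \
      -d '{"items":[{"sku":"SKU-001","qty":2},{"sku":"SKU-007","qty":1}],"tier":"gold"}'

    # Expected response (the order_id suffix is random):
    # {"order_id":"ord_a3f9bc12","status":"confirmed"}

    # Check that the Collector is up and its pipeline is active (zPages)
    curl -s http://localhost:55679/debug/tracez

    # In Jaeger UI:
    # 1. Select "order-service" from the service dropdown
    # 2. Click "Find Traces"
    # 3. Click the trace that shows POST /orders
    # 4. Observe the waterfall: POST /orders → validate-order → httpx POST → db.reserve-stock
    #    (api-gateway is the root span when the request enters through the gateway;
    #    the curl above calls order-service directly)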

Step 5 — Introduce a Latency Bug and Debug with Traces

Now simulate the scenario every on-call engineer eventually faces: a latency regression in a downstream service that is invisible from the top-level metrics dashboard. Inject an artificial delay into the inventory service — this stands in for a slow SQL index scan, a cold cache, or an overloaded third-party API.

    // Inject this into reserveStock() to simulate a slow query.
    // In real investigations you would see this as a wide span in the waterfall.
    // Needs two extra imports in handler.go: "time" and "go.opentelemetry.io/otel/trace".

    // Simulate a slow index scan on a high-cardinality column
    time.Sleep(350 * time.Millisecond)
    span.AddEvent("slow-index-scan-detected", trace.WithAttributes(
        attribute.String("db.index", "stock_sku_idx"),
        attribute.Int("db.rows_scanned", 95000),
    ))

After injecting this delay, send another request. Open Jaeger, search for traces from order-service with duration greater than 300ms (use the "Min Duration" filter). Click the slow trace. You will see exactly what the instrumentation reveals:

  • The root span POST /orders in api-gateway now shows ~370ms total.
  • The validate-order span completes in ~2ms — not the culprit.
  • The httpx POST /reserve span is ~365ms — the call to inventory is slow.
  • Inside that, db.reserve-stock is ~355ms, and its event log shows slow-index-scan-detected with db.rows_scanned: 95000.

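The root cause is pinpointed in under 60 seconds. No log grepping. No dashboard hopping. The event attribute db.rows_scanned: 95000 on the db.reserve-stock span tells you what to fix: the UPDATE is scanning far too many rows, so add an index on stock(sku). In this exercise the event is hard-coded; in a real service you would populate it from query statistics. This is the full value proposition of distributed tracing realized in practice.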
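
The reasoning in the list above (subtract each child's duration from its parent to see where the time is really spent) can be sketched as a small self-time calculation. The Span dataclass and function names are hypothetical, the durations are the approximate figures from this walkthrough, and the sketch assumes children of a span run one after another rather than in parallel.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Self time = a span's duration minus the time spent in its direct children."""
    child_total: Dict[str, float] = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = child_total.get(span.parent_id, 0.0) + span.duration_ms
    return {
        span.name: max(span.duration_ms - child_total.get(span.span_id, 0.0), 0.0)
        for span in spans
    }


def bottleneck(spans: List[Span]) -> str:
    """Name of the span with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


# Approximate durations from the slow trace described above
SLOW_TRACE = [
    Span("a", None, "POST /orders", 370),
    Span("b", "a", "validate-order", 2),
    Span("c", "a", "httpx POST /reserve", 365),
    Span("d", "c", "db.reserve-stock", 355),
]
```

Calling bottleneck(SLOW_TRACE) picks db.reserve-stock: the root span and the httpx span are wide only because they contain it, which is the same conclusion the waterfall gives visually.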

[Figure: Trace waterfall showing the injected latency regression, on a 0ms to 370ms timeline. Spans: api-gateway POST /orders, 370ms (root); validate-order, 2ms; order→inventory httpx POST /reserve, 365ms; db.reserve-stock, 355ms, marked slow (rows_scanned: 95000, event slow-index-scan). Fix shown: CREATE INDEX CONCURRENTLY ON stock(sku); db.rows_scanned drops from 95,000 to 1 and latency from 355ms to 3ms.]
The trace waterfall after injecting the latency bug: the slow-index-scan event on the db.reserve-stock span immediately identifies the root cause and the fix.

Step 6 — Connect Traces to Metrics and Logs (Exemplars)

In production, the full observability loop closes when you can jump from a Prometheus alert to a specific trace. OpenTelemetry supports exemplars — trace ID references embedded inside metric data points. When Prometheus scrapes your application, high-latency histogram buckets can carry a trace_id that links directly to the trace that produced that data point.

Where the SDK and the Prometheus exporter support exemplars, they are attached when a measurement is recorded inside a sampled span; check that the versions you run support them, because exemplar support varies by language SDK and release. Prometheus itself must be started with exemplar storage enabled (--enable-feature=exemplar-storage). In Grafana, enable exemplars on a histogram panel to see dots on the slow buckets, then click any dot to jump directly to the Jaeger trace. This eliminates the manual step of copying a trace ID from a log entry into the Jaeger search box. Comparable metrics-to-traces correlation is available in Grafana Cloud, Datadog, and Honeycomb.
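
To show what an exemplar looks like on the wire, the sketch below formats and parses a histogram bucket line in the OpenMetrics text style, where the exemplar follows a # after the sample value. This is an illustration under assumptions: the metric name is made up, the helper functions are hypothetical, and real exporters produce these lines for you.

```python
import re
from typing import Optional

# Matches the exemplar label set after the "#" and captures the trace_id value
_EXEMPLAR = re.compile(r'#\s*\{[^}]*trace_id="([0-9a-f]+)"[^}]*\}')


def format_bucket_with_exemplar(
    metric: str, le: str, count: int, trace_id: str, value: float
) -> str:
    """Render one histogram bucket sample followed by its exemplar."""
    return f'{metric}_bucket{{le="{le}"}} {count} # {{trace_id="{trace_id}"}} {value}'


def extract_exemplar_trace_id(line: str) -> Optional[str]:
    """Pull the trace_id out of a bucket line, or return None if there is no exemplar."""
    match = _EXEMPLAR.search(line)
    return match.group(1) if match else None
```

For example, format_bucket_with_exemplar("http_server_duration_seconds", "0.5", 12, trace_id, 0.37) yields a bucket line whose trailing exemplar names the trace that produced the 0.37s observation. That trace ID is what Grafana turns into a link to Jaeger.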

The trace_id should also be injected into every structured log line emitted during a request. In Python, the opentelemetry-instrumentation-logging package can add the trace and span IDs to standard-library log records, and the OTel logging bridge attaches them when logs are exported through the OTel log SDK; structlog needs a small processor that reads the current span context. The result: from a Loki log query you can extract the trace_id field and follow it directly to the Jaeger UI, closing the "logs-to-traces" correlation path.
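
The mechanism behind log correlation is simple enough to sketch with the standard library alone. In the example below a contextvars variable stands in for the active span context (an assumption made so the sketch runs without the OTel packages), and a logging filter copies the trace ID onto every record. The names current_trace_id, TraceIdFilter, JsonFormatter, and build_logger are all hypothetical.

```python
import contextvars
import json
import logging
from typing import TextIO

# Stand-in for the active span context; in a real service this comes from OTel
current_trace_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_trace_id", default=""
)


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto each log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line, including the trace_id field."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", ""),
        })


def build_logger(stream: TextIO) -> logging.Logger:
    """Create a logger that writes trace-correlated JSON lines to the given stream."""
    logger = logging.getLogger("order-service-demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

A request handler would set current_trace_id at the start of the request and reset it at the end; every log line written in between then carries a trace_id field that a Loki query can extract and link to Jaeger.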

Pro practice: the three-way correlation. At Uber and Netflix, the gold standard is that any signal (an alert, a log line, or a high-latency histogram bucket) carries a trace_id that is clickable. Setting this up requires: (1) OTel trace context propagated into logs via the logging bridge; (2) exemplars enabled on histograms; (3) Grafana configured with a Jaeger datasource linked to the Prometheus datasource. Once wired, an on-call engineer can go from PagerDuty alert → Grafana dashboard → high-latency exemplar → Jaeger trace → root cause span → span event in under 3 minutes. That is the kind of time-to-diagnosis a mature SRE organization aims for.

Step 7 — Tail Sampling for Production Volume

Your local setup samples 100% of traces. In production at meaningful throughput, you cannot afford to store every span. The right strategy — covered in depth in lesson 7 — is tail-based sampling in the Collector: collect all spans, make the sampling decision after the trace is complete, and keep 100% of error traces and slow traces while sampling healthy fast ones at 1-10%.
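
The decision rule described above (keep every error trace, keep every slow trace, keep a small percentage of the rest) can be sketched in a few lines. This is a simplified model, not the Collector's implementation: the TraceSummary type and keep_trace function are hypothetical, and the hash-based baseline is just one way to make the percentage deterministic per trace ID.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceSummary:
    trace_id: str
    has_error: bool
    duration_ms: float


def keep_trace(
    trace: TraceSummary,
    threshold_ms: float = 500.0,
    sampling_percentage: float = 5.0,
) -> Optional[str]:
    """Return the name of the policy that keeps the trace, or None to drop it."""
    if trace.has_error:
        return "keep-errors"
    if trace.duration_ms > threshold_ms:
        return "keep-slow"
    # Deterministic baseline: hash the trace ID into 10,000 buckets and keep
    # the lowest sampling_percentage percent of them.
    bucket = int(hashlib.sha256(trace.trace_id.encode()).hexdigest(), 16) % 10_000
    if bucket < sampling_percentage * 100:
        return "probabilistic-baseline"
    return None
```

Because the decision needs has_error and duration_ms for the whole trace, it can only be made after the last span has arrived. That is the reason tail sampling has to buffer complete traces in memory, unlike head-based sampling in the SDK.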

Add this to your Collector config to enable tail sampling in production:

    # Add to the processors section of otel-collector-config.yml:
    processors:
      tail_sampling:
        decision_wait: 30s                  # wait up to 30s for all spans to arrive
        num_traces: 100000                  # hold up to 100k in-flight traces in memory
        expected_new_traces_per_sec: 1000
        policies:
          - name: keep-errors
            type: status_code
            status_code: {status_codes: [ERROR]}
          - name: keep-slow
            type: latency
            latency: {threshold_ms: 500}
          - name: probabilistic-baseline
            type: probabilistic
            probabilistic: {sampling_percentage: 5}   # keep 5% of healthy fast traces

    # Then reference it in the pipeline:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, batch]
          exporters: [otlp/jaeger, logging]

Production pitfall — tail sampling memory: The num_traces setting determines how many open traces the Collector holds in RAM while waiting for all spans to arrive. At 1,000 requests/second with a 30-second window, you need capacity for 30,000 in-flight traces. Each trace holds its spans in memory — at 10 spans per trace averaging 1KB each, that is 300MB of working RAM just for the sampling buffer. Size your Collector pod accordingly, and set limit_mib on the memory_limiter conservatively. The Collector will drop spans gracefully before OOMing, but the sampling decision window will shorten under pressure — monitor otelcol_processor_tail_sampling_sampling_decision_timer_latency in Prometheus.
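
The sizing arithmetic in the pitfall above is easy to rerun for your own traffic. The helper below is a back-of-the-envelope sketch with a hypothetical function name; it counts only raw span payload (1KB taken as 1,000 bytes) and ignores the Collector's per-trace bookkeeping overhead, so treat the result as a lower bound.

```python
from typing import Tuple


def tail_sampling_buffer(
    traces_per_sec: int,
    decision_wait_s: int,
    spans_per_trace: int,
    bytes_per_span: int,
) -> Tuple[int, int]:
    """Return (in-flight traces, buffered bytes) for a tail-sampling window."""
    in_flight_traces = traces_per_sec * decision_wait_s
    buffered_bytes = in_flight_traces * spans_per_trace * bytes_per_span
    return in_flight_traces, buffered_bytes


# Figures from the text: 1,000 req/s, 30s window, 10 spans per trace, ~1KB per span
in_flight, buffered = tail_sampling_buffer(1000, 30, 10, 1000)
buffered_mb = buffered / 1_000_000      # 300.0 MB of span data
headroom = 100_000 / in_flight          # num_traces: 100000 leaves about 3.3x headroom
```

With these inputs the window holds 30,000 traces and about 300MB of span data, which matches the estimate above. Doubling decision_wait or the request rate doubles the buffer, so revisit limit_mib whenever either changes.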

Project Complete: What You Built

You now have a fully operational distributed tracing system. Three services emit spans in OTLP format, the Collector fans them in, batches them efficiently, applies tail-based sampling, and exports to Jaeger. The trace waterfall showed you a 355ms database bottleneck in under 60 seconds — something that would have taken 30-45 minutes with log correlation. Exemplars close the loop from metrics to traces, and the logging bridge closes it from logs to traces. This is unified observability: the same event visible from three angles, each navigable to the others with a single click.

The patterns here (OTLP over gRPC, BatchSpanProcessor, semantic conventions for db.* and http.* attributes, tail sampling in the Collector, exemplars in Prometheus) are the ones you are likely to wire up on any team that runs tracing in production. OpenTelemetry is supported by Google, AWS, Microsoft, and most major observability vendors. The vendor-neutral SDK means you can swap Jaeger for Tempo, or Tempo for Honeycomb, by changing the exporter entry in the Collector config while your application code stays untouched. That portability is the lasting value of building on the OTel standard.

Tutorial complete: You have covered the full observability arc — from the why of tracing (lesson 1) through spans and context propagation (lesson 2), the OpenTelemetry standard (lesson 3), SDK instrumentation (lesson 4), the Collector (lesson 5), Jaeger and Tempo backends (lesson 6), sampling strategies (lesson 7), production debugging workflows (lesson 8), unified observability with the three pillars (lesson 9), and this end-to-end project (lesson 10). You are equipped to design, implement, and operate production-grade distributed tracing at scale.