Distributed Tracing: Context Across Services

Distributed tracing pays off most when a request crosses several services. When a user's request flows through service A (API gateway), service B (order service), service C (payment service), and service D (database), the same trace ID must accompany it at every hop so that your tracing backend can show the complete path as one trace. Without distributed tracing, each service sees an isolated request; with it, you see the full request tree and can identify which service or operation caused latency or failure.

Trace context propagation is the mechanism that carries the trace ID and other context across service boundaries. The W3C Trace Context standard (a W3C Recommendation) defines HTTP headers for this: traceparent carries the trace ID, the parent span ID, and the trace flags, while tracestate carries vendor-specific data. OpenTelemetry handles propagation automatically for common libraries, but understanding the mechanism is essential for debugging and for integrating with non-instrumented services.

How Does Trace Context Propagate Across Services?

When service A calls service B, service A injects the current trace context into HTTP headers. Service B extracts those headers, learns it is part of an existing trace, and adds its own spans to the same trace. The result is a unified view of the request's journey.

# Service A: API Gateway
from flask import Flask, request
from opentelemetry import trace
from opentelemetry.propagate import inject
import requests

app = Flask(__name__)
tracer = trace.get_tracer(__name__)

@app.route('/api/orders', methods=['POST'])
def create_order():
    with tracer.start_as_current_span("create_order") as span:
        order_data = request.json
        span.set_attribute("user_id", order_data['user_id'])
        
        # Call service B (order-service) with trace context
        headers = {}
        inject(headers)  # Adds traceparent, tracestate
        
        response = requests.post(
            'http://order-service/orders',
            json=order_data,
            headers=headers
        )
        span.set_attribute("response_status", response.status_code)
        return response.json()

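# Service B: Order Service (a separate process with its own app and tracer)
from flask import Flask, request
from opentelemetry import trace
from opentelemetry.propagate import extract, inject
import requests

app = Flask(__name__)
tracer = trace.get_tracer(__name__)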

@app.route('/orders', methods=['POST'])
def save_order():
    # Extract trace context from incoming headers
    ctx = extract(request.headers)
    with tracer.start_as_current_span("save_order", context=ctx) as span:
        order_data = request.json
        span.set_attribute("order_id", order_data['id'])
        
        # Call service C (payment-service) with trace context
        headers = {}
        inject(headers)
        
        payment_response = requests.post(
            'http://payment-service/charge',
            json={'user_id': order_data['user_id'], 'amount': order_data['amount']},
            headers=headers
        )
        span.set_attribute("payment_status", payment_response.status_code)
        return {'order_id': order_data['id'], 'status': 'saved'}

Without explicit context extraction, service B would start a new trace and the services would be disconnected. The extract() function reads the headers and links service B's spans to service A's trace.

What Is the W3C Trace Context Standard?

W3C Trace Context, a W3C Recommendation, defines a standard format for propagating trace information across services. The two main headers are:

traceparent: Format is version-traceid-parentid-traceflags. Example: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
- 00: version (always 00 in current spec)
- 4bf92f3577b34da6a3ce929d0e0e4736: 32-character hex trace ID (128-bit)
- 00f067aa0ba902b7: 16-character hex ID of the calling (parent) span (64-bit)
- 01: trace flags (01 = sampled, 00 = not sampled)

tracestate: Vendor-specific key-value pairs. Example: tracestate: jaeger=00f067aa0ba902b7:001
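When debugging a service that is not instrumented, it can help to decode a traceparent value by hand. The following is a small standard-library sketch that handles only the version 00 layout shown above; parse_traceparent and TraceParent are illustrative names, not part of OpenTelemetry.

```python
import re
from typing import NamedTuple, Optional

# version "-" trace-id "-" parent-id "-" trace-flags, all lowercase hex
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid values.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest bit of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


print(parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
```

In instrumented services, extract() performs this parsing for you; the sketch is only meant to make the header layout concrete.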

OpenTelemetry automatically generates and propagates these headers when you use inject() and extract().

from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer(__name__)

# Outgoing request: inject the current context (call this inside an active span)
headers = {}
inject(headers)
print(headers)
# Example output (IDs differ per request):
# {'traceparent': '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'}

# Incoming request: extract the parent context
# (request is the framework's incoming request object, e.g. flask.request)
context = extract(request.headers)
with tracer.start_as_current_span("operation", context=context):
    # This span is a child of the caller's span and shares its trace ID
    pass

How Do You Trace Database Calls in a Distributed System?

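Database calls do not carry HTTP headers, so the trace context usually stops at the database client. Instead of propagating context to the database, you record each query as a span in the calling service, either manually or with an instrumentation library that creates database spans for you.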

from opentelemetry import trace
import psycopg2

tracer = trace.get_tracer(__name__)

def query_database(sql, params):
    # Create a span for the database operation
    with tracer.start_as_current_span("db_query") as span:
        span.set_attribute("db.system", "postgresql")
        span.set_attribute("db.statement", sql)
        span.set_attribute("db.user", "app_user")
        
        # Execute the query
        conn = psycopg2.connect("dbname=myapp user=app_user")
        cursor = conn.cursor()
        
        try:
            cursor.execute(sql, params)
            rows = cursor.fetchall()
            span.set_attribute("db.result_set_size", len(rows))
            return rows
        except Exception as e:
            span.record_exception(e)
            span.set_status(trace.Status(trace.StatusCode.ERROR))
            raise
        finally:
            cursor.close()
            conn.close()

OpenTelemetry provides auto-instrumentation for popular databases (psycopg2, MySQL, SQLAlchemy). Install and enable it:

pip install opentelemetry-instrumentation-sqlalchemy opentelemetry-instrumentation-psycopg2

from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor

SQLAlchemyInstrumentor().instrument()
Psycopg2Instrumentor().instrument()

Now database queries automatically create spans with timing and status.

How Do You Debug Multi-Service Traces?

In Jaeger (or similar backends), view a single trace that spans multiple services:

- Find a trace by service name, operation name, duration, or error status
- Click on the trace to view the waterfall
- Each span shows: operation name, service name, start time, duration, attributes
- Red spans indicate errors; gray spans indicate sampling

Typical view for a multi-service order flow:

Trace ID: abc123def456
  Timeline view
  └─ POST /orders (API Gateway, 250ms total)
     ├─ create_order (5ms)
     └─ POST http://order-service/orders (245ms)
        └─ save_order (Order Service, 240ms)
           ├─ validate_order (10ms)
           ├─ save_to_db (80ms)
           │  └─ INSERT orders table (75ms)
           └─ POST http://payment-service/charge (150ms)
              └─ charge_card (Payment Service, 145ms)
                 ├─ lookup_user (5ms)
                 └─ call_stripe_api (140ms)

From this view, you immediately see that call_stripe_api is the bottleneck (140ms out of 250ms total). You can click on that span to see its attributes (Stripe API version, charge ID, etc.) and any exceptions.
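The same reading can be reproduced with a few lines of code. The sketch below copies the span names and durations from the waterfall above and computes each span's self time (its duration minus the time spent in its direct children). It assumes child spans run one after another and that span names are unique, which holds for this example but not for traces in general.

```python
from collections import defaultdict

# (span name, parent span name or None, duration in ms), copied from the waterfall
SPANS = [
    ("POST /orders", None, 250),
    ("create_order", "POST /orders", 5),
    ("POST http://order-service/orders", "POST /orders", 245),
    ("save_order", "POST http://order-service/orders", 240),
    ("validate_order", "save_order", 10),
    ("save_to_db", "save_order", 80),
    ("INSERT orders table", "save_to_db", 75),
    ("POST http://payment-service/charge", "save_order", 150),
    ("charge_card", "POST http://payment-service/charge", 145),
    ("lookup_user", "charge_card", 5),
    ("call_stripe_api", "charge_card", 140),
]


def self_times(spans):
    """Map each span name to its duration minus the total of its direct children."""
    child_total = defaultdict(int)
    for _name, parent, duration in spans:
        if parent is not None:
            child_total[parent] += duration
    return {name: duration - child_total[name] for name, _parent, duration in spans}


def bottleneck(spans):
    """Return (name, self time in ms) of the span with the largest self time."""
    times = self_times(spans)
    name = max(times, key=times.get)
    return name, times[name]


print(bottleneck(SPANS))  # ('call_stripe_api', 140)
```

Self time is a useful first filter because a parent span such as save_order looks slow (240ms) only because it waits on its children.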

What Is Baggage and When Should You Use It?

Baggage is metadata that flows through a trace but is not itself a span. Common examples: user ID, feature flags, session ID, environment. Baggage is propagated automatically via headers, and every span in the trace can access it.

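from flask import Flask, request
from opentelemetry import baggage, context, trace
from opentelemetry.propagate import inject
import requests

app = Flask(__name__)
tracer = trace.get_tracer(__name__)

@app.route('/orders', methods=['POST'])
def create_order():
    request_data = request.json

    # set_baggage() returns a new Context; it does not modify the current one
    ctx = baggage.set_baggage("user_id", str(request_data['user_id']))
    ctx = baggage.set_baggage("customer_tier", request_data.get('tier', 'standard'), context=ctx)
    token = context.attach(ctx)  # Make the baggage part of the current context
    try:
        with tracer.start_as_current_span("create_order") as span:
            # Code running in this context can read the baggage
            span.set_attribute("user_id", baggage.get_baggage("user_id"))
            span.set_attribute("customer_tier", baggage.get_baggage("customer_tier"))

            # inject() writes the baggage header alongside traceparent
            # (requests auto-instrumentation does this for you if enabled)
            headers = {}
            inject(headers)
            response = requests.post('http://payment-service/charge', json=..., headers=headers)
            return response.json()
    finally:
        context.detach(token)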

Baggage is useful for request-scoped context, but use it sparingly: every baggage entry adds HTTP header size, and baggage is not filtered by sampling. If you only need context in one service, use a span attribute instead.

Key Takeaways

- Trace context (trace ID, span ID) must flow across service boundaries.
- OpenTelemetry's inject() and extract() propagate W3C Trace Context headers.
- Auto-instrumentation (requests, Flask, databases) handles propagation automatically.
- Database spans require manual instrumentation or auto-instrumentation libraries.
- Baggage carries metadata (user ID, feature flags) through a trace.

Frequently Asked Questions

What happens if a service does not propagate trace context?

The trace breaks: spans created downstream of that service are orphaned and appear as separate traces. Always verify that every service extracts the incoming trace context and injects it into its outgoing calls.

How do I correlate logs with traces?

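Emit the trace ID and span ID as fields in every log record so that log aggregation systems can link logs to traces. The IDs are integers in the Python API, so format them as hex to match what tracing backends display. Example: logger.info("Processing", extra={"trace_id": format(trace.get_current_span().get_span_context().trace_id, "032x")})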

What is the performance overhead of distributed tracing?

Usually small, and smaller still with sampling (e.g., sampling 1% of requests). Creating a span typically takes microseconds; the main cost is the network traffic needed to send spans to the backend. Use batching (BatchSpanProcessor) to amortize this cost.

Can I disable tracing for specific services or operations?

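Yes. Use a sampler that returns a DROP decision for the spans you want to skip. The built-in TraceIdRatioBased(0.01) sampler keeps about 1% of traces, and a custom sampler can decide based on the span name or attributes. To disable tracing for a whole service, configure that service's tracer provider with a sampler that drops everything.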

How do I extract trace ID from a span to log it?

from opentelemetry import trace

span = trace.get_current_span()
ctx = span.get_span_context()
# IDs are integers; format them as lowercase hex to match tracing backends
trace_id = format(ctx.trace_id, "032x")
span_id = format(ctx.span_id, "016x")
