list-timelineTracing

Group related LLM API calls into a single trace using a simple traceparent header.

Overview

FastRouter supports the W3C Trace Context standard via the traceparent header, enabling you to group multiple LLM API calls into a single trace with ordered spans.

This helps you understand the full lifecycle of complex workflows—across models, providers, and steps—in one place.

Common use cases:

  • Agentic workflows — multi-step chains with tool/function calls

  • Chat sessions — linking all turns in a conversation

  • Parallel requests — grouping concurrent calls

Tracing works with any HTTP client. No SDK or proprietary tooling is required.


How Tracing Works

When FastRouter receives a request with a traceparent header:

  • Extracts trace_id to group related requests

  • Records parent_id as the caller (application) span

  • Applies overrides from optional headers (if present)

  • Generates a new span_id for the gateway span

  • Captures latency, tokens, cost, and full request/response

  • Stores the span and groups it under the corresponding trace

All API calls sharing the same trace_id appear as a single trace with multiple spans.

Key rule: Reuse the same traceparent across all requests in a workflow.


traceparent Header Format

Field
Format
Description

version

00

Fixed (W3C standard)

trace_id

32 hex chars

Unique ID for the entire trace

parent_id

16 hex chars

Caller span ID (your application)

flags

01

Sampling flag. Note: Currently, all requests are shown on FastRouter.

Example:


Optional Headers

FastRouter supports additional headers to improve trace readability and control:

Header
Description
Example

x-span-name

Human-readable label for this span

hotel-search

x-span-id

Custom span ID (optional; auto-generated if omitted)

a1b2c3d4e5f6a7b8

x-trace-id

Overrides trace_id from traceparent

abc123...


Usage


What FastRouter does

  • Groups by trace_id

  • Uses your parent_id

  • Generates span_id

  • Names spans: POST /api/v1/chat/completions


Output

One trace → multiple spans

Each span includes: latency, tokens, cost, request, response.

Last updated