Reasoning Tokens

FastRouter can return Reasoning Tokens (also known as thinking tokens) for supported models.

Overview

FastRouter can return Reasoning Tokens (also known as thinking tokens) for supported models. These tokens represent the model’s internal reasoning process and can significantly improve output quality for complex tasks such as planning, math, tool use, and multi-step analysis.

Reasoning tokens are enabled by default for supported models.
The model decides whether to generate reasoning tokens unless explicitly controlled.
When returned, reasoning tokens appear in the reasoning field of each message.
You can limit, control, or exclude reasoning tokens using the reasoning parameter.

Supported Models

Reasoning Token Support

Reasoning tokens are currently supported by:

Gemini thinking models
Anthropic models (via reasoning.max_tokens)
OpenAI o-series models
Grok models

How Reasoning Tokens Appear in Responses

When enabled, reasoning tokens appear as structured blocks in the response:

{
  "type": "reasoning",
  "reasoning": {
    "text": "The model is considering multiple constraints before responding..."
  }
}

If excluded, the model still reasons internally—but the reasoning is not returned.

Controlling Reasoning Tokens

You can control reasoning behavior using the reasoning object in your request.

General Structure

{
  "model": "your-model",
  "messages": [],
  "reasoning": {
    "effort": "high",
    "max_tokens": 2000,
    "exclude": false,
    "enabled": true
  }
}

⚠️ Use either effort or max_tokens — not both.

Reasoning Effort Levels

Supported By

OpenAI o-series
Grok models

Effort Options

Effort

Token Allocation

high

~80% of max_tokens

medium

~50% of max_tokens

low

~20% of max_tokens

Example:

"reasoning": {
  "effort": "high"
}

Reasoning Max Tokens

Supported By

Gemini thinking models
Anthropic models

Example:

"reasoning": {
  "max_tokens": 2000
}

Anthropic-Specific Reasoning Behavior

When using Anthropic models:

Rules

reasoning.max_tokens
- Used directly
- Minimum: 1024 tokens
reasoning.effort
- Converted into a reasoning token budget
Reasoning tokens are:
- Minimum: 1024 tokens
- Maximum: 32,000 tokens

Budget Formula

budget_tokens = max(
  min(max_tokens × effort_ratio, 32000),
  1024
)

Where:

high → 0.8
medium → 0.5
low → 0.2

Important Constraint

max_tokens must be strictly greater than the reasoning budget, otherwise the model will not have enough tokens to produce a final answer.

Excluding Reasoning Tokens

You can instruct the model to reason internally without returning reasoning tokens.

"reasoning": {
  "exclude": true
}

The model still performs reasoning
Reasoning tokens are not included in the response
Works across all models

Token Usage & Billing

Reasoning tokens are counted as output tokens
They are billed the same way as regular output tokens
Enabling reasoning increases token usage but often improves:
- Accuracy
- Coherence
- Tool-calling correctness

PreviousFunction Calling NextResponse Caching

Last updated 16 days ago

hashtagOverview

hashtagSupported Models

hashtagReasoning Token Support

hashtagHow Reasoning Tokens Appear in Responses

hashtagControlling Reasoning Tokens

hashtagGeneral Structure

hashtagReasoning Effort Levels

hashtagSupported By

hashtagEffort Options

hashtagReasoning Max Tokens

hashtagSupported By

hashtagAnthropic-Specific Reasoning Behavior

hashtagRules

hashtagBudget Formula

hashtagImportant Constraint

hashtagExcluding Reasoning Tokens

hashtagToken Usage & Billing

Overview

Supported Models

Reasoning Token Support

How Reasoning Tokens Appear in Responses

Controlling Reasoning Tokens

General Structure

Reasoning Effort Levels

Supported By

Effort Options

Reasoning Max Tokens

Supported By

Anthropic-Specific Reasoning Behavior

Rules

Budget Formula

Important Constraint

Excluding Reasoning Tokens

Token Usage & Billing