Reasoning Tokens

Overview

FastRouter can return Reasoning Tokens (also known as thinking tokens) for supported models. These tokens represent the model’s internal reasoning process and can significantly improve output quality for complex tasks such as planning, math, tool use, and multi-step analysis.

  • Reasoning tokens are enabled by default for supported models.

  • The model decides whether to generate reasoning tokens unless explicitly controlled.

  • When returned, reasoning tokens appear in the reasoning field of each message.

  • You can limit, control, or exclude reasoning tokens using the reasoning parameter.


Supported Models

Reasoning Token Support

Reasoning tokens are currently supported by:

  • Gemini thinking models

  • Anthropic models (via reasoning.max_tokens)

  • OpenAI o-series models

  • Grok models


How Reasoning Tokens Appear in Responses

When enabled, reasoning tokens appear as structured blocks in the response:
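A sketch of that shape, assuming an OpenAI-style chat completion payload (field names other than reasoning follow the OpenAI convention; the exact response body may differ):

```python
# Illustrative response shape: the "reasoning" field sits alongside
# "content" on each assistant message when reasoning tokens are returned.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The answer is 42.",
                "reasoning": "First, restate the problem. Then...",
            }
        }
    ]
}

message = response["choices"][0]["message"]
print(message["reasoning"])  # the model's reasoning text, when returned
```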

If excluded, the model still reasons internally, but the reasoning text is not returned in the response.


Controlling Reasoning Tokens

You can control reasoning behavior using the reasoning object in your request.

General Structure
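A minimal sketch of the request body, assuming an OpenAI-compatible chat completions payload (the model id is illustrative):

```python
# The "reasoning" object controls reasoning-token behavior.
# Set at most one of "effort" or "max_tokens".
request = {
    "model": "openai/o3-mini",  # illustrative model id
    "messages": [{"role": "user", "content": "Plan a 3-step experiment."}],
    "reasoning": {
        "effort": "high",      # "high" | "medium" | "low"
        # "max_tokens": 2000,  # alternative to "effort"; do not set both
        "exclude": False,      # True = reason internally, return no reasoning
    },
}
```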

⚠️ Use either effort or max_tokens — not both.


Reasoning Effort Levels

Supported By

  • OpenAI o-series

  • Grok models

Effort Options

| Effort | Token Allocation   |
|--------|--------------------|
| high   | ~80% of max_tokens |
| medium | ~50% of max_tokens |
| low    | ~20% of max_tokens |

Example:
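For instance, a request pinning effort to medium might look like this sketch (model id illustrative):

```python
request = {
    "model": "x-ai/grok-4",  # illustrative model id
    "messages": [{"role": "user", "content": "Prove that 17 is prime."}],
    "max_tokens": 4000,
    # "medium" directs roughly 50% of max_tokens (~2000 here) to reasoning.
    "reasoning": {"effort": "medium"},
}
```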


Reasoning Max Tokens

Supported By

  • Gemini thinking models

  • Anthropic models

Example:
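A sketch of a request that sets the reasoning budget directly (model id illustrative):

```python
request = {
    "model": "anthropic/claude-sonnet-4",  # illustrative model id
    "messages": [{"role": "user", "content": "Compare two sorting algorithms."}],
    "max_tokens": 6000,
    # Direct token budget for reasoning; Anthropic models require >= 1024,
    # and max_tokens above must exceed this budget.
    "reasoning": {"max_tokens": 2000},
}
```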


Anthropic-Specific Reasoning Behavior

When using Anthropic models:

Rules

  • reasoning.max_tokens

    • Used directly

    • Minimum: 1024 tokens

  • reasoning.effort

    • Converted into a reasoning token budget

  • Reasoning tokens are:

    • Minimum: 1024 tokens

    • Maximum: 32,000 tokens

Budget Formula

reasoning_budget = clamp(effort_ratio × max_tokens, 1024, 32000)

Where:

  • high → 0.8

  • medium → 0.5

  • low → 0.2
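The mapping above can be sketched as a small helper (an illustration of the arithmetic, not FastRouter's actual implementation):

```python
# Effort-level ratios from the table above.
EFFORT_RATIOS = {"high": 0.8, "medium": 0.5, "low": 0.2}

def anthropic_reasoning_budget(effort: str, max_tokens: int) -> int:
    """Convert an effort level into an Anthropic reasoning-token budget,
    clamped to the documented minimum (1024) and maximum (32,000)."""
    budget = int(EFFORT_RATIOS[effort] * max_tokens)
    return max(1024, min(budget, 32000))

print(anthropic_reasoning_budget("medium", 10000))  # 5000
print(anthropic_reasoning_budget("low", 2000))      # 400, clamped up to 1024
```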

Important Constraint

max_tokens must be strictly greater than the reasoning budget; otherwise the model will not have enough tokens left to produce a final answer.
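This constraint can be checked before sending a request, along the lines of this sketch:

```python
def check_token_budget(max_tokens: int, reasoning_budget: int) -> None:
    """Raise if the reasoning budget leaves no room for a final answer."""
    if max_tokens <= reasoning_budget:
        raise ValueError(
            f"max_tokens ({max_tokens}) must be strictly greater than "
            f"the reasoning budget ({reasoning_budget})"
        )

check_token_budget(4000, 2000)  # fine: 2000 tokens remain for the answer
# check_token_budget(2000, 2000) would raise ValueError
```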


Excluding Reasoning Tokens

You can instruct the model to reason internally without returning reasoning tokens.

  • The model still performs reasoning

  • Reasoning tokens are not included in the response

  • Works across all models
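A sketch of a request that excludes reasoning from the response (model id illustrative):

```python
request = {
    "model": "google/gemini-2.5-pro",  # illustrative model id
    "messages": [{"role": "user", "content": "Summarize this contract clause."}],
    # The model still reasons internally; the reasoning text is simply
    # omitted from the returned message.
    "reasoning": {"exclude": True},
}
```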


Token Usage & Billing

  • Reasoning tokens are counted as output tokens

  • They are billed the same way as regular output tokens

  • Enabling reasoning increases token usage but often improves:

    • Accuracy

    • Coherence

    • Tool-calling correctness
