Response Caching
Response Caching allows FastRouter.ai users to cache LLM responses for repeated or similar prompts.
It delivers faster response times, lower costs, and consistent outputs across applications.
Caching is especially effective for:
- APIs with predictable or repetitive queries
FastRouter supports exact-match and semantic-match caching with flexible controls.
- Cache hits return in under 10 ms
- Cache hits are billed at 0.1× token pricing (90% savings)
- Identical or similar inputs return consistent responses
- Fewer upstream API calls and improved rate-limit headroom
- Multiple caching strategies for multi-turn chats
- User-defined namespaces for precise cache control
Feature Specification
Caching is enabled by including a cache_key header and an optional cache configuration object in the request body.
| Header | Type | Required | Description |
| --- | --- | --- | --- |
| `Authorization` | string | Yes | Bearer token with API key |
| `cache_key` | string | No | User-defined cache namespace. If omitted, caching is disabled |
Cache Key Header
The `cache_key` header defines the primary cache namespace: it groups related requests under a shared cache scope (for example, a per-user or per-feature namespace).
FastRouter combines `cache_key` with hashed request attributes to form the final lookup key.
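A minimal request sketch with caching enabled is shown below. The endpoint URL, model identifier, and OpenAI-style payload are placeholders for illustration, not confirmed FastRouter values; only the `Authorization` and `cache_key` headers follow the specification above.

```python
import requests

API_URL = "https://api.fastrouter.ai/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        # User-defined namespace: requests sharing this key share a cache scope.
        "cache_key": "support-bot-faq",
    },
    json={
        "model": "openai/gpt-4o-mini",  # placeholder model id
        "messages": [{"role": "user", "content": "What is your refund policy?"}],
    },
)
print(response.json())
```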
Cache Object Parameters
The cache object supports the following parameters:
- Cache TTL in seconds (60–86400)
- Whether to match the cache on the model name
- `conversation_mode`: how conversation context is matched
- The number of turns to match, used only when `conversation_mode = last_n_turns`
- `similarity_threshold` (default: 0.75): minimum semantic similarity score (0–1) required to reuse a cached response
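A sketch of a request that also passes the cache configuration object is shown below. The top-level `cache` field name and the `ttl` key are assumptions made for illustration; `conversation_mode` and `similarity_threshold` are the parameters documented above.

```python
import requests

API_URL = "https://api.fastrouter.ai/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "openai/gpt-4o-mini",  # placeholder model id
    "messages": [
        {"role": "system", "content": "You are a support assistant."},
        {"role": "user", "content": "Summarize the return policy."},
    ],
    # Hypothetical shape of the cache configuration object; field names other
    # than conversation_mode and similarity_threshold are assumptions.
    "cache": {
        "ttl": 3600,                      # seconds, within the 60-86400 range
        "conversation_mode": "last_n_turns",
        "similarity_threshold": 0.75,     # documented default
    },
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}", "cache_key": "support-bot-faq"},
    json=payload,
)
print(response.json())
```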
🔍 similarity_threshold Explained
`similarity_threshold` enables semantic caching in addition to exact matches.
A value of:
- 0.75 (default) → allows minor rewording or paraphrases
- below 0.7 → more aggressive reuse (use with caution)
If no cached entry meets the threshold, the request is treated as a cache miss.
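Conceptually, the threshold is a floor on a similarity score between the incoming prompt and a cached one. The sketch below illustrates this with cosine similarity over toy embedding vectors; the comparison method and the vectors are assumptions for illustration, not FastRouter's actual algorithm.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def semantically_eligible(new_vec, cached_vec, threshold=0.75):
    """Reuse a cached response only if similarity meets the threshold."""
    return cosine_similarity(new_vec, cached_vec) >= threshold

cached_prompt = [0.9, 0.1, 0.3]   # toy embedding of the cached prompt
paraphrase = [0.88, 0.15, 0.28]   # minor rewording -> well above 0.75
unrelated = [0.1, 0.9, -0.2]      # different topic -> below 0.75

print(semantically_eligible(paraphrase, cached_prompt))  # True  -> cache hit
print(semantically_eligible(unrelated, cached_prompt))   # False -> cache miss
```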
Conversation Modes
- Full history: the entire message history is included
- `last_n_turns`: only the last N user–assistant pairs are included
Turn Definition:
One turn = one user message + one assistant response.
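The sketch below shows how a last-N-turns window could be assembled from a chat history under the turn definition above; it is an illustration of the concept, not FastRouter's implementation.

```python
def last_n_turns(messages, n):
    """Keep the system message plus the last n user-assistant turns.

    One turn = one user message + one assistant response.
    """
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    return system + dialogue[-2 * n:]  # each turn contributes two messages

history = [
    {"role": "system", "content": "You are a support agent."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "Where is my order?"},
    {"role": "assistant", "content": "It shipped yesterday."},
]
print(last_n_turns(history, 1))
# Keeps the system message plus the "Where is my order?" turn only.
```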
Cache Lookup Components
The final cache lookup is computed based on:
- the `cache_key` namespace
- the model name (when model matching is enabled)
- the prompt messages selected by the conversation mode
- the cache-sensitive request parameters (see Parameter Sensitivity below)
`similarity_threshold` is applied after lookup to determine semantic eligibility.
Prompt Messages For Lookup
| Conversation Mode | Messages Included |
| --- | --- |
| Full history | Entire message history |
| `last_n_turns` | Last N turns + system message |
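The sketch below illustrates how these components could combine into a single lookup key: the namespace stays readable while the selected messages and cache-sensitive parameters are hashed. The hashing scheme and the choice of `temperature` as a sensitive parameter are assumptions for illustration only.

```python
import hashlib
import json

def cache_lookup_key(cache_key, model, messages, sensitive_params):
    """Illustrative composition of a cache lookup key (not the real scheme)."""
    material = json.dumps(
        {"model": model, "messages": messages, "params": sensitive_params},
        sort_keys=True,
    )
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"{cache_key}:{digest}"

key = cache_lookup_key(
    cache_key="support-bot-faq",
    model="openai/gpt-4o-mini",                # placeholder model id
    messages=[{"role": "user", "content": "Where is my order?"}],
    sensitive_params={"temperature": 0.2},     # assumed cache-sensitive parameter
)
print(key)  # e.g. "support-bot-faq:3f9c..."
```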
Parameter Sensitivity
Parameters always included in cache hashing: different values produce different cache keys.
Parameters ignored for cache hashing: these do not affect the lookup key.
On a cache miss, the response is returned normally and stored in the cache.
On a cache hit, the response is returned instantly with cache metadata.
Cache Response Fields
- A boolean flag that is true when the response was served from cache
- The semantic similarity score (1 = exact match)
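A sketch of how an application might branch on this metadata is shown below; the field names `cached` and `similarity_score` are hypothetical placeholders, since the exact names are not listed here.

```python
# Hypothetical response body; "cached" and "similarity_score" are placeholder
# names for the cache metadata fields described above.
data = {
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "cached": True,
    "similarity_score": 0.93,
}

if data.get("cached"):
    print(f"Cache hit (similarity {data['similarity_score']:.2f})")
else:
    print("Cache miss: fresh response from the provider")
```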
Cache hits are billed at 0.1× the standard token price, a savings of roughly 90%.
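With illustrative numbers, the arithmetic looks like this:

```python
standard_cost = 0.0020                  # assumed cost of an uncached response ($)
cache_hit_cost = standard_cost * 0.1    # cache hits billed at 0.1x
savings = 1 - cache_hit_cost / standard_cost
print(f"${cache_hit_cost:.4f}")   # $0.0002
print(f"{savings:.0%} saved")     # 90% saved
```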
Streaming Support
Cached Streaming Responses
On a cache hit with `stream: true`:
- The cached response is chunked and streamed
- Minimal artificial delay is applied (default: 0 ms)
Streaming Behavior
On a cache miss, the response is streamed from the provider and then cached.
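A streaming request sketch is shown below; the endpoint URL, model identifier, and the assumption that chunks arrive as server-sent-event lines are placeholders for illustration.

```python
import requests

API_URL = "https://api.fastrouter.ai/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

with requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}", "cache_key": "support-bot-faq"},
    json={
        "model": "openai/gpt-4o-mini",  # placeholder model id
        "messages": [{"role": "user", "content": "Where is my order?"}],
        "stream": True,
    },
    stream=True,
) as response:
    # Cache hits are replayed as chunks, so the consuming code is the same
    # whether the response comes from the cache or from the provider.
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))
```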