> For the complete documentation index, see [llms.txt](https://docs.fastrouter.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.fastrouter.ai/api-reference/audio/audio-to-text.md).

# Audio to Text

## Transcribe Audio

> Transcribes audio to text in the original language using `openai/whisper-1`. Supported formats: MP3, MP4, MPEG, MPGA, M4A, WAV, and WEBM (max 25MB). Output formats: `json`, `text`, `srt`, `vtt`, `verbose_json`.

```json
{"openapi":"3.1.0","info":{"title":"FastRouter API Reference","version":"1.0.0"},"tags":[{"name":"Audio","description":"Transcribe, translate, and generate audio using Whisper, ElevenLabs, and other audio models."}],"servers":[{"url":"https://api.fastrouter.ai","description":"Production API"}],"security":[{"bearerAuth":[]}],"components":{"securitySchemes":{"bearerAuth":{"type":"http","scheme":"bearer","bearerFormat":"API Key","description":"FastRouter API Key. Get yours at https://fastrouter.ai\n\nFormat: `Authorization: Bearer YOUR_API_KEY`"}},"responses":{"UnauthorizedError":{"description":"Invalid Credentials - Your API key is invalid, missing, or disabled. Check your credentials.\n\nNote: the 401 error body uses `code`, `message`, `param`, and `type` (there is no `status` field), and `type` is `invalid_request_error`.","content":{"application/json":{"schema":{"type":"object","properties":{"error":{"type":"object","properties":{"message":{"type":"string"},"type":{"type":"string"},"param":{"type":"string","nullable":true},"code":{"type":"string"}}}}}}}},"RateLimitError":{"description":"Rate Limited - You have exceeded your request limits (TPM/RPM). Slow down or increase your limits.","content":{"application/json":{"schema":{"type":"object","properties":{"error":{"type":"object","properties":{"message":{"type":"string"},"type":{"type":"string"},"code":{"type":"string"},"status":{"type":"integer"}}}}}}}}}},"paths":{"/api/v1/audio/transcriptions":{"post":{"operationId":"createTranscription","tags":["Audio"],"summary":"Transcribe Audio","description":"Transcribes audio to text in the original language using openai/whisper-1. Supports MP3, MP4, MPEG, M4A, WAV, WEBM formats (max 25MB). Output formats: json, text, srt, vtt, verbose_json.","requestBody":{"required":true,"content":{"multipart/form-data":{"schema":{"type":"object","required":["file","model"],"properties":{"file":{"type":"string","format":"binary","description":"Audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm. Max size: 25MB"},"model":{"type":"string","enum":["whisper-1","openai/whisper-1"],"description":"Model to use for transcription. Must be 'whisper-1' or 'openai/whisper-1'"},"language":{"type":"string","description":"Optional: ISO-639-1 language code of the audio (e.g., 'en', 'es', 'fr', 'de'). Improves accuracy and latency."},"prompt":{"type":"string","description":"Optional: Text prompt to guide the transcription style or continue a previous segment. Can include punctuation, casing, or specific vocabulary."},"response_format":{"type":"string","enum":["json","text","srt","verbose_json","vtt"],"default":"json","description":"Output format:\n- json: Basic JSON with text field\n- text: Plain text only\n- srt: SubRip subtitle format\n- vtt: WebVTT subtitle format\n- verbose_json: JSON with metadata and timestamps"},"temperature":{"type":"number","minimum":0,"maximum":1,"default":0,"description":"Sampling temperature (0-1). Lower values make output more focused and deterministic. Higher values increase randomness."},"timestamp_granularities[]":{"type":"array","items":{"type":"string","enum":["word","segment"]},"default":["segment"],"description":"Timestamp granularities to include in the transcription. **Requires `response_format=verbose_json`.** One or both of `word` and `segment` can be specified.\n\n- `segment` (default) — segment-level start/end timestamps under `segments[]`. No additional latency.\n- `word` — word-level start/end timestamps under `words[]`. Adds extra latency.\n\nPass this as a repeated form field (e.g. `-F 'timestamp_granularities[]=word' -F 'timestamp_granularities[]=segment'`)."}}}}}},"responses":{"200":{"description":"Transcription successful","content":{"application/json":{"schema":{"oneOf":[{"type":"object","description":"JSON format response. In addition to `text`, FastRouter returns a `chat_id`, a `cost`, and a duration-based `usage` object.","properties":{"text":{"type":"string","description":"Transcribed text"},"chat_id":{"type":"string","description":"FastRouter internal identifier for this request."},"cost":{"type":"number","description":"Cost in USD for this request."},"usage":{"type":"object","description":"Duration-based usage for audio transcription.","properties":{"seconds":{"type":"integer","description":"Billed audio duration in seconds."},"type":{"type":"string","description":"Usage unit type."}}}}},{"type":"object","description":"Verbose JSON format response","properties":{"task":{"type":"string"},"language":{"type":"string"},"duration":{"type":"number"},"text":{"type":"string"},"segments":{"type":"array","description":"Segment-level timestamps. Included when `timestamp_granularities[]` contains `segment` (the default).","items":{"type":"object","properties":{"id":{"type":"integer"},"seek":{"type":"integer"},"start":{"type":"number"},"end":{"type":"number"},"text":{"type":"string"},"tokens":{"type":"array"},"temperature":{"type":"number"},"avg_logprob":{"type":"number"},"compression_ratio":{"type":"number"},"no_speech_prob":{"type":"number"}}}},"words":{"type":"array","description":"Word-level timestamps. Included only when `timestamp_granularities[]` contains `word`.","items":{"type":"object","properties":{"word":{"type":"string","description":"The transcribed word."},"start":{"type":"number","description":"Word start time in seconds."},"end":{"type":"number","description":"Word end time in seconds."}}}}}}]}},"text/plain":{"schema":{"type":"string","description":"Plain text transcription"}}}},"400":{"description":"Bad Request - Invalid file format or parameters"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"413":{"description":"Payload Too Large - File exceeds 25MB limit"},"429":{"$ref":"#/components/responses/RateLimitError"},"500":{"description":"Internal Server Error"}}}}}}
```
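
The spec above asks for `multipart/form-data` with `timestamp_granularities[]` sent as a repeated form field. The following is a minimal sketch of how such a request could be assembled with only the Python standard library. The helper names `encode_multipart` and `build_transcription_request` are hypothetical, the `application/octet-stream` part type is an assumption, and the request is built but not sent.

```python
import uuid
import urllib.request

# Server URL and path are taken from the OpenAPI spec on this page.
API_URL = "https://api.fastrouter.ai/api/v1/audio/transcriptions"


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one `file` part as multipart/form-data.

    `fields` is a list of (name, value) pairs, so a name may repeat.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # Assumption: a generic binary content type is acceptable for the audio part.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(api_key, file_name, file_bytes,
                                granularities=("segment",)):
    """Build (but do not send) a verbose_json transcription request."""
    fields = [
        ("model", "openai/whisper-1"),
        # timestamp_granularities[] requires response_format=verbose_json.
        ("response_format", "verbose_json"),
    ]
    # One form field per granularity, i.e. the repeated-field style the spec describes.
    fields += [("timestamp_granularities[]", g) for g in granularities]
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
```

To send the request, the returned object could be passed to `urllib.request.urlopen`. According to the response schema, the decoded JSON body would then carry `segments` and, when `word` was requested, `words`.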

## Translate Audio to English

> Translates audio to English text using `openai/whisper-1`, regardless of source language. Supported formats: MP3, MP4, MPEG, MPGA, M4A, WAV, and WEBM (max 25MB). Output formats: `json`, `text`, `srt`, `vtt`, `verbose_json`.

```json
{"openapi":"3.1.0","info":{"title":"FastRouter API Reference","version":"1.0.0"},"tags":[{"name":"Audio","description":"Transcribe, translate, and generate audio using Whisper, ElevenLabs, and other audio models."}],"servers":[{"url":"https://api.fastrouter.ai","description":"Production API"}],"security":[{"bearerAuth":[]}],"components":{"securitySchemes":{"bearerAuth":{"type":"http","scheme":"bearer","bearerFormat":"API Key","description":"FastRouter API Key. Get yours at https://fastrouter.ai\n\nFormat: `Authorization: Bearer YOUR_API_KEY`"}},"responses":{"UnauthorizedError":{"description":"Invalid Credentials - Your API key is invalid, missing, or disabled. Check your credentials.\n\nNote: the 401 error body uses `code`, `message`, `param`, and `type` (there is no `status` field), and `type` is `invalid_request_error`.","content":{"application/json":{"schema":{"type":"object","properties":{"error":{"type":"object","properties":{"message":{"type":"string"},"type":{"type":"string"},"param":{"type":"string","nullable":true},"code":{"type":"string"}}}}}}}},"RateLimitError":{"description":"Rate Limited - You have exceeded your request limits (TPM/RPM). Slow down or increase your limits.","content":{"application/json":{"schema":{"type":"object","properties":{"error":{"type":"object","properties":{"message":{"type":"string"},"type":{"type":"string"},"code":{"type":"string"},"status":{"type":"integer"}}}}}}}}}},"paths":{"/api/v1/audio/translations":{"post":{"operationId":"createTranslation","tags":["Audio"],"summary":"Translate Audio to English","description":"Translates audio to English text using openai/whisper-1, regardless of source language. Supports MP3, MP4, MPEG, M4A, WAV, WEBM formats (max 25MB). Output formats: json, text, srt, vtt, verbose_json.","requestBody":{"required":true,"content":{"multipart/form-data":{"schema":{"type":"object","required":["file","model"],"properties":{"file":{"type":"string","format":"binary","description":"Audio file to translate to English. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm. Max size: 25MB"},"model":{"type":"string","enum":["whisper-1","openai/whisper-1"],"description":"Model to use for translation. Must be 'whisper-1' or 'openai/whisper-1'"},"prompt":{"type":"string","description":"Optional: English text prompt to guide the translation style. Can help with proper nouns, acronyms, or domain-specific vocabulary."},"response_format":{"type":"string","enum":["json","text","srt","verbose_json","vtt"],"default":"json","description":"Output format:\n- json: Basic JSON with translated English text\n- text: Plain English text only\n- srt: SubRip subtitle format (English)\n- vtt: WebVTT subtitle format (English)\n- verbose_json: JSON with metadata and timestamps"},"temperature":{"type":"number","minimum":0,"maximum":1,"default":0,"description":"Sampling temperature (0-1). Lower values (e.g., 0.1) make output more focused and deterministic. Use 0 for most consistent translations."}}}}}},"responses":{"200":{"description":"Translation successful - output is in English","content":{"application/json":{"schema":{"oneOf":[{"type":"object","description":"JSON format response","properties":{"text":{"type":"string","description":"Translated English text"}}},{"type":"object","description":"Verbose JSON format response","properties":{"task":{"type":"string"},"language":{"type":"string","description":"Source language detected"},"duration":{"type":"number"},"text":{"type":"string","description":"Full translated English text"},"segments":{"type":"array","description":"Time-segmented translations","items":{"type":"object","properties":{"id":{"type":"integer"},"start":{"type":"number"},"end":{"type":"number"},"text":{"type":"string","description":"Segment translated to English"}}}}}}]}},"text/plain":{"schema":{"type":"string","description":"Plain English text translation"}}}},"400":{"description":"Bad Request - Invalid file format or parameters"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"413":{"description":"Payload Too Large - File exceeds 25MB limit"},"429":{"$ref":"#/components/responses/RateLimitError"},"500":{"description":"Internal Server Error"}}}}}}
```
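
The translation request schema above constrains the file type, file size, model, response format, and temperature. A client could check these before uploading, to avoid a round trip that the spec says would end in a 400 or 413. The sketch below is one way to do that. The function name `validate_translation_fields` is hypothetical, and reading "25MB" as 25 × 1024 × 1024 bytes is an assumption, since the page does not define the unit precisely.

```python
import os

# Allowed values copied from the translation request schema on this page.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MODELS = {"whisper-1", "openai/whisper-1"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}

# Assumption: "25MB" means 25 * 1024 * 1024 bytes; the page does not say.
MAX_BYTES = 25 * 1024 * 1024


def validate_translation_fields(file_name, size_bytes, model,
                                response_format="json", temperature=0.0):
    """Return a list of problems; an empty list means the fields look valid."""
    problems = []

    # Judge the file type by its extension only; the content is not inspected.
    ext = os.path.splitext(file_name)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file extension: {ext or '(none)'}")

    if size_bytes > MAX_BYTES:
        problems.append("file exceeds the 25MB limit (documented as HTTP 413)")

    if model not in MODELS:
        problems.append(f"model must be one of {sorted(MODELS)}")

    if response_format not in RESPONSE_FORMATS:
        problems.append(f"response_format must be one of {sorted(RESPONSE_FORMATS)}")

    # The schema gives minimum 0 and maximum 1 for temperature.
    if not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")

    return problems
```

For example, `validate_translation_fields("talk.m4a", 1024, "whisper-1")` returns an empty list, while a `.flac` file or a temperature of `1.5` each add one entry. The translation schema lists no `language` or `timestamp_granularities[]` field, so the sketch does not accept them.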

