Fish Audio API

The Fish Audio API provides RESTful access to text-to-speech, speech-to-text, voice cloning, and voice management capabilities backed by the Fish Audio S2-Pro model. Endpoints support streaming low-latency generation, multilingual synthesis across 30+ languages, emotion control, and on-the-fly custom voice creation from short reference clips. The API is consumed through the Fish Audio Python, Go, and TypeScript SDKs and through community integrations such as n8n.

API entry from apis.yml

aid: fish-audio:fish-audio-api
name: Fish Audio API
description: The Fish Audio API provides RESTful access to text-to-speech, speech-to-text, voice cloning,
  and voice management capabilities backed by the Fish Audio S2-Pro model. Endpoints support streaming
  low-latency generation, multilingual synthesis across 30+ languages, emotion control, and on-the-fly
  custom voice creation from short reference clips. The API is consumed through the Fish Audio Python,
  Go, and TypeScript SDKs and through community integrations such as n8n.
humanURL: https://docs.fish.audio
baseURL: https://api.fish.audio
tags:
- Text to Speech
- Voice Cloning
- Speech to Text
- Streaming
- REST
- Audio
properties:
- type: Documentation
  url: https://docs.fish.audio
- type: GettingStarted
  url: https://docs.fish.audio/quickstart
- type: Playground
  url: https://fish.audio/discovery
- type: SDK
  url: https://github.com/fishaudio/fish-audio-python
- type: SDK
  url: https://github.com/fishaudio/fish-audio-go
- type: GitHubOrganization
  url: https://github.com/fishaudio
features:
- name: Text-to-Speech Generation
  description: Synthesize natural, emotionally expressive speech from text using the Fish Audio S2-Pro
    model across 30+ languages.
- name: Voice Cloning
  description: Create custom voice models from as little as 15 seconds of reference audio for downstream
    TTS.
- name: Speech-to-Text Transcription
  description: Transcribe audio with multispeaker detection and emotion tagging metadata.
- name: Streaming Audio
  description: Low-latency streaming responses suitable for real-time agent, IVR, and live narration use
    cases.
- name: Emotion and Prosody Control
  description: Inline emotion tags (angry, sad, excited) and special effects (laughing, sobbing) for expressive
    output.
- name: Multilingual Synthesis
  description: Native support for English, Mandarin, Japanese, Korean, and more than 25 additional languages.
- name: Voice Library
  description: Access to a hosted library of more than two million pre-built voices for instant TTS generation.
useCases:
- name: Audiobook and Podcast Production
  description: Generate full-length narrated content with multi-character voices via Story Studio workflows.
- name: Conversational Agents and IVR
  description: Power voice-first agents and interactive voice response systems with low-latency synthesis.
- name: Gaming NPC Dialogue
  description: Create dynamic in-game character voices and barks without manual voice-over sessions.
- name: Video and Content Localization
  description: Dub and localize video, social, and marketing content across dozens of languages.
- name: Accessibility Tooling
  description: Embed expressive screen reading and assistive voice output in accessibility products.
integrations:
- name: Python SDK
- name: Go SDK
- name: TypeScript SDK
- name: n8n
- name: LangChain
- name: Hugging Face
- name: Discord
authentication:
- type: API Key
  description: Requests authenticate using a Bearer API key issued from the Fish Audio dashboard.
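
The authentication entry above can be illustrated with a minimal Python sketch that attaches a Bearer API key to a request against the listed baseURL. Only the base URL and the Bearer scheme come from this entry. The `/v1/tts` path, the `{"text": ...}` payload shape, and the helper name `build_tts_request` are assumptions made for illustration, so check https://docs.fish.audio for the actual endpoint and request schema. The sketch builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.fish.audio"  # baseURL from the entry above
TTS_PATH = "/v1/tts"  # assumed path, not stated in this entry


def build_tts_request(api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a Bearer API key."""
    # The payload shape is an assumption; the real schema may need more fields.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + TTS_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_tts_request("YOUR_API_KEY", "Hello from Fish Audio.")
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it possible to inspect the headers first and keeps the key out of any logged URL.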