Cartesia Sonic Text-to-Speech API

The Sonic text-to-speech API converts text into emotive speech at ultra-low latency, with a sub-100 ms time-to-first-byte. It supports REST, server-sent events (SSE), and WebSocket streaming for real-time voice agents and other latency-sensitive applications.
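As a rough illustration of the REST option, the sketch below builds (but does not send) a single synthesis request. Nothing in it comes from this page: the base URL, the `/tts/bytes` path, the `X-API-Key` and `Cartesia-Version` headers, the version string, the `sonic-2` model name, and every body field are assumptions that should be checked against the specification file and the official documentation. The helper `build_tts_request` is a hypothetical name used only for this example.

```python
import json
import urllib.request

# All of these values are ASSUMPTIONS for illustration; verify them against
# cartesia-asyncapi.yml and the official documentation before use.
ASSUMED_BASE_URL = "https://api.cartesia.ai"
ASSUMED_TTS_PATH = "/tts/bytes"
ASSUMED_API_VERSION = "2024-06-10"


def build_tts_request(api_key, transcript, voice_id, model_id="sonic-2"):
    """Build (without sending) a POST request for one-shot speech synthesis.

    Hypothetical helper: the field names below are assumed, not taken
    from this page.
    """
    body = {
        "model_id": model_id,          # assumed model identifier
        "transcript": transcript,      # text to be spoken
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {             # assumed raw PCM output settings
            "container": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": 16000,
        },
    }
    return urllib.request.Request(
        url=ASSUMED_BASE_URL + ASSUMED_TTS_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,                     # assumed auth header
            "Cartesia-Version": ASSUMED_API_VERSION,  # assumed version header
        },
        method="POST",
    )


# Construct the request object only; no network call is made here.
req = build_tts_request("YOUR_API_KEY", "Hello from Sonic.", "YOUR_VOICE_ID")
```

Passing `req` to `urllib.request.urlopen` would issue the call and, if the assumptions hold, return audio bytes in the response body. For real-time voice agents the streaming transports (SSE or WebSocket) are the better fit, since they deliver audio chunks as they are generated rather than after synthesis completes.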

AsyncAPI Specification

cartesia-asyncapi.yml