| `text` | string | The transcribed text. |
| `task` | string | The task performed; always `transcribe`. |
| `language` | string | The detected or specified language of the audio. |
| `duration` | number | The duration of the input audio, in seconds. |
| `words` | array | Extracted words and their corresponding timestamps. Only present when `timestamp_granularities` includes `word`. |
| `segments` | array | Segments of the transcribed text and their corresponding details. Only present when `timestamp_granularities` includes `segment` or `response_format` is `verbose_json`. |
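
Because `words` and `segments` are only present under the conditions described above, client code should treat them as optional. The following is a minimal sketch of one way to parse such a response in Python. The `VerboseTranscription` class, the `parse_transcription` helper, and the sample payload are hypothetical names and data introduced for illustration; they are not part of any official SDK. The inner structure of each word and segment entry is not specified here, so the sketch leaves those entries as plain dictionaries.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class VerboseTranscription:
    """Hypothetical container mirroring the fields in the table above."""
    text: str
    task: str
    language: str
    duration: float
    # Optional fields: left as None when the response omits them.
    words: Optional[list[dict[str, Any]]] = None
    segments: Optional[list[dict[str, Any]]] = None


def parse_transcription(payload: dict[str, Any]) -> VerboseTranscription:
    """Build a VerboseTranscription from an already-decoded JSON object."""
    return VerboseTranscription(
        text=payload["text"],
        task=payload["task"],
        language=payload["language"],
        duration=float(payload["duration"]),
        # .get() returns None if the key is absent, so no KeyError is raised.
        words=payload.get("words"),
        segments=payload.get("segments"),
    )


# Made-up payload for illustration only; real responses will differ.
sample = json.loads(
    '{"text": "Hello world.", "task": "transcribe", '
    '"language": "english", "duration": 1.5, "segments": []}'
)
result = parse_transcription(sample)

if result.words is None:
    print("No word-level timestamps in this response.")
```

Keeping the two array fields as `None` when absent, rather than defaulting them to empty lists, lets callers tell "not requested" apart from "requested but empty".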