Automatic speech recognition inference

Transcribe audio to text using a speech recognition model.
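
The flow described above (audio in, text out) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the API of any particular speech recognition library. The names `AsrModel`, `load_wav_mono`, and `transcribe` are hypothetical, and the model is assumed to be any callable that accepts mono float samples plus a sampling rate and returns a string. The sketch also assumes 16-bit PCM WAV input.

```python
import array
import sys
import wave
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface (an assumption for this sketch): takes mono
# float samples in [-1.0, 1.0) and the sampling rate, returns the transcript.
AsrModel = Callable[[Sequence[float], int], str]


def load_wav_mono(path: str) -> Tuple[List[float], int]:
    """Read a 16-bit PCM WAV file and return (mono_samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian

    # Average the interleaved channels down to mono and scale to [-1.0, 1.0).
    samples = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return samples, rate


def transcribe(path: str, model: AsrModel) -> str:
    """Load the audio at `path`, run the model on it, and return the text."""
    samples, rate = load_wav_mono(path)
    return model(samples, rate).strip()
```

A real speech recognition model would be wrapped in a small function matching the assumed `AsrModel` signature and passed as `model`. Many models expect a specific sampling rate, so a resampling step may be needed before inference; that step is not shown here.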