Create speech with timing

Converts text into speech and returns both the audio and word-level timing information. This is useful for applications that need to synchronize text display with audio playback, such as highlighting each word as it is spoken.
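
As a rough illustration of the synchronization use case, the sketch below looks up which word is being spoken at a given playback position. The response format is not specified here, so the `WordTiming` shape (a `word` with `start` and `end` in seconds), the `word_index_at` helper, and the sample data are all hypothetical; adapt the field names to whatever the endpoint actually returns.

```python
from bisect import bisect_right
from typing import Optional, Sequence, TypedDict


class WordTiming(TypedDict):
    """Assumed shape of one timing entry; not taken from the API itself."""
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def word_index_at(timings: Sequence[WordTiming], t: float) -> Optional[int]:
    """Return the index of the word spoken at playback time t.

    Assumes timings are sorted by start time and do not overlap.
    Returns None when t falls in a gap between words or outside the audio.
    """
    starts = [w["start"] for w in timings]
    # Index of the last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < timings[i]["end"]:
        return i
    return None


# Hypothetical sample data, for illustration only.
SAMPLE: list[WordTiming] = [
    {"word": "Hello", "start": 0.00, "end": 0.42},
    {"word": "world", "start": 0.50, "end": 0.95},
]

if __name__ == "__main__":
    for t in (0.10, 0.45, 0.60):
        i = word_index_at(SAMPLE, t)
        label = SAMPLE[i]["word"] if i is not None else "(silence)"
        print(f"{t:.2f}s -> {label}")
```

A player would call a helper like this on each playback time update and highlight the returned word. The binary search keeps each lookup cheap for long transcripts, although rebuilding `starts` on every call is linear; caching that list once per response would avoid the repeated work.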