Create dialogue with timestamps

Converts a multi-speaker dialogue script into speech and returns word-level timing information alongside the generated audio.
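Word-level timings are typically used to sync captions or highlight the active speaker during playback. The sketch below shows one way a client might consume them. It assumes a hypothetical response shape in which each word carries a speaker label plus start and end times in seconds. `WordTiming`, `word_at`, and `speaker_turns` are illustrative names, not part of this endpoint's documented schema, and the real field names and units may differ.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class WordTiming:
    # Hypothetical shape for one timed word; the real response schema may differ.
    word: str
    speaker: str
    start_s: float  # assumed: seconds from the start of the audio
    end_s: float


def word_at(timings: list[WordTiming], t: float) -> WordTiming | None:
    """Return the word being spoken at playback time t (seconds), if any.

    Assumes `timings` is sorted by start_s and words do not overlap.
    """
    starts = [w.start_s for w in timings]
    i = bisect_right(starts, t) - 1  # index of the last word starting at or before t
    if i >= 0 and t < timings[i].end_s:
        return timings[i]
    return None  # t falls in a pause, before the first word, or after the last


def speaker_turns(timings: list[WordTiming]) -> list[tuple[str, float, float]]:
    """Merge consecutive words by the same speaker into (speaker, start, end) turns."""
    turns: list[tuple[str, float, float]] = []
    for w in timings:
        if turns and turns[-1][0] == w.speaker:
            speaker, start, _ = turns[-1]
            turns[-1] = (speaker, start, w.end_s)  # extend the current turn
        else:
            turns.append((w.speaker, w.start_s, w.end_s))  # a new speaker starts a turn
    return turns


# Hypothetical timings for a two-speaker script.
timings = [
    WordTiming("Hello", "Alice", 0.00, 0.42),
    WordTiming("there!", "Alice", 0.45, 0.90),
    WordTiming("Hi", "Bob", 1.20, 1.40),
    WordTiming("Alice.", "Bob", 1.42, 1.85),
]

print(word_at(timings, 0.5).word)  # there!
print(word_at(timings, 1.0))       # None (pause between speakers)
print(speaker_turns(timings))      # [('Alice', 0.0, 0.9), ('Bob', 1.2, 1.85)]
```

Binary search keeps each lookup at O(log n), which matters when a playback loop queries the current word on every frame.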