Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Both the text-to-speech and the speech-to-text models launched here suffer from reliability issues due to combining instructions and data in the same stream of tokens.

I'm not yet sure how much of a problem this is for real-world applications. I wrote a few notes on this here: https://simonwillison.net/2025/Mar/20/new-openai-audio-model...



Thanks for the write up. I've been writing assembly lately, so as soon as I read your comment, I thought "hmm reminds me of section .text and section .data".




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: