Everyone and their dog says "transformer LLMs are flawed", but words are cheap - and in practice, no one seems to have come up with something that's radically better.
Sidegrades, yes; domain-specific improvements, yes; better performance across the board? Haha, no. For how simple autoregressive transformers seem, they sure set a high bar.