> I view this recursion as more of a strength than weakness
Sure, it's a strength given that transformers are currently limited by compute budget, but in theory, if we had a way to overcome that limit, it seems obvious to me that transformers' 'one-shot' ability would make them better.
That said, the recursive aspect you're referencing can be built into a transformer as well. This is a sampling and training problem.