You can't really compare those 2. Agents a re non-deterministic. I can tell Clod to go update my unit test coverage and it will choke itself, burn 200k tokens and then loudly proclaim "Great! I've updated unit test coverage".
I'll kill that terminal, open it again and run the exact same command. 30k tokens, actually adds new tests.
It's hard to "learn" when the feedback cycle can take 30 minutes and result in the agent sitting in the corner touching itself and crooning about what a good boy it is. It's hard to _want_ to learn when you can't trust the damn thing with the same prompt twice.
I'll kill that terminal, open it again and run the exact same command. 30k tokens, actually adds new tests.
It's hard to "learn" when the feedback cycle can take 30 minutes and result in the agent sitting in the corner touching itself and crooning about what a good boy it is. It's hard to _want_ to learn when you can't trust the damn thing with the same prompt twice.