But my first thought looking at this is that the numbers are probably skewed by the distribution of user skill levels and by which types of users choose which tool.
My hypothesis is that Amp is chosen by people who are VERY highly skilled in agentic development, meaning they are the people most likely to provide solid context, good prompts, etc. Those same people would likely get the best results from ANY coding agent. This also tracks with Amp being so expensive: users or companies are more likely to pay a premium if they can get the most from the tool.
Claude Code, on the other hand, is used by (I assume) a much larger population, so the percentage of low-skill users is likely to be much higher. Those users may still get value from the tool, but their success rate will be lower by some factor with ANY coding agent. And this effect (if my hypothesis is correct) is likely far more pronounced for GitHub Copilot.
So I don't know how much we should read into stats like the overall PR merge success percentage, because it's hard to tell how much of the gap comes from this imbalance in user skill rather than from the tools themselves.
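To make the concern concrete, here is a rough sketch with completely made-up numbers (the tool names, skill mix, and merge rates are all hypothetical, not taken from the stats being discussed). It assumes both tools perform identically for a given skill level and only the user mix differs:

```python
# Hypothetical numbers, chosen only to illustrate the argument.
# Assumption: per-skill merge rates are IDENTICAL for both tools.
MERGE_RATE_BY_SKILL = {"high": 0.80, "low": 0.40}

# Assumption: share of each tool's users at each skill level.
USER_MIX = {
    "tool_a": {"high": 0.90, "low": 0.10},  # small, expert-heavy user base
    "tool_b": {"high": 0.30, "low": 0.70},  # large, mixed user base
}


def overall_merge_rate(mix, rate_by_skill):
    """Average the per-skill merge rates, weighted by the tool's user mix."""
    return sum(share * rate_by_skill[skill] for skill, share in mix.items())


rates = {
    tool: overall_merge_rate(mix, MERGE_RATE_BY_SKILL)
    for tool, mix in USER_MIX.items()
}

for tool, rate in rates.items():
    print(f"{tool}: overall merge rate {rate:.0%}")
```

With these assumed inputs, tool_a comes out at 0.76 and tool_b at 0.52, even though neither tool is better for any individual user. That is the kind of gap the aggregate percentage can't distinguish from a real quality difference.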
Still interesting to see the numbers though!