Other models aren't able to solve it, so there's something else happening besides it being in the training data. You can also vary the problem and give it a number like 85 instead of 65, and Gemini is still able to properly reason through it.
I'm sure you're right that there's more to it than the problem being in the training data, but the fact that it is in the training data means you can't draw any conclusions about general mathematical ability from this one benchmark, even if you substitute the numbers.
There are lots of possible mechanisms by which this particular problem could become more prominent in the weights during a given round of training, even if the model hasn't actually gotten any better at general reasoning. Here are a few:
* Random chance (these are still statistical machines, after all).
* The problem resurfaced recently and shows up more often than it used to.
* The particular set of RLHF data chosen for this model draws out the weights associated with this problem in a way that wasn't true previously.
Sure, but you can't cite this puzzle as proof that this model is "better than 95+% of the population at mathematical reasoning" when the method of solving it (the "answer") is online and the model has surely seen it.