This is not necessarily a problem. Any programming or mathematical question has several correct answers. The problem with LLMs is that they don't have a process to guarantee that a solution is correct. They will give a solution that seems correct under their heuristic reasoning, but they arrived at that result in a non-logical way. That's why LLMs generate so many bugs in software and in anything related to logical thinking.
>> a solution that seems correct under their heuristic reasoning, but they arrived at that result in a non-logical way
Not quite ... LLMs are not HAL (unfortunately). They produce something that their training associates with similar input: something that should look like an acceptable answer. A correct answer will be acceptable, and so will any answer that has been associated with similar input. And so will anything that fools some of the people, some of the time ;)
The unpredictability is a huge problem. Take the geoguess example - it has come up with a collection of "facts" about Paramaribo. These may or may not be correct. But some are not shown in the image. Very likely the "answer" is derived from completely different factors, and the "explanation" is spurious (perhaps an explanation of how other people made a similar guess!)
The questioner has no way of telling if the "explanation" was actually the logic used. (It wasn't!) And when genuine experts follow the trail of token activation, the answer and the explanation are quite independent.
> Very likely the "answer" is derived from completely different factors, and the "explanation" is spurious (perhaps an explanation of how other people made a similar guess!)
This is a very important and often overlooked idea. And it is 100% correct, even admitted by Anthropic themselves. When a user asks an LLM to explain how it arrived at a particular answer, it produces steps which are completely unrelated to the actual mechanism inside the model. It will be yet another generated output, based on the training data.
> Any programming or mathematical question has several correct answers.
Huh? If I need to sort the list of integers 3,1,2 in ascending order, the only correct answer is 1,2,3. And there are many programming and mathematical questions with only one correct answer.
If you want to say "some programming and mathematical questions have several correct answers" that might hold.
More charitably, I think they meant either that (1) there is often more than one way to arrive at any given answer, or (2) many questions are ambiguous and so may have many different answers.
No, but if you phrase it like "there are multiple correct answers to the question 'I have a list of integers, write me a computer program that sorts it'", that is obviously true. There's an enormous variety of different computer programs that you can write that sorts a list.
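For instance, both of the following are correct answers to that question even though they share almost no code (a throwaway Python sketch, not anyone's canonical solution):

    # Answer 1: lean on the built-in sort.
    def sort_numbers(xs):
        return sorted(xs)

    # Answer 2: a hand-rolled insertion sort; very different code, same correct result.
    def sort_numbers_insertion(xs):
        result = []
        for x in xs:
            i = 0
            while i < len(result) and result[i] <= x:
                i += 1
            result.insert(i, x)
        return result

Both return [1, 2, 3] for [3, 1, 2], and you could keep going: merge sort, a shell one-liner, a SQL query, and so on.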
I think what they meant is something along the lines of:
- In math, there's often more than one logically distinct way of proving a theorem, and definitely many ways of writing the same proof, though the second point applies more to handwritten/text proofs than to, say, a proof in Lean.
- In programming, there's often multiple algorithms to solve a problem correctly (in the mathematical sense, optimality aside), and for the same algorithm there are many ways to implement it.
LLMs, however, do not perform any logical pass on their output, so they have no way of constraining correctness while still producing different outputs for the same question.
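To make "logical pass" concrete: for a task like sorting, a checker outside the model can verify any proposed answer against the spec, which is exactly the step the model itself doesn't do. A minimal Python sketch (the function name is made up for illustration):

    def is_valid_sort(original, proposed):
        # The output must be in ascending order...
        in_order = all(a <= b for a, b in zip(proposed, proposed[1:]))
        # ...and must contain exactly the same elements as the input.
        same_elements = sorted(original) == sorted(proposed)
        return in_order and same_elements

    print(is_valid_sort([3, 1, 2], [1, 2, 3]))  # True
    print(is_valid_sort([3, 1, 2], [1, 2]))     # False: an element was silently dropped

Without that kind of external verification, "looks like an acceptable answer" is all you get.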
I find it quite ironic that, while discussing the topic of logic and correct answers, the OP talks rather "approximately", leaving the reader to imagine what he meant and others (like you) to spell it out.
Yes, I also thought of your interpretation, but then I read the text again, and it really does not say that, so I chose to respond to the text as written...
Is it? You have three wishes, which the maliciously compliant genie will grant you. Let’s hear your unambiguous request which definitely can’t be misinterpreted.
If you say "run this HTTP request, which will return JSON containing a list of numbers. Reply with only those numbers, in ascending order and separated by commas, with no additional characters" and it exploits an RCE to modify the database so that the response will return just 7 before it runs the request, it's unequivocally wrong even if a malicious genie might've done the same thing. If you just mean that that's not pedantic enough, then sure, also say that the numbers should be represented in Arabic numerals rather than spelled out, that the radix shouldn't be changed, yadda yadda. Better yet, admit that natural language isn't a good fit for this sort of thing, give it a code snippet that does the exact thing you want, and while you're waiting for its response, ponder why you're bothering with this LLM thing anyways.
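For the record, that "code snippet that does the exact thing you want" could be as small as this rough Python sketch (the URL is a placeholder, not a real endpoint):

    import json
    import urllib.request

    # Placeholder endpoint; substitute whatever the request actually hits.
    with urllib.request.urlopen("https://example.com/numbers.json") as resp:
        numbers = json.load(resp)

    # Exactly the requested output: ascending order, comma-separated, nothing else.
    print(",".join(str(n) for n in sorted(numbers)))

No room for creative reinterpretation there.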
> The problem with LLMs is that they don't have a process to guarantee that a solution is correct
Neither do we.
> They will give a solution that seems correct under their heuristic reasoning, but they arrived at that result in a non-logical way.
As do we, and so you can correctly reframe the issue as "there's a gap between the quality of AI heuristics and the quality of human heuristics". That gap is still shrinking, though.
I'll never doubt the ability of people like yourself to consistently mischaracterize human capabilities in order to make it seem like LLMs' flaws are just the same as (maybe even fewer than!) those of humans. There are still so many obvious errors (noticeable by just using Claude or ChatGPT to do some non-trivial task) that the average human would simply not make.
And no, just because you can imagine a human stupid enough to make the same mistake, doesn't mean that LLMs are somehow human in their flaws.
> the gap is still shrinking though
I can tell this human is fond of extrapolation. If the gap is getting smaller, surely soon it will be zero, right?
> doesn't mean that LLMs are somehow human in their flaws.
I don't believe anyone is suggesting that LLMs' flaws are perfectly 1:1 aligned with human flaws, just that both do have flaws.
> If the gap is getting smaller, surely soon it will be zero, right?
The gap between y=x^2 and y=-x^2-1 shrinks for a while, never reaches zero, then grows again.
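Spelling that out, the vertical gap between the two curves is

    x^2 - (-x^2 - 1) = 2x^2 + 1

which decreases toward 1 as x approaches 0, never touches 0, and increases again on the other side.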
The difference between any given human (or even all humans) and AI will never be zero: some future AI that can only do what one or all of us can do could be trivially glued to the things AI already does better, like chess and Go (and to things simple computers already do better, like arithmetic).
> I'll never doubt the ability of people like yourself to consistently mischaracterize human capabilities
Ditto for your mischaracterizations of LLMs.
> There are still so many obvious errors (noticeable by just using Claude or ChatGPT to do some non-trivial task) that the average human would simply not make.
Firstly, so what? LLMs also do things no human could do.
Secondly, they've learned from unimodal data sets which don't have the rich semantic content that humans are exposed to (not to mention born with, thanks to evolution). Answers to questions that cross modal boundaries are expected to be wrong.
> If the gap is getting smaller, surely soon it will be zero, right?