> where are all those LLMs without the billions upon billions of lines of text written by humans? A not insignificant number coming from Wikipedia?
Where is Wikipedia without all the learning and information from other sources, many of which it put out of business?
> Also, LLMs don't produce truth. They don't have a concept of it. Or lies for that matter. If you are using LLMs to study something you know nothing about, the information provided by them is as good as useless if you don't verify it with external sources written by a person. Wikipedia isn't perfect, nothing is, but I trust their model a shitload more than an LLM.
Wikipedia produces consensus that correlates with truth to some degree. LLMs produce statistical output, which in a way is an automated consensus of the LLM's input, and that also correlates with truth to some degree - and the correlation is hardly zero.
I agree that information has no value if you don't know its accuracy; it's always a sticking point for me. IMHO Wikipedia has the same problem: I have no idea how accurate it is without verifying it with an external source (and when I've done that, I've often been disappointed).
Has anyone researched the relative accuracy of Wikipedia and LLMs?
The comment about Wikipedia supposedly putting companies out of business is so goofy I'm not even gonna comment on it. I'm surprised you'd bother trying to make a point there.
The difference is humans have a concept of truth, humans have intent. A person, taking an aggregation of their research, expertise, and experience to produce an article, is (presumably) trying to produce something factual. Other humans then come along, with similar intent, and verify it. Studies in the past have shown Wikipedia's accuracy rate is roughly on par with traditional encyclopedias, and more importantly, sources are clearly documented, making validation and further research fairly easy. And if something isn't sourced, I know immediately it's more suspect.
LLMs have no concept of truth, they have no "intent". They just slap words down based on statistics. It is admittedly very impressive how good they are at doing that, but they don't produce truth in any meaningful way; it's more a byproduct. On top of that, all its sources get smashed together, making it much more difficult to verify the validity of any given claim. It's also unpredictable, so the exact same prompt could produce truth one time and a hallucination another (a situation I have run into when it comes to engineering tasks). And worst of all, not only will an LLM be wrong, it will be confidently and persuasively wrong.
> The comment about Wikipedia supposedly putting companies out of business is so goofy I'm not even gonna comment on it.
I've learned that when people don't have any merits to argue, they turn to ridicule. Right back at you buddy.
> The difference is humans have a concept of truth, humans have intent. A person, taking an aggregation of their research, expertise, and experience to produce an article is (presumably) trying to produce something factual.
It's pretty naive to think that humans have intent and motivation only for the truth, and nothing else. Just look around you in the world - most communication disregards the truth either carelessly or incidentally (because people are motivated to believe or claim something else) or intentionally (lots of that).
> LLMs have no concept of truth, they have no "intent".
My calculator app has no intent or concept of truth, but outputs truth pretty reliably.
You'd have to intentionally misread my comment to think I'm saying all humans intend to produce truth. Wikipedia obviously has to deal with bad actors and vandalism, and they have processes in place for that. My point is that the intent matters.
Calculators aren't a useful analogy for LLMs. They produce a deterministic output based on a (relatively) narrow range of inputs. The calculations to produce those outputs follow very rigid and well defined rules.
LLMs by their very nature are non-deterministic, and the inputs/outputs are far more complicated.
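To make the contrast concrete, here is a toy Python sketch (the tokens and probabilities are invented for illustration, not taken from any real model): a calculator-style function maps the same inputs to the same output on every run, while sampling-based LLM decoding draws the next token from a probability distribution, so the same prompt can come back with a different continuation each time.

    import random

    # Deterministic, calculator-style: same inputs always produce the same output.
    def calculate(a, b):
        return a + b

    assert calculate(2, 2) == 4  # holds on every run

    # Non-deterministic, LLM-style (toy illustration): sample the "next token"
    # from an invented probability distribution, as sampling-based decoding does.
    next_token_probs = {
        "Paris": 0.90,   # the likely (and, in this toy case, true) continuation...
        "Lyon": 0.07,
        "Geneva": 0.03,  # ...but a wrong token can still be drawn occasionally
    }

    def sample_next_token(probs):
        tokens, weights = zip(*probs.items())
        return random.choices(tokens, weights=weights, k=1)[0]

    # Same "prompt", five runs, potentially a different answer each time.
    print([sample_next_token(next_token_probs) for _ in range(5)])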
It's not an insult when it's true. I don't think you've made one comment that actually added something useful. I did my best to reply to what was there, but you didn't give me anything to work with. Your last comment was so unrelated there was nowhere left to go.
If you have something actually relevant to say, you're welcome to say it.
Here's an opportunity to talk about listening, epistemology, and human intercourse:
> It's not an insult when it's true.
It's not slander, but it's certainly an insult. If you tell someone they are fat and ugly, it's an insult regardless of its truth and you shouldn't say it, ever. There's never a good reason for personal insults.
> it's true
> you're welcome to
This assumes your perspective is truth. That is the case for nobody in the world; in fact, I also have a perspective that I'm confident in, as do many others. Your statements also assume that, perhaps as the arbiter of truth, you have some authorization or power to enforce it. Again, that's nobody's business.
We're in a world of peers, generally speaking, and none of us know who is right. We need strategies to navigate that world, not the one where truth is given to you.
> you didn't give me anything to work with
When I feel like you do, it's a signal I need to listen better - the other person probably does have something to say and I'm missing it. It's possible we're talking past each other, but that's never a reason for insults.
(human intercourse)
Note that the signal is that I need to do something, not the other person. That's not because I'm 'wrong' or 'right' - those are mostly unknowable and irrelevant because 1) We're in a world of peers, generally speaking, and none of us know who is right. Also, 2) I'm the only one I can control and am responsible for, and ...
3) Respecting other people is always more important. That's a strategy for, and wisdom in, a world of uncertainty (as described), as opposed to a world of certainty. Also, it's a strategy for social creatures in social groups - it keeps groups strong and functioning. Finally, it's a strategy for both loving and respecting yourself - you deserve it. You're better than insults, I'm sure; and I sometimes say the wrong thing, but I'm better than that too.
> Where is Wikipedia without all the learning and information from other sources, many of which it put out of business?
Which businesses did Wikipedia put out of business? You will frequently see a 5k word article used for a couple of sentences in a Wikipedia page, with the entire Wikipedia page itself being smaller than one paper it cites for one small corner of said page. When I’m researching events, I frequently go to Wikipedia to find sources as search engines have a drastically larger recentism bias.
> Has anyone researched the relative accuracy of Wikipedia and LLMs
No comparative research on this specific topic has been conducted, afaik, and most comparative research is aging (likely to Wikipedia's own detriment: the general consensus is that Wikipedia's reliability has increased over time). However, at the time those studies were published, the consensus seems to be that Wikipedia is generally only slightly less reliable than peers in a given field (i.e. textbooks or Britannica), although Wikipedia is often less in depth. The most frequently cited study is a 2005 comparison in Nature, which found 4 major errors in each of Wikipedia and Britannica, and 130 minor errors in Britannica versus 160 in Wikipedia. All studies are documented on Wikipedia itself; see [[Reliability of Wikipedia]]. LLMs… do not have this same reputation.
> Which businesses did Wikipedia put out of business?
Just as a start, other sources of reference, including encyclopedias, dictionaries, websites, etc. For example, I'm sure it impacts McGraw-Hill's AccessScience, which you've likely never heard of.
> This is documented on Wikipedia itself
Maybe there's a little bias there? Would Wikipedia accept Wikipedia's analysis of its own reliability as a valid source?
I've heard that claim, but having no knowledge of the accuracy of any particular article, it's not worth very much to me.
> LLMs… do not have this same reputation.
They don't with you, but many people obviously use them that way. Also, reputation does not correlate strongly with reality.
> Just as a start, other sources of reference, including encyclopedias, dictionaries
This just seems like healthy competition. I thought we were talking about a situation where Wikipedia’s use of other encyclopedias is an instrument of their demise.
> Maybe there's a little bias there
Paradoxically, I suspect you’d be pleasantly surprised about how tough this article is on itself. A lot of attention is given to bias in this case.
> Would Wikipedia accept Wikipedia's analysis of its own reliability as a valid source?
First, it is not Wikipedia’s own analysis. Editors should not present their own conclusions from research, just what each paper says. See [[WP:SYNTH]]. Second, generally Wikipedia discourages anyone citing it as it is not a stable source of information. Much better is to use the sources the article itself conveniently cites inline. As a general policy citing any encyclopedia is discouraged.
> having no knowledge of the accuracy of any particular article, it's not worth very much to me.
Wikipedia does have internal metrics grading the quality of an article; see [[WP:ASSESS]]. In general though, even entirely discounting the Wikipedia component of the Britannica comparison, based on Britannica's own failures it seems wise to verify each and every claim in an encyclopedia, which Wikipedia does an excellent job of helping you do.
> They don't with you, but many people obviously use them that way. Also, reputation does not correlate strongly with reality
OpenAI's own benchmarks show much higher hallucination rates than any study on Wikipedia. Wikipedia itself is quite close to a ban on LLMs over reliability issues. If you ask literally any layman “has ChatGPT ever been wrong for you?”, they will say yes, either in that moment or after only a little prompting. It is much harder to elicit such a response about Wikipedia, in my experience.
Correct. The number of Wikipedia pages with active vandalism at any given moment is vanishingly small. The only time I have ever stumbled upon vandalism is as part of my work as a volunteer there, actively looking for such cases. Looking at my feed of possibly problematic changes at the moment, about 3 entries are appearing per minute, with the most recent revert being just 2 entries ago. It is significantly worse while school is in session, in my experience, but vandalism very rarely lasts long. Talking to people, they frequently confess to having vandalised Wikipedia at one point or another. When I ask them "how long did it survive", they give answers ranging from "a few moments" to "5 minutes". So to answer your question, I believe it is unlikely the average person has seen vandalism on the site, barring those looking at their own shit.
>> Just as a start, other sources of reference, including encyclopedias, dictionaries
> This just seems like healthy competition. I thought we were talking about a situation where Wikipedia’s use of other encyclopedias is an instrument of their demise.
Somewhere above, someone complained that LLMs were harming Wikipedia, a source of its information. My point is that Wikipedia did the same to others.
> > Which businesses did Wikipedia put out of business?
> Just as a start, other sources of reference, including encylopedias, dictionaries, websites, etc. For example, I'm sure it impacts McGraw-Hill's AccessScience, which likely you've never heard of.
Your “for example” in response to a question about what businesses Wikipedia put out of business is a business that is...still in business?