Does anyone have any resources they recommend for just understanding the base terminology of models like this? I always see the terms "weights", "tokens", "model", etc. I feel like I understand what these mean, but I have no idea why I need to care about them for open models like this. If I were to download an open model to run on my machine, would I download the weights? I'm just ignorant in the ML space, I guess, and not sure where to start.
Psst ... why don't you spend 30 minutes of quality time with chatGPT and get to the bottom of this? Get those personalised explanations and enjoy its unlimited patience.
I have felt the same in the past, about a completely different topic. I know how it feels: it's like people aren't calling things what they are, just using weird words.
"weights" - synapses in the AI brain
"tokens" - word fragments
"model" - of course, the model is the AI brain
"context" - the model can only handle a piece of text, can't put whole books in, so this limited window is the context
"GPT" - predicts the next word, trained on everything; if you feed its last predicted word back in, it can write long texts
"LoRA" - a lightweight plug-in model for tweaking the big model
"loss" - a score telling how bad is the output
"training" - change the model until it fits the data
"quantisation" - making a low precision version of the model because it still works, but now is much faster and needs less compute
"embedding" - just a vector, it stands for the meaning of a word token or a piece of image; these embeddings are learned
But isn't this a bad idea when you don't even know the basics? Because you wouldn't be able to separate genuine information from subtle (or not so subtle) hallucinations.
It's like generating code in a language that you know nothing about. You should check for bugs, but you can't.
The first thing to learn is you can’t trust the internet. From that you’ll know not to trust gpt. If you are prone to trusting things blindly, without doing your own research or verification, you have far bigger problems than gpt “hallucinations” (frankly, terrible terminology).
I don't actually think either term is more precise than the other when we're talking about LLMs, which aren't human brains. It doesn't have either memory or perception in the way that we do.
It's not perceiving reality incorrectly, it's presenting wholesale fiction as fact both coherently and with absolute confidence. It even forges supporting documentation ad-hoc.
GPT is not a poor schizophrenic suffering from delusions or innocuous "hallucinations." It is the world's most advanced liar.
Lies, BS, and con artistry all require conscious motive and intent. That's a bridge too far, for me, in ascribing ‘intelligence’ to these models.
Hallucination, to me, conveys ‘seeing things (facts) that are not there’. To the extent the models are ‘perceiving’, they ARE perceiving reality incorrectly. Granted, I expect many times it’s because the sources of the model's training data are, at best, just wrong or are lying.
Those are very inaccurate descriptors. A lie is an intentional deception, which is impossible for GPT. It "believes" that it "knows" something about the world, which happens to have been made up wholesale by its "subconscious" (obviously I know it's not a human brain). That is pretty much a hallucination by definition, applied to a non-human "intelligence".
Besides,
> it's presenting wholesale fiction as fact both coherently and with absolute confidence
That is not in any way distinct from perceiving reality incorrectly. It is a symptom common to both skilled lying and hallucination.
In my opinion people are way more afraid of hallucinations than they should be. You are not asking it to solve world hunger, this is basically like asking it to summarize Wikipedia articles. At least with GPT4 it doesn't hallucinate on basic things. I am learning typescript with it, and it hasn't given me wrong answers to direct questions yet. If you are too worried about hallucinations use something like phind.com which will give some sources.
Anyone can evaluate whether it's giving you a self-consistent set of statements, and the additional words it spits out are helpful for a traditional search for alternative sources.
IMO, so long as you're aware the information is often subtly wrong, it's not that different from, e.g., physics classes progressively lying to you less to allow your brain to build a framework to house the incoming ideas.
I think one of the good things to get a sense of with ChatGPT is the types of areas where it is most and least likely to confabulate. If I asked it for an ELI5 about key concepts relating to how LLMs work, I would be highly confident it would be accurate. When you start asking about truly esoteric topics, that's when it often starts completely making things up.
I like the term "confabulation". A hallucination is an artifact of an intoxicated or malfunctioning brain. In my experience, confabulation is a common occurrence in normal brains, and can occur without intention. It's why humans make such poor witnesses. It's how the brain fills in the blanks in its senses and experience.
> Psst ... why don't you spend 30 minutes of quality time with chatGPT and get to the bottom of this?
I do not use ChatGPT as a search engine. Its ability to confidently hallucinate consistently places it much below a human expert on any topic that I care to understand correctly.
That attitude is going to cost you. You'll have no choice but to abandon it at some point, as the LLM implementations get better. The improvements in GPT4 over 3.5 alone are enough to dispel a lot of my own initial skepticism.
The thing is, your mistake isn't just distrusting the language model, it's trusting the search engine. No matter what tool you use, the responsibility for ensuring accuracy is ultimately yours. Similar degrees of caution and skepticism must be applied to results from both ML and traditional search engines.
They are both insanely powerful tools, and like most insanely powerful tools, the hazards are considerable.
Without a search engine, how am I supposed to weigh the accuracy of an LLM? How am I supposed to take responsibility for ensuring accuracy?
I also think people who say that search engines lie are seriously overestimating the amount of lies returned by a search result. Social media is one thing, but the broader internet is filled with articles from relatively reputable sources. When I Google "what is a large language model" my top results (there aren't even ads on this particular query to really muddle things) are:
1. Wikipedia
Sure this is the most obvious place for lies but we already understand that. Moreover, the people writing the text have some notion of what is true and false unlike an LLM. I can always also use the links it provides.
2. Nvidia
Sure they have a financial motive to promote LLMs but I don't see a reason they have to outright mislead me. They also happen to publish a significant amount of ML research so probably a good source.
3. TechTarget
I don't know this source well but their description seems to agree closely with the other two, so I can be relatively sure of both this and the others' accuracy. It's a really similar story with Bing. I can also look for sources that cite specific people, like a sourced Forbes article that interviews people from an LLM company.
With multiple sources, I can also build a consensus on what an LLM is and reach out further. If I really want to be sure I can add site:edu to the query just to double-check. When I have the source and the text I can test both agreement with consensus and weigh the strength of a source. I can't do that with an LLM, since it's the same model when you reprompt. I get that LLMs can give a good place to begin by giving you keywords and phrases to search, but they're a really, really poor replacement for search or for learning stuff you don't have experience in.
> The thing is, your mistake isn't just distrusting the language model, it's trusting the search engine.
There is a rather substantial difference between a search engine, which suggests sources which the reader can evaluate based on their merits, and a language model, whose output may or may not be based on any sources at all, and which cannot (accurately) cite sources for statements it makes.
> Similar degrees of caution and skepticism must be applied to results from both ML and traditional search engines.
I personally use search engines on a daily basis. They link me to external websites that I can trust or distrust to varying degrees depending on my prior experience with them and the amount of further research I put in.
If a person is in the habit of using a search engine like a chat bot by typing in questions AskJeeves-style and then believing what text pops up in the info cards above the ads (which are themselves above the search results), I could see how the distinction between chat bots and search engines could seem trivial.
The similarity between chat bots and search engines breaks down significantly if the user scrolls down past the info cards and ads and then clicks on a link to an external website. At that point in the user experience it is no longer like chatting with a confident NPC.
This is a weird thing to write to a stranger. I suppose there will be no need to caution people about rudeness or making strange assumptions in the utopian future where humans only talk to chatbots, though.
It will be a wondrous day that we can finally see a computer capture the distinctly-human Urge to Post. The je ne sais quoi that makes us all donate our takes to the needy is an organic phenomenon so far.
"I do not use ChatGPT as a search engine. Its ability to confidently hallucinate consistently places it much below a human expert on any topic that I care to understand correctly."
Exactly. Just pointing out that it's not "weird" to answer an opinion disguised as an axiom with another just like it. You shared your position in no uncertain terms and I did the same. It's all good, welcome to HN.
Yes. It would have been a very strange joke about posters if I somehow tried to say that I am not myself a poster, in a post. That would have been a weird thing to imply.
Thank goodness that I didn’t do that, I’d certainly have egg on my face if I hadn’t included myself in the joke and somebody called me out on it!
These are explanations that make sense to people who already know how deep learning works but don't really explain much to beginners beyond giving them a grossly oversimplified misrepresentation of what is being discussed (while not actually explaining anything).
My advice to folks is, if you actually want to know how this stuff works at some basic level, put in some time learning how basic linear and logistic regression work, including how to train them using back propagation. From there you'll have a solid foundation that gives enough context to understand most deep learning concepts at a high level.
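For example, a minimal logistic regression trained with plain NumPy (toy data I just made up; only a sketch of the idea) already contains the weights / loss / training loop that everything else builds on:

    import numpy as np

    # Toy data: 4 examples, 2 features each, binary labels.
    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
    y = np.array([1.0, 0.0, 1.0, 0.0])

    w = np.zeros(2)   # the "weights" that training will adjust
    b = 0.0

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(1000):
        p = sigmoid(X @ w + b)                                    # forward pass: predictions
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # cross-entropy "loss"
        grad_w = X.T @ (p - y) / len(y)                           # gradients of the loss
        grad_b = np.mean(p - y)
        w -= 0.5 * grad_w                                         # "training": step downhill
        b -= 0.5 * grad_b

    print(w, b, loss)

Back propagation is just the bookkeeping that computes those same gradients when many layers are stacked on top of each other.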
Oh, it will hallucinate on an obscure fact, but not on the basics. It's pretty good at reciting theory; it would pass many ML engineering theoretical interviews.
If you don't trust its memory, copy a piece of high-quality text on the topic of interest into the context as a reference.
Not the OP, but I'm still hesitant because it infuriates me that I have to give them my identity, which they will then log every prompt against. You think they aren't building profiles on people? AI Moties (a "The Mote in God's Eye" reference) is what they are.
Andrej Karpathy's Zero to Hero video series [1] is a good middle ground. It isn't super low-level but it also isn't super high-level. I think seeing how the pieces actually fit together in a working project is valuable to get a real understanding.
After going through this series I can say I basically understand weights, tokens, back-propagation, layers, embeddings, etc.
I'm working my way through that series now. He really is a good teacher -- I keep waiting for the inevitable "Next, draw the rest of the fucking owl" moment, but so far he does seem to be sticking to his commitment to a from-scratch approach.
Weights are basically number/float variables. In neural networks, vectors of values are multiplied (or math'd in some way) by weights to get new vectors of values. A 500 billion weight model has 500 billion variables, all carefully chosen via training.
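Rough sketch of what I mean, with made-up sizes and numbers:

    import numpy as np

    weights = np.random.rand(4, 3)                  # 12 "weights" (a real model has billions)
    input_vector = np.array([0.2, -1.0, 0.5, 0.7])  # a vector of values coming in

    # One step of the network: the input vector gets math'd with the
    # weights to produce a new vector of values.
    output_vector = input_vector @ weights          # shape (3,)
    print(output_vector)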
A model is some architecture of how data will flow through these weight matrices, along with the values of each weight.
Tokens are sort of "words" in a sentence, but the ML may be translating the word itself into a more abstract concept in 'word space': e.g., a bunch of floating point values.
At least some of what I just said is probably wrong, but now someone will correct me and we'll both be more right!
At a first approximation this is pretty good. I wouldn't say this exactly:
> A model is some architecture of how data will flow through these weight matrices, along with the values of each weight.
Because data doesn't really flow through weight matrices, though perhaps this is true if you squint at very simple models. Deep learning architectures are generally more complicated than multiplying values by weights and pushing the results to the next layer, though which architecture to use depends heavily on context.
> Tokens are sort of "words" in a sentence
Tokens are funny. What a token is depends on the context of the model you're using, but generally a token is a portion of a word. (Why? Efficiency is one reason; handling unknown words is another.)
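For example, with OpenAI's tiktoken library (going from memory on the API, so treat this as a sketch), an uncommon word comes back as several sub-word tokens rather than one:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")   # encoding used by recent OpenAI models
    ids = enc.encode("unbelievableness")
    print(ids)                                   # a list of several integer token ids
    print([enc.decode([i]) for i in ids])        # the word fragment each id stands for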
Probably not the answer you would like but I think your approach to download them and figure out how to run them on your machine is a good one. You don't need to understand everything to get something working. It can be overwhelming and unproductive to know everything before getting started.
To learn more deeply though, get started with getting it to work and when you are curious or something doesn't work, try to understand why and recursively go back to fill in the foundational details.
Example: download the code and try to get it to work. Why is it not working? Oh, it's trying to look for the model. Search for how to get the model and set it up. Then, key step: recursively look up every single thing in the guide or setup. Don't try to set something up or fix something without truly understanding what it is you are doing (e.g. by blindly copying and pasting). This gives you a structured way to fill in the foundations of what it is you are trying to get to work, in a more focused and productive manner. At the end you might realize that their approach or yours is not optimal: "oh, it was telling me to download the 65B model when I can only run the 7B on my machine bc ..."
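As a concrete first step, something like this is enough to see the whole pipeline end to end (just a sketch, assuming the Hugging Face transformers library and the small gpt2 checkpoint; a big open chat model works the same way but needs far more memory):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Downloading an open model mostly means downloading its weights;
    # from_pretrained fetches and caches them for you.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("Open models are", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(tok.decode(out[0]))

Once that runs, every piece you had to look up along the way (tokenizer, weights, context length, sampling) maps onto the terms people throw around.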
For a good general non-technical introduction I recommend the Computerphile YouTube series related to language models, transformers and other general concepts. If you are interested in actually doing stuff there’s an overabundance of material out there already, if you try looking.