Does anyone have any resources they recommend for just understanding the base terminology of models like this? I always see the terms "weights", "tokens", "model", etc. I feel like I understand what these mean, but I have no idea why I need to care about them for open models like this. If I were to download an open model to run on my machine, would I download the weights? I'm just ignorant in the ML space, I guess, and not sure where to start.
Psst ... why don't you spend 30 minutes of quality time with chatGPT and get to the bottom of this? Get those personalised explanations and enjoy its unlimited patience.
I have felt the same in the past, about a completely different topic. I know how it feels: it's like people aren't calling things what they are, just using weird words.
"weights" - synapses in the AI brain
"tokens" - word fragments
"model" - of course, the model is the AI brain
"context" - the model can only handle a piece of text, can't put whole books in, so this limited window is the context
"GPT" - predicts the next word, trained on everything; if you feed its last predicted word back in, it can write long texts
"LoRA" - a lightweight plug-in model for tweaking the big model
"loss" - a score telling how bad is the output
"training" - change the model until it fits the data
"quantisation" - making a low precision version of the model because it still works, but now is much faster and needs less compute
"embedding" - just a vector, it stands for the meaning of a word token or a piece of image; these embeddings are learned
But isn't this a bad idea when you don't even know the basics? Because you wouldn't be able to separate genuine information from subtle (or not so subtle) hallucinations.
It's like generating code in a language that you know nothing about. You should check for bugs, but you can't.
The first thing to learn is you can’t trust the internet. From that you’ll know not to trust gpt. If you are prone to trusting things blindly, without doing your own research or verification, you have far bigger problems than gpt “hallucinations” (frankly, terrible terminology).
I don't actually think either term is more precise than the other when we're talking about LLMs, which aren't human brains. It doesn't have either memory or perception in the way that we do.
It's not perceiving reality incorrectly, it's presenting wholesale fiction as fact both coherently and with absolute confidence. It even forges supporting documentation ad-hoc.
GPT is not a poor schizophrenic suffering from delusions or innocuous "hallucinations." It is the world's most advanced liar.
Lies, BS, and con artistry all require conscious motive and intent. That's a bridge too far, for me, in ascribing ‘intelligence’ to these models.
Hallucination, to me, conveys ‘seeing things (facts) that are not there’. To the extent the models are ‘perceiving’, they ARE perceiving reality incorrectly. Granted, I expect many times it’s because the sources of the model's training data are, at best, just wrong or are lying.
Those are very inaccurate descriptors. A lie is an intentional deception, which is impossible for GPT. It "believes" that it "knows" something about the world, which happens to have been made up wholesale by its "subconscious" (obviously I know it's not a human brain). That is pretty much a hallucination by definition, applied to a non-human "intelligence".
Besides,
> it's presenting wholesale fiction as fact both coherently and with absolute confidence
That is not in any way distinct from perceiving reality incorrectly. It is a symptom common to both skilled lying and hallucination.
In my opinion people are way more afraid of hallucinations than they should be. You are not asking it to solve world hunger, this is basically like asking it to summarize Wikipedia articles. At least with GPT4 it doesn't hallucinate on basic things. I am learning typescript with it, and it hasn't given me wrong answers to direct questions yet. If you are too worried about hallucinations use something like phind.com which will give some sources.
Anyone can evaluate whether it's giving you a self-consistent set of statements, and the additional words it spits out are helpful for a traditional search for alternative sources.
IMO, so long as you're aware the information is often subtly wrong, it's not that different from, e.g., physics classes progressively lying to you less to allow your brain to build a framework to house the incoming ideas.
I think one of the good things to get a sense of with ChatGPT is the types of areas where it is most and least likely to confabulate. If I asked it for an ELI5 about key concepts relating to how LLMs work, I would be highly confident it would be accurate. When you start asking about truly esoteric topics, that's when it often starts completely making things up.
I like the term "confabulation". A hallucination is an artifact of an intoxicated or malfunctioning brain. In my experience, confabulation is a common occurrence in normal brains, and can occur without intention. It's why humans make such poor witnesses. It's how the brain fills in the blanks in its senses and experience.
> Psst ... why don't you spend 30 minutes of quality time with chatGPT and get to the bottom of this?
I do not use ChatGPT as a search engine. Its ability to confidently hallucinate consistently places it much below a human expert on any topic that I care to understand correctly.
That attitude is going to cost you. You'll have no choice but to abandon it at some point, as the LLM implementations get better. The improvements in GPT4 over 3.5 alone are enough to dispel a lot of my own initial skepticism.
The thing is, your mistake isn't just distrusting the language model, it's trusting the search engine. No matter what tool you use, the responsibility for ensuring accuracy is ultimately yours. Similar degrees of caution and skepticism must be applied to results from both ML and traditional search engines.
They are both insanely powerful tools, and like most insanely powerful tools, the hazards are considerable.
Without a search engine, how am I supposed to weigh the accuracy of an LLM? How am I supposed to take responsibility for ensuring accuracy?
I also think people who say that search engines lie are seriously overestimating the amount of lies returned by a search result. Social media is one thing, but the broader internet is filled with articles from relatively reputable sources. When I Google "what is a large language model" my top results (there aren't even ads on this particular query to really muddle things) are:
1. Wikipedia
Sure this is the most obvious place for lies but we already understand that. Moreover, the people writing the text have some notion of what is true and false unlike an LLM. I can always also use the links it provides.
2. Nvidia
Sure they have a financial motive to promote LLMs but I don't see a reason they have to outright mislead me. They also happen to publish a significant amount of ML research so probably a good source.
3. TechTarget
I don't know this source well but their description seems to agree closely with the other two, so I can be relatively sure of both this and the others' accuracy. It's a really similar story with Bing. I can also look for sources that cite specific people, like a sourced Forbes article that interviews people from an LLM company.
With multiple sources, I can also build a consensus on what an LLM is and reach out further. If I really want to be sure I can add site:edu to the query just to double-check. When I have the source and the text I can test both agreement with consensus and weigh the strength of a source. I can't do that with an LLM, since it's the same model when you reprompt. I get that LLMs can give a good place to begin by giving you keywords and phrases to search, but they're a really, really poor replacement for search or for learning stuff you don't have experience in.
> The thing is, your mistake isn't just distrusting the language model, it's trusting the search engine.
There is a rather substantial difference between a search engine, which suggests sources which the reader can evaluate based on their merits, and a language model, whose output may or may not be based on any sources at all, and which cannot (accurately) cite sources for statements it makes.
> Similar degrees of caution and skepticism must be applied to results from both ML and traditional search engines.
I personally use search engines on a daily basis. They link me to external websites that I can trust or distrust to varying degrees depending on my prior experience with them and the amount of further research I put in.
If a person is in the habit of using a search engine like a chat bot by typing in questions AskJeeves-style and then believing what text pops up in the info cards above the ads (which are themselves above the search results), I could see how the distinction between chat bots and search engines could seem trivial.
The similarity between chat bots and search engines breaks down significantly if the user scrolls down past the info cards and ads and then clicks on a link to an external website. At that point in the user experience it is no longer like chatting with a confident NPC.
This is a weird thing to write to a stranger. I suppose there will be no need to caution people about rudeness or making strange assumptions in the utopian future where humans only talk to chatbots, though.
It will be a wondrous day that we can finally see a computer capture the distinctly-human Urge to Post. The je ne sais quoi that makes us all donate our takes to the needy is an organic phenomenon so far.
"I do not use ChatGPT as a search engine. Its ability to confidently hallucinate consistently places it much below a human expert on any topic that I care to understand correctly."
Exactly. Just pointing out that it's not "weird" to answer an opinion disguised as an axiom with another just like it. You shared your position in no uncertain terms and I did the same. It's all good, welcome to HN.
Yes. It would have been a very strange joke about posters if I somehow tried to say that I am not myself a poster, in a post. That would have been a weird thing to imply.
Thank goodness that I didn’t do that, I’d certainly have egg on my face if I hadn’t included myself in the joke and somebody called me out on it!
These are explanations that make sense to people who already know how deep learning works but don't really explain much to beginners beyond giving them a grossly oversimplified misrepresentation of what is being discussed (while not actually explaining anything).
My advice to folks is, if you actually want to know how this stuff works at some basic level, put in some time learning how basic linear and logistic regression work, including how to train them using back propagation. From there you'll have a solid foundation that gives enough context to understand most deep learning concepts at a high level.
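For example, a minimal logistic regression trained with plain NumPy (toy data I just made up; only a sketch of the idea) already contains the weights / loss / training loop that everything else builds on:

    import numpy as np

    # Toy data: 4 examples, 2 features each, binary labels.
    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
    y = np.array([1.0, 0.0, 1.0, 0.0])

    w = np.zeros(2)   # the "weights" that training will adjust
    b = 0.0

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(1000):
        p = sigmoid(X @ w + b)                                    # forward pass: predictions
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # cross-entropy "loss"
        grad_w = X.T @ (p - y) / len(y)                           # gradients of the loss
        grad_b = np.mean(p - y)
        w -= 0.5 * grad_w                                         # "training": step downhill
        b -= 0.5 * grad_b

    print(w, b, loss)

Back propagation is just the bookkeeping that computes those same gradients when many layers are stacked on top of each other.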
Oh, it will hallucinate on an obscure fact, but not on the basics. It's pretty good at reciting theory; it would pass many ML engineering theoretical interviews.
If you don't trust its memory, copy a piece of high-quality text on the topic of interest into the context as a reference.
Not the OP, but I'm still hesitant because it infuriates me that I have to give them my identity, which they will then log every prompt against. You think they aren't building profiles on people? AI Moties (a "The Mote in God's Eye" reference) is what they are.
Andrej Karpathy's Zero to Hero video series [1] is a good middle ground. It isn't super low-level but it also isn't super high-level. I think seeing how the pieces actually fit together in a working project is valuable to get a real understanding.
After going through this series I can say I basically understand weights, tokens, back-propagation, layers, embeddings, etc.
I'm working my way through that series now. He really is a good teacher -- I keep waiting for the inevitable "Next, draw the rest of the fucking owl" moment, but so far he does seem to be sticking to his commitment to a from-scratch approach.
Weights are basically number/float variables. In neural networks, vectors of values are multiplied (or math'd in some way) by weights to get new vectors of values. A 500 billion weight model has 500 billion variables, all carefully chosen via training.
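Rough sketch of what I mean, with made-up sizes and numbers:

    import numpy as np

    weights = np.random.rand(4, 3)                  # 12 "weights" (a real model has billions)
    input_vector = np.array([0.2, -1.0, 0.5, 0.7])  # a vector of values coming in

    # One step of the network: the input vector gets math'd with the
    # weights to produce a new vector of values.
    output_vector = input_vector @ weights          # shape (3,)
    print(output_vector)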
A model is some architecture of how data will flow through these weight matrices, along with the values of each weight.
Tokens are sort of "words" in a sentence, but the ML may be translating the word itself into a more abstract concept in 'word space': e.g., a bunch of floating point values.
At least some of what I just said is probably wrong, but now someone will correct me and we'll both be more right!
At a first approximation this is pretty good. I wouldn't say this exactly:
> A model is some architecture of how data will flow through these weight matrices, along with the values of each weight.
Because data doesn't really flow through weight matrices, though perhaps this is true if you squint at very simple models. Deep learning architectures are generally more complicated than multiplying values by weights and pushing the results to the next layer, though which architecture to use depends heavily on context.
> Tokens are sort of "words" in a sentence
Tokens are funny. What a token is depends on the context of the model you're using, but generally a token is a portion of a word. (Why? Efficiency is one reason; handling unknown words is another.)
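For example, with OpenAI's tiktoken library (going from memory on the API, so treat this as a sketch), an uncommon word comes back as several sub-word tokens rather than one:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")   # encoding used by recent OpenAI models
    ids = enc.encode("unbelievableness")
    print(ids)                                   # a list of several integer token ids
    print([enc.decode([i]) for i in ids])        # the word fragment each id stands for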
Probably not the answer you would like but I think your approach to download them and figure out how to run them on your machine is a good one. You don't need to understand everything to get something working. It can be overwhelming and unproductive to know everything before getting started.
To learn more deeply though, get started with getting it to work and when you are curious or something doesn't work, try to understand why and recursively go back to fill in the foundational details.
Example: download the code and try to get it to work. Why is it not working? Oh, it's trying to look for the model. Search for how to get the model and set it up. Then, key step: recursively look up every single thing in the guide or setup. Don't try to set something up or fix something without truly understanding what it is you are doing (e.g. by blindly copying and pasting). This gives you a structured way to fill in the foundations of what it is you are trying to get to work, in a more focused and productive manner. At the end you might realize that their approach or yours is not optimal: "oh, it was telling me to download the 65B model when I can only run the 7B on my machine bc ..."
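As a concrete first step, something like this is enough to see the whole pipeline end to end (just a sketch, assuming the Hugging Face transformers library and the small gpt2 checkpoint; a big open chat model works the same way but needs far more memory):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Downloading an open model mostly means downloading its weights;
    # from_pretrained fetches and caches them for you.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("Open models are", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(tok.decode(out[0]))

Once that runs, every piece you had to look up along the way (tokenizer, weights, context length, sampling) maps onto the terms people throw around.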
For a good general non-technical introduction I recommend the Computerphile YouTube series related to language models, transformers and other general concepts. If you are interested in actually doing stuff there’s an overabundance of material out there already, if you try looking.