> …reused its embedding matrix as the weights for the linear layer that projects...

tomrod · 2025-12-09T15:34:15 1765294455

Disclaimer: working and occasionally researching in the space.

The first paragraph is clear linear algebra terminology. The second looked like deeper, subfield-specific jargon, and I was about to ask for a citation: the words are definitely real, but the claim sounded hyperspecific and unfamiliar.

I figure a person needs 12 to 18 months of linear algebra, enough to work through Horn and Johnson's "Matrix Analysis" or the more bespoke volumes from Jeffrey Humpherys, to get the math behind ML. Not necessarily to use AI/ML as a tech, which really can benefit from the grind towards commodification, but to be able to parse the technical side of about 90 to 95 percent of conference papers.

danielmarkbruce · 2025-12-09T16:41:34 1765298494

One needs about 12 to 18 hours of linear algebra to work through the papers, not 12 to 18 months. The vast majority of stuff in AI/ML papers is just "we tried X and it worked!".

miki123211 · 2025-12-09T17:03:52 1765299832

You can understand 95+% of current LLM / neural network tech if you know what matrices are (on the "2d array" level, not the deeper lin alg intuition level), and if you know how to multiply them (and have an intuitive understanding why a matrix is a mapping between latent spaces and how a matrix can be treated as a list of vectors). Very basic matrix / tensor calculus comes in useful, but that's not really part of lin alg.
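A minimal sketch of those two views (a matrix as a mapping between spaces, and a matrix as a list of vectors), in plain Python so it runs without any libraries. The matrix `W`, the input `x`, and the helper names are made up for illustration; real code would use a tensor library.

```python
# A matrix stored as a plain "2D array": 2 rows, 3 columns.
# It maps 3-dimensional vectors to 2-dimensional ones.
W = [
    [1, 0, 2],
    [0, 3, 1],
]

def matvec(matrix, vector):
    """Row view: each output entry is the dot product of one row with the input."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def matvec_by_columns(matrix, vector):
    """Column view: the output is a weighted sum of the matrix's columns."""
    columns = list(zip(*matrix))  # the matrix as a list of column vectors
    out = [0] * len(matrix)
    for weight, column in zip(vector, columns):
        out = [o + weight * c for o, c in zip(out, column)]
    return out

x = [1, 2, 3]
print(matvec(W, x))             # [7, 9]
print(matvec_by_columns(W, x))  # [7, 9]
```

Both functions compute the same product; they only differ in which picture of the matrix they make explicit.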

There are places where things like eigenvectors / eigenvalues or svd come into play, but those are pretty rare and not part of modern architectures (tbh, I still don't really have a good intuition for them).

devmor · 2025-12-09T17:45:21 1765302321

I was about to respond with a similar comment. The majority of the underlying systems are the same and can be understood if you know a decent amount of vector math. That last 3-5% can get pretty mystical, though.

Honestly, where stuff gets the most confusing to me is when the authors of the newer generations of AI papers invent new terms for existing concepts, and then new terms for combining two of those concepts, then new terms for combining two of those combined concepts and removing one... etc.

Some of this redefinition is definitely useful, but it turns into word salad very quickly and I don't often feel like teaching myself a new glossary just to understand a paper I probably won't use the concepts in.

buildbot · 2025-12-09T17:51:42 1765302702

This happens so much! It’s actually imo much more important to be able to let the math go and compare concepts vs. the exact algorithms. It’s much more useful to have semantic intuition than concrete analysis.

Being really good at math does let you figure out if two techniques are mathematically the same but that’s fairly rare (it happens though!)

whimsicalism · 2025-12-09T18:17:21 1765304241

> There are places where things like eigenvectors / eigenvalues or svd come into play, but those are pretty rare and not part of modern architectures (tbh, I still don't really have a good intuition for them)

This stuff is part of modern optimizers. You can often view a lot of optimizers as doing something similar to what is called mirror/'spectral descent.'

tomrod · 2025-12-12T15:25:47 1765553147

Indeed. "Spectral" describes the collection of eigenvalues!

tomrod · 2025-12-09T20:06:53 1765310813

Eigenvectors/eigenvalues: an eigenvector is a direction that a matrix only stretches (without turning it), and its eigenvalue is the amount of stretch.
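A small sketch of that intuition, assuming a hand-picked 2x2 matrix whose eigenpairs are easy to verify by hand (the matrix and helper names are illustrative, not from any paper). For an eigenvector v with eigenvalue λ, applying the matrix gives A v = λ v.

```python
# Hypothetical 2x2 example matrix, chosen so the eigenpairs can be checked by hand.
A = [
    [2, 1],
    [1, 2],
]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def scale(scalar, vector):
    """Multiply every entry of a vector by a scalar."""
    return [scalar * v for v in vector]

# (eigenvalue, eigenvector) pairs worked out by hand for this particular A.
eigenpairs = [
    (3, [1, 1]),
    (1, [1, -1]),
]

for eigenvalue, eigenvector in eigenpairs:
    # A only stretches an eigenvector: A v equals lambda * v.
    assert matvec(A, eigenvector) == scale(eigenvalue, eigenvector)

# A vector that is not an eigenvector changes direction as well as length.
print(matvec(A, [1, 0]))  # [2, 1]
```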

cultofmetatron · 2025-12-09T18:30:33 1765305033

for anyone looking to get into it, mathacademy has a full zero-to-everything-you-need pathway that you can follow to mastery

https://mathacademy.com/courses/mathematics-for-machine-lear...

DenisM · 2025-12-10T21:29:39 1765402179

There is no mention of llm there?

cultofmetatron · 2025-12-11T02:39:52 1765420792

if you want to use llms, just download one and play with it. if you want to understand llms enough to push research forward, learn the underlying math

tomrod · 2025-12-12T15:26:03 1765553163

gpjt · 2025-12-09T18:13:26 1765304006

OP here -- agreed! I tried to summarise (at least to my current level of knowledge) those 12-18 hours here: https://www.gilesthomas.com/2025/09/maths-for-llms

jhardy54 · 2025-12-09T16:41:04 1765298464

> 12 to 18 months of linear algebra

Do you mean full-time study, or something else? I’ve been using inference endpoints but have recently been trying to go deeper and struggling, but I’m not sure where to start.

For example, when selecting an ASR model I was able to understand the various architectures through high-level descriptions and metaphors, but I’d like to have a deeper understanding/intuition instead of needing to outsource that to summaries and explainers from other people.

tomrod · 2025-12-09T20:02:36 1765310556

I was projecting as classes, taken across 2 to 3 semesters.

You can gloss the basics pretty quickly from things like Khan Academy and other sources.

Knowing linalg doesn't guarantee understanding modern ML, but if you then go read seminal papers like "Attention Is All You Need", you have a baseline to dig deeper.

woadwarrior01 · 2025-12-09T15:51:54 1765295514

It's just a long-winded way of saying "tied embeddings"[1]. IIRC, GPT-2, BERT, Gemma 2, Gemma 3, some of the smaller Qwen models and many more architectures use weight-tied input/output embeddings.

[1]: https://arxiv.org/abs/1608.05859
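A toy sketch of what weight tying means, assuming a made-up 3-token vocabulary and 2-dimensional embeddings. It is plain Python for illustration only, not the implementation used by any of the models above: the same matrix `E` is used to look up input embeddings and to turn the final hidden state into per-token scores.

```python
# Toy embedding matrix: one row per token, vocab_size=3, d_model=2.
# The values are made up for illustration.
E = [
    [1.0, 0.0],    # token 0
    [0.0, 1.0],    # token 1
    [-1.0, -1.0],  # token 2
]

def embed(token_id):
    """Input side: look up the token's row in E."""
    return E[token_id]

def output_logits(hidden):
    """Output side: score every token with the same rows of E (hidden @ E^T).

    An untied model would use a separate (vocab_size x d_model) matrix here.
    """
    return [sum(h * e for h, e in zip(hidden, row)) for row in E]

hidden = embed(0)             # pretend the layers in between are the identity
print(output_logits(hidden))  # [1.0, 0.0, -1.0]
```

Tying removes one vocab_size x d_model parameter matrix, which is a large share of the parameters in small models.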

jcims · 2025-12-09T14:22:42 1765290162

The turbo encabulator lives on.

empath75 · 2025-12-09T15:16:15 1765293375

It's a 28 part series. If you start from the beginning, everything is explained in detail.

miki123211 · 2025-12-09T16:59:09 1765299549

As somebody who understands how LLMs work pretty well, I can definitely feel your pain.

I started learning about neural networks when Whisper came out; at that point I literally knew nothing about how they worked. I started by reading the Whisper paper... which made about 0 sense to me. I was wondering whether all of those fancy terms were truly necessary. Now, I can't even imagine how I'd describe similar concepts without them.

whimsicalism · 2025-12-09T17:29:14 1765301354

i consider it a bit rude to make people read AI output without flagging it immediately

squigz · 2025-12-09T18:03:00 1765303380

I'm glad I'm not the only one who has a Turbo Encabulator moment when this stuff is posted.

QuadmasterXLII · 2025-12-10T01:21:23 1765329683

The second paragraph is highly derivative of the adversarial turbo encabulator, which Schmidhuber invented in the 90s. No citation of course.

BubbleRings · 2025-12-10T10:57:59 1765364279

Are you saying I should have attributed, or ChatGPT should have? I suppose I would have but my spurving bearings were rusty.

unethical_ban · 2025-12-09T17:01:23 1765299683

I was reading this thinking "Holy crap, this stuff sounds straight out of Norman Rockwell... wait, Rockwell Automation. Oh, it actually is"

ekropotin · 2025-12-09T15:31:32 1765294292

I have no idea what you’ve just said, so here is my upvote.