alansaber · 2025-12-10T16:35:05 1765384505

It was this, or telling users their information was being transacted on a distributed system of recycled vape microcontrollers.

alansaber · 2025-12-10T16:33:16 1765384396

It's on a commercial website, but doesn't mention any specific product. So kinda?

alansaber · 2025-12-09T17:39:55 1765301995

Ah yes, just what vibe coding needs: further weakening human oversight

alansaber · 2025-12-09T15:33:49 1765294429

Fine-tuning on a small corpus can definitely get you good results with some care

alansaber · 2025-12-09T15:33:02 1765294382

To caveat, smaller batch sizes are generally better for model stability, but we go bigger because it substantially speeds up training

spi · 2025-12-10T09:44:42 1765359882

Mmh, not really. As OP shows, speed increases with larger batch size, but only initially, until GPU utilization is high enough; then the speed improvements flatten out (although you might hit OOM before that and not "really" see the flat part). Using a smaller batch size increases _noise_, so it quite literally decreases stability. That might be good sometimes: in the limit case, if the batch is as large as your training set, you'll end up in a local minimum and not be able to get out of it. But that is true for toy datasets like MNIST; here it's an entirely different beast.

With corpora as large as the ones used here, and very noisy ones at that, gradient updates are very noisy and that can harm quality. Or anyway, common lore is that you need a pretty large batch size for the language model to improve steadily.
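To make the noise point concrete, here's a toy sketch (nothing to do with OP's actual setup): it measures how much a mini-batch gradient estimate jumps around from batch to batch at different batch sizes. The 1-D linear regression data, the function names, and the choice of `w=0.0` are all made up for illustration. Under these assumptions the spread should shrink roughly like 1/sqrt(batch size).

```python
import random
import statistics

def make_data(n=4096, seed=1):
    """Hypothetical toy dataset: y = 3x + unit Gaussian noise, x in [-1, 1]."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0.0, 1.0)) for x in xs]

def minibatch_grad(data, w, batch_size, rng):
    """Gradient of the mean squared error w.r.t. w on one random mini-batch."""
    batch = rng.sample(data, batch_size)
    return sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size

def grad_noise(data, w, batch_size, trials=2000, seed=0):
    """Standard deviation of the mini-batch gradient across random batches."""
    rng = random.Random(seed)
    grads = [minibatch_grad(data, w, batch_size, rng) for _ in range(trials)]
    return statistics.pstdev(grads)

if __name__ == "__main__":
    data = make_data()
    for b in (1, 4, 16, 64, 256):
        # Smaller batches give a noisier estimate of the same full gradient.
        print(f"batch={b:4d}  grad std={grad_noise(data, 0.0, b):.3f}")
```

This only shows the gradient-noise side of the argument; it says nothing about throughput or about where the speedup flattens out on a real GPU.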

alansaber · 2025-12-10T16:40:32 1765384832

Are you sure about the top-cap on batch size for speed? See https://arxiv.org/pdf/1904.00962

alansaber · 2025-12-09T13:23:54 1765286634

Sorry about that. Safari couldn’t play the video due to an incompatible codec, fixed now.

hariwb · 2025-12-10T14:11:25 1765375885

I'm on a Chromium-based browser / OSX and it doesn't look like videos on the main page are loading at all for me either, unfortunately!

cjrp · 2025-12-10T14:46:28 1765377988

Ditto on Brave 1.84.141 (macOS)

n4r9 · 2025-12-10T14:59:00 1765378740

Same on Firefox on Windows 11

ramon156 · 2025-12-10T15:11:30 1765379490

I also want to be in the chain!

Working on mobile Firefox and in-app browser Firefox

alansaber · 2025-12-10T16:36:23 1765384583

Thanks guys. I'm glad my fuck-up is keeping this thread alive.

alansaber · 2025-12-09T10:27:50 1765276070

Nothing more addictive than adding more padding

alansaber · 2025-12-09T10:26:26 1765275986

I agree with this. Consistency > immediate design beauty

alansaber · 2025-12-03T08:12:19 1764749539

Yep, the further we go from highly constrained applications, the riskier it'll always be

alansaber · 2025-12-01T11:29:16 1764588556

Since synthetic data for training is pretty ubiquitous, this seems like a novelty