alansaber's comments | Hacker News

It was this, or telling users their information was being transacted on a distributed system of recycled vape microcontrollers.

It's on a commercial website, but doesn't mention any specific product. So kinda?

Ah yes, just what vibe coding needs: further weakening human oversight

Fine-tuning on a small corpus can definitely get you good results with some care.
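
For instance, a minimal LoRA fine-tuning sketch with Hugging Face transformers + peft; the model, data file, and hyperparameters below are placeholders for illustration, not anything specific from this thread:

    # Minimal LoRA fine-tuning sketch (illustrative; model, data file, and
    # hyperparameters are placeholders).
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "gpt2"  # placeholder small model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = get_peft_model(
        AutoModelForCausalLM.from_pretrained(model_name),
        # LoRA keeps the trainable parameter count tiny -- the usual "care"
        # for a small corpus, since there's less capacity to overfit.
        LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                   task_type="CAUSAL_LM"))

    dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
        batched=True, remove_columns=["text"])

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3,
                               per_device_train_batch_size=4,
                               learning_rate=2e-4),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()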

One caveat: smaller batch sizes are generally better for model stability, but we go bigger because it substantially speeds up training

Mmh, not really. As OP shows, speed increases with larger batch size, but only initially, until GPU utilization is high enough; then the speed improvements flatten out (although you might hit OOM before that and never "really" see the flat part). Using a smaller batch size increases _noise_, so it quite literally decreases stability. That can sometimes be good: in the opposite limit, if the batch is as large as your training set, the gradient is deterministic, so you can end up in a local minimum and never get out of it. But that's true for toy datasets like MNIST; here it's an entirely different beast.
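
A toy sketch of that noise effect (everything below is synthetic and illustrative), estimating per-coordinate gradient variance at a few batch sizes; it shrinks roughly as 1/batch:

    # Toy illustration (all data synthetic): gradient noise shrinks as batch
    # size grows, which is the "stability" in question.
    import torch

    torch.manual_seed(0)
    X = torch.randn(4096, 10)
    y = X @ torch.randn(10, 1) + 0.5 * torch.randn(4096, 1)
    w = torch.randn(10, 1, requires_grad=True)

    def grad_variance(batch_size, n_samples=200):
        grads = []
        for _ in range(n_samples):
            idx = torch.randint(0, len(X), (batch_size,))
            loss = ((X[idx] @ w - y[idx]) ** 2).mean()
            g, = torch.autograd.grad(loss, w)
            grads.append(g.flatten())
        # average per-coordinate variance of the stochastic gradient
        return torch.stack(grads).var(dim=0).mean().item()

    for bs in (8, 64, 512, 4096):
        print(f"batch={bs:5d}  grad variance ~ {grad_variance(bs):.5f}")
    # At batch ~ len(X) the gradient is nearly deterministic: the
    # full-batch limit where you can sit in a local minimum.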

With corpora as large, and as noisy, as the ones used here, gradient updates are very noisy, and that can hurt quality. In any case, the common lore is that you need a fairly large batch size for the language model to improve steadily.
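
When that large batch doesn't fit in memory, gradient accumulation is the standard workaround (sketch only; model, data, and numbers are illustrative):

    # Gradient accumulation sketch (illustrative): emulate one large,
    # low-noise batch by averaging gradients over several micro-batches.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    model = nn.Linear(10, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    loader = DataLoader(TensorDataset(torch.randn(1024, 10),
                                      torch.randn(1024, 1)), batch_size=16)

    accum_steps = 8  # effective batch = 16 * 8 = 128

    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        loss = loss_fn(model(x), y) / accum_steps  # rescale so grads average
        loss.backward()                            # grads accumulate in .grad
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()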


Are you sure about the upper cap on batch size for speed? See the LAMB paper: https://arxiv.org/pdf/1904.00962
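
For context, the core of that paper (LAMB) is a layer-wise trust ratio on top of an Adam-style update; a simplified sketch (omits bias correction and other details from the paper):

    # Simplified sketch of the LAMB update (You et al., arXiv:1904.00962).
    # Omits bias correction; for illustration only.
    import torch

    def lamb_step(p, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-6, weight_decay=0.01):
        # Adam-style first/second moment estimates
        m.mul_(beta1).add_(grad, alpha=1 - beta1)
        v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
        update = m / (v.sqrt() + eps) + weight_decay * p
        # Layer-wise trust ratio ||w|| / ||update|| keeps each layer's step
        # proportionate, which is what lets batch size scale into the tens
        # of thousands in the paper.
        w_norm, u_norm = p.norm(), update.norm()
        trust = float(w_norm / u_norm) if w_norm > 0 and u_norm > 0 else 1.0
        p.data.add_(update, alpha=-lr * trust)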

Sorry about that. Safari couldn't play the video due to an incompatible codec; it's fixed now.

I'm on a Chromium-based browser on macOS and it doesn't look like videos on the main page are loading at all for me either, unfortunately!

Ditto on Brave 1.84.141 (macOS)

Same on Firefox on Windows 11

I also want to be in the chain!

Working on mobile Firefox and the Firefox in-app browser


Thanks guys. I'm glad my fuck-up is keeping this thread alive.

Nothing more addictive than adding more padding

I agree with this. Consistency > immediate design beauty

Yep, the further we go from highly constrained applications, the riskier it'll be

Since synthetic data for training is pretty ubiquitous, this doesn't seem like much of a novelty
