Isn't that essentially how MoE models already work? Besides, if that were infinitely scalable, wouldn't we already have a tier of super-smart models at very high cost?
Besides, this would only apply to very few use cases. For a lot of basic customer-care work, programming, and quick research, I'd say LLMs are already quite good without running them at 100x the compute.
MoE models are pretty poorly named, since all the "experts" have the same architecture and none of them is specialized by hand. They're better described as "sparse activation" models: a learned router sends each token to only a few of the expert blocks. "MoE" implies heterogeneous experts that some "thalamus router" is trained to dispatch to, but that's not how they work.
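To make that concrete, here's a minimal sketch of a top-k sparsely activated layer in PyTorch (my own illustration, not any lab's implementation; names like SparseMoE, n_experts, and top_k are invented). Every expert is an identical MLP, and a learned linear router picks which two of them run on each token:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoE(nn.Module):
        def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
            super().__init__()
            # Every "expert" has the same architecture; none is specialized by hand.
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            ])
            self.router = nn.Linear(d_model, n_experts)  # learned gating, trained end to end
            self.top_k = top_k

        def forward(self, x):  # x: (n_tokens, d_model)
            weights, idx = self.router(x).topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)  # renormalize over the chosen experts
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e  # tokens whose slot-th choice is expert e
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

    x = torch.randn(10, 64)
    print(SparseMoE()(x).shape)  # torch.Size([10, 64]); only 2 of 8 experts ran per token

Production implementations batch the routing instead of looping over experts, but the structure is the same: homogeneous experts plus a learned gate, nothing like a panel of hand-built specialists.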
> if that were infinitely scalable, wouldn't we have a subset of super-smart models already at very high cost
The compute/intelligence curve is not a straight line. It's more likely a curve that saturates, maybe somewhere around 70% of human intelligence. More compute still buys more intelligence, but you'll never reach 100% of human intelligence; it saturates well below that.
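To show the shape I mean, here's a toy calculation (the ceiling and the constants are invented for illustration, not a measured scaling law):

    # Toy numbers only: capability saturates at `cap`, no matter how much compute.
    cap, c0 = 0.70, 1.0  # invented ceiling ("70% of human") and doubling scale
    for compute in [1, 2, 4, 8, 16, 32]:
        capability = cap * (1 - 2 ** (-compute / c0))
        print(f"{compute:>3}x compute -> capability {capability:.3f}")
    # Every doubling still helps, but the increments shrink toward the 0.70 ceiling.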
Thanks, I wasn't aware of that. Still - why isn't there a super-expensive OpenAI model that uses 1,000 experts and comes up with way better answers? Technically that would be possible to build today. I imagine it just doesn't deliver dramatically better results.
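(Part of it may just be arithmetic: with top-k routing, adding experts grows the total parameter count but not the compute spent per token, so 1,000 experts wouldn't mean 1,000x the "thinking" per answer. Rough illustration with invented numbers:

    # Illustrative numbers, not any real model: with top-k routing, the parameters
    # *active per token* are set by k, not by how many experts exist in total.
    params_per_expert = 100e6  # hypothetical expert size
    top_k = 2                  # experts actually consulted per token
    for n_experts in [8, 64, 1000]:
        total_params = n_experts * params_per_expert
        active_params = top_k * params_per_expert  # constant as n_experts grows
        print(f"{n_experts:>5} experts: {total_params / 1e9:6.1f}B total, "
              f"{active_params / 1e9:.1f}B active per token")

So the extra experts buy capacity, not more computation per answer.)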