If I understand correctly, a big problem is that the calculation isn't embarrassingly parallel: the various chunks are not independent, so you need to do a lot of IO to get the step-N results from your neighbours before you can calculate step N+1.
Using more, smaller nodes means your cross-node IO is going to explode. You might save money on your compute hardware, but I wouldn't be surprised if you ended up spending even more on the network hardware side.
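To make that concrete, here's a rough back-of-the-envelope sketch (plain Python; the grid size, bytes per cell, and halo depth are all made up for illustration): a 3D grid split across p nodes, where each node has to exchange a one-cell-deep halo with its neighbours every timestep.

```python
# Back-of-the-envelope halo-exchange traffic for a 3D grid split across p nodes.
# All numbers (grid size, bytes per cell, halo depth) are hypothetical.

def halo_bytes_per_step(n_side, num_nodes, bytes_per_cell=8, halo_depth=1):
    """Rough bytes one node exchanges with its neighbours each timestep."""
    cells_per_node = n_side ** 3 / num_nodes
    sub_side = cells_per_node ** (1 / 3)            # side length of each node's sub-cube
    surface_cells = 6 * sub_side ** 2 * halo_depth  # six faces to send/receive
    return surface_cells * bytes_per_cell

N = 4096  # cells per side of the global domain (hypothetical)
for p in (8, 64, 512):
    per_node = halo_bytes_per_step(N, p)
    print(f"{p:4d} nodes: ~{per_node / 1e9:.2f} GB per node per step, "
          f"~{per_node * p / 1e9:.1f} GB across the cluster per step")
```

Per-node traffic shrinks as you add nodes, but cluster-wide traffic keeps growing while each node's share of the compute falls off much faster, so the communication-to-compute ratio gets steadily worse. That's why the savings on nodes tend to get eaten by the interconnect.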
I'm not sure where else you can get a half TB of ~800 GB/s memory for under $10k. (Though that's the M3 Ultra, don't know about the M5.) Is there something competitive in the Nvidia ecosystem?
I wasn't aware that the M3 Ultra offered a half terabyte of unified memory, but an RTX 5090 has double that bandwidth, and that's before we even get into the B200 (~8 TB/s).
You could get one M3 Ultra with 512 GB of unified RAM for the price of two RTX 5090s totaling 64 GB of VRAM, and that's not counting the cost of a rig capable of actually running two RTX 5090s.
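To put rough numbers on that tradeoff (spec figures are approximate and from memory, so double-check them): a crude but useful metric for inference is how long it takes to stream the device's whole memory once, i.e. memory / bandwidth, since a dense model that fills the device has to be read roughly once per token.

```python
# Capacity vs bandwidth, back of the envelope. Spec figures are approximate;
# "full read" = memory / bandwidth, a crude lower bound on time per token
# for a dense model that fills the device.
devices = {
    "M3 Ultra (unified)": {"mem_gb": 512, "bw_gbps": 819},
    "RTX 5090 (GDDR7)":   {"mem_gb": 32,  "bw_gbps": 1792},
    "B200 (HBM3e)":       {"mem_gb": 192, "bw_gbps": 8000},
}
for name, d in devices.items():
    full_read_ms = d["mem_gb"] / d["bw_gbps"] * 1000
    print(f"{name:20s} {d['mem_gb']:4d} GB at ~{d['bw_gbps']:5d} GB/s "
          f"-> ~{full_read_ms:3.0f} ms to stream it all once")
```

So the Mac wins on what fits and loses on how fast it can read it, which is basically the two sides of the argument above.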
I don't think I can recommend the Mac Studio for AI inference until the M5 comes out. And even then, it remains to be seen how fast those GPUs are or if we even get an Ultra chip at all.