
Nvidia is absolutely cheaper per FLOP




To acquire, maybe, but to power?

Machine capex currently dominates power costs.
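
A rough back-of-envelope check of that claim; every number below is an assumption for illustration, not a figure from the thread:

    # Compare purchase price against lifetime electricity for a datacenter GPU.
    # All values here are assumed placeholders.
    capex_usd = 30_000      # assumed price of a high-end datacenter accelerator
    power_kw = 1.0          # assumed draw, including cooling overhead
    usd_per_kwh = 0.10      # assumed industrial electricity rate
    years = 5               # assumed useful life

    energy_usd = power_kw * 24 * 365 * years * usd_per_kwh
    print(f"~${energy_usd:,.0f} of power over {years} years vs ${capex_usd:,} of capex")

Even with generous power assumptions, electricity comes out at a few thousand dollars against a five-figure purchase price.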

Sounds like an ecosystem ripe for horizontally scaling cheaper hardware.

If I understand correctly, a big problem is that the calculation isn't embarrassingly parallel: the various chunks are not independent, so you need to do a lot of IO to get the results from step N from your neighbours before you can calculate step N+1.

Using more, smaller nodes means your cross-node IO is going to explode. You might save money on the compute hardware, but I wouldn't be surprised if you ended up with an even greater cost increase on the networking side.
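
A minimal sketch of that scaling argument, assuming Megatron-style tensor parallelism with a ring all-reduce per transformer layer; the model shape below is illustrative, not from the thread:

    # Per-node traffic for one forward pass when a single model is split over
    # n nodes. Assumes two activation all-reduces per layer; a ring all-reduce
    # moves ~2*(n-1)/n of the buffer per participant.
    def allreduce_gb_per_node(hidden, seq_len, n_layers, n_nodes, bytes_per_elem=2):
        activation_bytes = hidden * seq_len * bytes_per_elem
        per_layer = 2 * activation_bytes * 2 * (n_nodes - 1) / n_nodes
        return per_layer * n_layers / 1e9

    for n in (2, 4, 8, 16):
        gb = allreduce_gb_per_node(hidden=8192, seq_len=4096, n_layers=80, n_nodes=n)
        print(f"{n:>2} nodes: ~{gb:.0f} GB exchanged per node per forward pass")

Per-node compute drops roughly as 1/n, but per-node traffic never shrinks (it creeps up toward 2x the activation buffer), so the interconnect becomes the limiting factor as you add cheaper nodes.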


FLOPS are not what matters here.

Also cheaper memory bandwidth. Where are you claiming that the M5 wins?

I'm not sure where else you can get half a terabyte of ~800 GB/s memory for < $10k. (Though that's the M3 Ultra; I don't know about the M5.) Is there something competitive in the Nvidia ecosystem?

I wasn't aware that the M3 Ultra offered half a terabyte of unified memory, but an RTX 5090 has roughly double that bandwidth, and that's before we even get into the B200 (~8 TB/s).

You could get one M3 Ultra with 512 GB of unified RAM for the price of two RTX 5090s totaling 64 GB of VRAM, and that's not counting the cost of a rig capable of running two RTX 5090s.
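
Roughly, in dollars per gigabyte of memory; the prices below are assumed list prices and will vary:

    # Capacity-per-dollar comparison; prices are assumptions for illustration.
    configs = {
        "M3 Ultra, 512 GB unified, ~819 GB/s":       (9_500, 512),  # assumed ~$9.5k config
        "2x RTX 5090, 64 GB GDDR7, ~1792 GB/s each": (4_000, 64),   # assumed ~$2k per card, GPUs only
    }
    for name, (usd, gb) in configs.items():
        print(f"{name}: ~${usd / gb:.0f} per GB")

The Mac wins on capacity per dollar by a wide margin; the 5090s win on bandwidth and raw compute.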

Which would almost be great, if the M3 Ultra's GPU wasn't ~3x weaker than a single 5090: https://browser.geekbench.com/opencl-benchmarks

I don't think I can recommend the Mac Studio for AI inference until the M5 comes out. And even then, it remains to be seen how fast those GPUs are or if we even get an Ultra chip at all.


Again, memory bandwidth is pretty much all that matters here. During inference or training, the CUDA cores of retail GPUs are only ~15% utilized.
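
A simple roofline sketch of why single-stream decode is bandwidth-bound; the model size and quantization are assumptions, the bandwidth figures are the published specs:

    # Every generated token has to stream the full set of weights from memory,
    # so memory bandwidth caps tokens/s no matter how many FLOPS sit idle.
    def decode_tps_ceiling(params_b, bytes_per_param, mem_bw_gbs):
        weight_gb = params_b * bytes_per_param
        return mem_bw_gbs / weight_gb   # upper bound, ignoring the KV cache

    for label, bw in [("M3 Ultra (~819 GB/s)", 819), ("RTX 5090 (~1792 GB/s)", 1792)]:
        tps = decode_tps_ceiling(params_b=70, bytes_per_param=1, mem_bw_gbs=bw)
        print(f"{label}: ~{tps:.0f} tok/s ceiling for a 70B model at 8-bit")

The ~2 * params FLOPs needed per decoded token are a small fraction of what either chip can deliver, which is why the compute units mostly sit idle during generation.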

Not for prompt processing. Current Macs are really not great at long contexts.
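
Prefill is the opposite regime: all prompt tokens go through the weights in one batch, so it's compute-bound and raw GPU throughput dominates. A rough estimate, with assumed throughput and efficiency numbers:

    # Prefill needs roughly 2 * params FLOPs per prompt token. The TFLOPS and
    # efficiency figures below are assumptions for illustration, not measurements.
    def prefill_seconds(params_b, prompt_tokens, tflops, efficiency=0.5):
        flops = 2 * params_b * 1e9 * prompt_tokens
        return flops / (tflops * 1e12 * efficiency)

    for label, tflops in [("M3 Ultra GPU (assumed ~30 TFLOPS)", 30),
                          ("RTX 5090 (assumed ~100 TFLOPS)", 100)]:
        t = prefill_seconds(params_b=70, prompt_tokens=32_000, tflops=tflops)
        print(f"{label}: ~{t:.0f} s to prefill a 32k prompt on a 70B model")

That gap in raw throughput is what shows up as sluggish long-context performance on the Macs.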


