
If we only look at the hardware (not the software) for inference, does Nvidia have any advantage over Intel, AMD and Google?

They all build GPUs/TPUs that can run LLMs, correct?



Nvidia is the only one that does this at the scale these AI companies need. They sell boxes packed with AI accelerators, custom high-speed bus connectors, and high-end interconnects, making supercomputers full of GPUs very easy to purchase, combined with the software to make it all work. No one else has that kind of integration at very large scale, which is necessary for the sort of training the top firms are doing.


Yes, that's why I asked about inference.

For inference, Nvidia's strengths seem much less important.

You can do inference on a single GPU. And AFAIK the software stack is not that important for inference either, because you don't have to experiment with the software: you just need to get it running once, and then you run it unchanged for a long time. Groq, for example, runs Llama on their custom hardware, correct?
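To make that concrete, single-GPU inference is a few lines with something like Hugging Face transformers. A minimal sketch, assuming an ~8B Llama variant that fits on one card (the model id and settings are my assumptions, nothing Groq-specific):

    # Minimal single-GPU Llama inference with Hugging Face transformers.
    # Assumption: ~8B model in bf16 (~16 GB of weights) fits on one 24 GB card.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical choice
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    prompt = "Hardware requirements for LLM inference are"
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=50)
    print(tok.decode(out[0], skip_special_tokens=True))

Once something like this runs, you can serve the same model unchanged indefinitely, which is the point.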

And I expect hardware for inference to become a bigger market than hardware for training.


DeepSeek-V3 needs 16 H100s or 8 H200 GPUs for inference.
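Back-of-envelope on why those counts, assuming FP8 weights and the published 671B parameter figure (KV cache and activations need room on top of the weights):

    # Why 16 H100s or 8 H200s: the weights alone have to fit in pooled VRAM.
    params = 671e9            # DeepSeek-V3 total parameter count
    bytes_per_param = 1       # FP8 weights (assumption)
    weights_gb = params * bytes_per_param / 1e9   # ~671 GB

    h100_gb, h200_gb = 80, 141
    print(weights_gb / h100_gb)   # ~8.4 -> 16 cards leaves headroom for KV cache
    print(weights_gb / h200_gb)   # ~4.8 -> 8 cards, similar headroom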


Or a single dual-processor AMD EPYC rig, for less than $6k.

https://xcancel.com/carrigmat/status/1884244369907278106

The only reason you need all those GPUs is that they have only a fraction of the RAM you can cram into a server.

With AMD focusing on memory channels and core counts, the above rig can do 6-8 tokens per second of inference.

The GPUs will be faster, but the point is that inference on the top DeepSeek model is possible for $6k with an AMD server rig. Eight H200s alone would cost $256,000 and gobble up way more power than the 400 watt envelope of that EPYC rig.
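The 6-8 tokens/s figure is consistent with a memory-bandwidth bound. A rough sketch, assuming a dual-socket EPYC with 12 DDR5-4800 channels per socket and ~37B active parameters per token for DeepSeek-V3's MoE (both assumptions on my part):

    # Token rate on CPU is roughly bounded by memory bandwidth:
    # every generated token has to stream the active weights from RAM.
    channels = 12 * 2                 # 12 DDR5 channels per socket, 2 sockets
    gb_per_s_per_channel = 38.4       # DDR5-4800: 4800 MT/s * 8 bytes
    bandwidth_gb_s = channels * gb_per_s_per_channel   # ~921 GB/s

    active_params = 37e9              # MoE: params actually touched per token
    bytes_per_param = 1               # 8-bit quantization (assumption)
    gb_per_token = active_params * bytes_per_param / 1e9

    print(bandwidth_gb_s / gb_per_token)  # ~25 tok/s theoretical ceiling;
                                          # NUMA/efficiency losses land at 6-8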


I thought the whole story about DeepSeek was that they don't have H100s?


Nvidia GPUs offer significantly more performance per card than Intel and AMD. In general the limiting factor is total compute, so people will pay a premium for the best. I'm not familiar with Google's hardware, and I don't think it's generally available.


What about AMD's MI300X? I see reports all over the web that it runs LLMs at a similar speed to an H100.


It’s available on GCP through Cloud TPU, though Nvidia on GCP is still a far bigger business.


Can you run Llama on Google's TPUs?




