Nvidia is the only one that does this at the scale these AI companies need. They sell boxes packed full of AI accelerators, with custom high-speed bus connectors and high-end interconnects, that make supercomputers full of GPUs very easy to purchase, combined with the software to make it all work. No one else has that kind of integration at the very large scale needed for the training the top firms are doing.
For inference, Nvidia's strengths seem much less important.
You can do inference on a single GPU. And AFAIK the software stack matters much less for inference too, because you don't have to experiment with the software; you just need to get the model running once and then you run it unchanged for a long time. Groq, for example, runs Llama on their custom hardware, correct?
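To make the "just get it to run" point concrete: once a model fits on one card, serving it is a few lines of boilerplate. A minimal sketch, assuming a single CUDA GPU with enough VRAM for an 8B Llama in fp16 and the Hugging Face transformers library (the model name here is just an example):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed model; any causal LM that fits in VRAM works the same way.
    model_id = "meta-llama/Llama-3.1-8B-Instruct"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")

    prompt = "Why does inference need less interconnect than training?"
    inputs = tok(prompt, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))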
And I expect hardware for inference to become a bigger market than hardware for training.
The only reason you need all those GPUs is that each one has only a fraction of the RAM you can cram into a server.
With AMD focusing on RAM channels and cores, the rig above can do 6-8 tokens per second of inference.
The GPUs will be faster, but the point is that inference on the top DeepSeek model is possible for $6k with an AMD server rig. Eight H200s alone would cost $256,000 and gobble up far more power than the 400-watt envelope of that EPYC rig.
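For anyone who wants to sanity-check those numbers, here is the back-of-envelope arithmetic. All the model and hardware figures below are assumptions on my part (a DeepSeek-V3/R1-class MoE, H200 VRAM, a 12-channel DDR5-4800 EPYC socket), not measurements of that particular rig:

    # Why the model doesn't fit on one GPU: the weights alone overflow a card.
    total_params = 671e9          # assumed total parameter count
    active_params = 37e9          # assumed active params per token (MoE)
    bytes_per_param = 1.0         # fp8 weights
    h200_vram_gb = 141

    weights_gb = total_params * bytes_per_param / 1e9
    print(f"{weights_gb:.0f} GB of weights -> {weights_gb / h200_vram_gb:.1f} "
          "H200s minimum, i.e. 8 once KV cache and headroom are added")

    # Why a RAM-heavy CPU box still decodes at a usable rate: decoding is
    # memory-bandwidth-bound, and an MoE only reads its active params per token.
    bandwidth_gbps = 12 * 38.4    # GB/s, 12 channels of DDR5-4800
    gb_per_token = active_params * bytes_per_param / 1e9
    print(f"theoretical ceiling ~{bandwidth_gbps / gb_per_token:.0f} tokens/s; "
          "6-8 tok/s measured is believable after real-world overheads")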
Nvidia GPUs offer significantly more performance per card than Intel's and AMD's. In general the limiting factor is total compute, so people will pay a premium for the best. I'm not familiar with Google's hardware, and I don't think it's generally available.
They all build GPUs/TPUs that can run LLMs, correct?