
I’m still kind of skeptical. M-series Apple hardware doesn’t even get warm during inference with some local models.

Edit: Nah, I’m convinced, look at table 1. Inference costs are around 20 mL in a datacenter environment.



For reference, 1 kJ is enough to heat 1 L (~34 oz) of water by about 0.24 °C (~0.43 °F). The machine will probably heat up a few degrees if you run inference once, but since it's essentially one big heatsink the heat will dissipate through the body and into the air. The problem begins when you run it continuously, as you would in a datacenter.
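Quick back-of-envelope check of that number, using nothing but the textbook specific heat of water (the constants below are standard values, not figures from the paper):

    # Temperature rise of 1 L of water from 1 kJ of heat (Q = m * c * dT).
    energy_j = 1_000.0        # 1 kJ
    mass_g = 1_000.0          # 1 L of water ~= 1000 g
    specific_heat = 4.184     # J / (g * degC) for liquid water

    delta_c = energy_j / (mass_g * specific_heat)
    delta_f = delta_c * 9 / 5  # a temperature *difference*, so no +32 offset

    print(f"{delta_c:.2f} degC rise ({delta_f:.2f} degF)")
    # -> 0.24 degC rise (0.43 degF)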


Datacenters aren't running M-series chips.


Well, not M-series chips specifically, but chips optimized for these kinds of workloads (like the Neural Engine in M-series chips is).


IIRC the M-series chips aren’t specifically optimized for ML workloads; their biggest gain is unified memory shared between the GPU and CPU, since transferring layers between separate memory pools is a big bottleneck on non-Apple systems.

Real ML hardware (like Nvidia H100s) that can handle the kind of inference traffic you see in production gets hot and uses quite a bit of energy, especially when it runs at full blast 24/7.
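As a rough illustration of that transfer bottleneck (a sketch only, assuming PyTorch and a discrete CUDA GPU; the layer size and loop counts are arbitrary), compare a layer whose weights stay resident in GPU memory against one whose weights get copied from host RAM on every pass. Unified memory means there is no equivalent copy to pay for:

    import copy
    import time
    import torch

    layer_cpu = torch.nn.Linear(8192, 8192)  # ~256 MB of fp32 weights, in host RAM
    x = torch.randn(1, 8192, device="cuda")

    def timed(fn, iters=10):
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            fn()
        torch.cuda.synchronize()
        return time.perf_counter() - t0

    # Case 1: weights already resident in GPU memory (what fits in VRAM).
    layer_gpu = copy.deepcopy(layer_cpu).to("cuda")
    print("resident:         %.3fs" % timed(lambda: layer_gpu(x)))

    # Case 2: weights copied host -> device on every pass (offloaded layers).
    def offloaded_step():
        w = layer_cpu.weight.to("cuda")  # PCIe copy dominates here
        b = layer_cpu.bias.to("cuda")
        torch.nn.functional.linear(x, w, b)

    print("copied each pass: %.3fs" % timed(offloaded_step))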


Google’s TPU energy usage is a well-kept secret / competitive advantage. If energy efficiency isn’t a major concern for them now, I bet it will be within a couple of years.



