Perhaps things have changed since I last poked at this, so, standard disclaimers, take my comments with a grain of salt, etc.

GPU acceleration is not a magic "go fast" machine. It only works for certain classes of embarrassingly parallel algorithms. In a nutshell, the parallel regions need to be long enough that the speedup from doing them in the GPU's silicon outweighs the relatively high cost of getting data into and out of the GPU.
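To make that concrete, here's a rough back-of-envelope sketch in Python. The bandwidth and throughput numbers are illustrative assumptions, not measurements, but the shape of the trade-off is the point:

    # Rough model: GPU offload only pays off when the compute saved on the GPU
    # exceeds the cost of shipping the data over PCIe and back.
    # All numbers below are illustrative assumptions, not benchmarks.

    def gpu_worth_it(n_bytes, flops,
                     pcie_bps=16e9,      # assumed effective PCIe bandwidth (bytes/s)
                     cpu_flops=100e9,    # assumed CPU throughput (FLOP/s)
                     gpu_flops=10e12):   # assumed GPU throughput (FLOP/s)
        transfer = 2 * n_bytes / pcie_bps        # copy in + copy out
        cpu_time = flops / cpu_flops
        gpu_time = flops / gpu_flops + transfer
        return gpu_time < cpu_time, cpu_time, gpu_time

    # Low math-to-data ratio (roughly one pass over the data): transfer dominates.
    print(gpu_worth_it(n_bytes=1e9, flops=1e9))

    # High math-to-data ratio (big dense matmuls): the GPU wins easily.
    print(gpu_worth_it(n_bytes=1e9, flops=1e12))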

That's a fairly easy scenario to achieve with neural networks, which have a pretty high math-to-data ratio. Other machine learning algorithms, not necessarily. But basically all of them can benefit from the CPU's vector instructions, because they live in the CPU rather than out on a peripheral, so there's no hole you need to dig yourself out of before they can deliver a net benefit.
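As a small illustration of that "free" CPU-side win: vectorized NumPy code (which can use the CPU's SIMD units and an optimized BLAS) versus a plain Python loop over the same data, with no device transfer involved. Just a sketch; the exact speedup depends on your hardware and NumPy build:

    import time
    import numpy as np

    x = np.random.rand(10_000_000)

    # Plain Python loop: one scalar at a time, no SIMD.
    t0 = time.perf_counter()
    total = 0.0
    for v in x:
        total += v * v
    t_loop = time.perf_counter() - t0

    # Vectorized: NumPy can dispatch to SIMD/BLAS under the hood,
    # and the data never has to leave main memory.
    t0 = time.perf_counter()
    total_vec = np.dot(x, x)
    t_vec = time.perf_counter() - t0

    print(f"loop: {t_loop:.3f}s  vectorized: {t_vec:.4f}s")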

I would also say that what academics are doing is not necessarily a good barometer for what others are doing. In another nutshell, academics' professional incentives encourage them to prefer the fanciest thing that could possibly work, because their job is to push the frontiers of knowledge and technology.

Most people out in industry, though, are incentivized to do the simplest thing that could possibly work, because their job is to deliver software that is reliable and delivers a high return on investment.



Maybe the solution is a discrete SoC for ML? CPU and GPU on a card with shared memory, like Apple's M1.


I personally wouldn't bother. If you're not doing deep learning, existing hardware is already good enough that, while I can't say that nobody could get any value out of it, I'm personally not seeing the need. I'd much rather focus on the things that are actually costing me time and money, like data integrity.

Like, I would guess that the potential benefit to my team's productivity from eliminating (over)reliance on weakly typed formats such as JSON from our information systems could be orders of magnitude greater.
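Even something as simple as parsing JSON into an explicit typed structure at the boundary, instead of passing raw dicts around, catches a whole class of silent errors. A sketch using only the standard library (the field names are made up):

    import json
    from dataclasses import dataclass

    # Hypothetical record shape -- the point is that types and required
    # fields are checked once, at the boundary, instead of blowing up deep
    # inside the pipeline when someone sends "age": "42" or omits a field.
    @dataclass
    class UserRecord:
        user_id: int
        age: float
        opted_in: bool

    def parse_user(raw: str) -> UserRecord:
        d = json.loads(raw)
        return UserRecord(
            user_id=int(d["user_id"]),
            age=float(d["age"]),
            opted_in=bool(d["opted_in"]),
        )

    print(parse_user('{"user_id": 7, "age": "42", "opted_in": true}'))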


I can't imagine that the overlap between those using Scikit-Learn and those willing to buy and integrate ML-specialized hardware is that high. I think a lot of real-world usage of simpler ML libraries like Scikit-Learn is deploying small models onto an already-existing x86 or ARM system that has cycles to spare for some basic classification or regression.
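That matches the typical pattern: train a small model offline, persist it, and serve predictions on the same box that already runs the application. A minimal sketch with scikit-learn (synthetic data, purely illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Tiny synthetic binary classification problem.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Training is cheap enough to run on a laptop or a CI job.
    clf = LogisticRegression().fit(X, y)

    # Inference on a single row is microseconds of CPU time --
    # easily absorbed by a web server's spare cycles, no accelerator needed.
    print(clf.predict(X[:1]), clf.predict_proba(X[:1]))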


I mean, Amazon and Google are already doing that... and there are companies making ML ASICs.

Problem is... the ASICs are really good for certain classes of ML problems but aren't really all that general.



