
Is anyone familiar with the BOINC-style grid computing scene for ML and, specifically, LLMs? Is there anything interesting going on, or is it infeasible? Will things like OpenLLaMA help?


They seem to scale up, not out, so grids don't really work.

What everyone is using is HPC-grade, low-latency interconnects to make the cluster look as much as possible like a single big TPU.
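
For a rough sense of what that looks like in practice (a minimal sketch, not anyone's actual production setup; the dimensions and launch details are assumptions), this is the kind of thing those clusters run: PyTorch's distributed module with the NCCL backend, which rides on top of those fast interconnects.

    # Minimal sketch of data-parallel training over a fast interconnect.
    # Assumes one process per GPU, launched e.g. via torchrun.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")  # NCCL uses NVLink/InfiniBand when present
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])

    x = torch.randn(8, 4096, device="cuda")
    model(x).sum().backward()  # the gradient all-reduce here needs the fast fabric

The point is that every backward pass synchronizes gradients across all nodes, which only stays cheap when the interconnect is close to local-memory speed.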


"They seem to scale up, not out, so grids don't really work."

Can someone explain what this means? I don't understand.


https://openmetal.io/docs/edu/openstack/horizontal-scaling-v...

In a typical fully connected hidden layer, each neuron needs the output values of all the neurons in the previous layer, so all of that data has to be in one place. Obviously you can distribute the actual calculations, which is what a GPU does, but distributing them over networked CPUs would be incredibly slow and would require the whole thing to be loaded into memory on every instance.
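
To make that data dependency concrete, here's a minimal NumPy sketch (the dimensions are made up for illustration): every output neuron reads the entire previous-layer activation vector, so sharding the activations across machines would mean a network round-trip per layer.

    import numpy as np

    # One fully connected layer: every output neuron depends on *every*
    # input activation, via one row of the weight matrix.
    rng = np.random.default_rng(0)
    n_in, n_out = 4096, 4096
    W = rng.standard_normal((n_out, n_in)) * 0.01  # weights
    b = np.zeros(n_out)                            # biases
    x = rng.standard_normal(n_in)                  # previous layer's activations

    h = np.maximum(W @ x + b, 0.0)  # ReLU; W @ x touches all of x for each output

    # If x were sharded across machines, each of the n_out dot products
    # would first need the shards gathered over the network.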

My bet is on some kind of light-based or analog electric accelerator PCIe card being the next best thing for this sort of inference, since it should be able to compute multiple layers at once. FPGAs also work, but only for fixed weights.


Beyond that, with big models and training rounds that potentially update all of the weights, you can't even split the work BOINC-style by saying "evaluate this model against this cost function and report back in however much time your CPU needs," because shipping the model and data around is impractical.
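
Some back-of-envelope numbers make the point (the model size and link speed are assumptions picked for illustration):

    # Rough transfer time for shipping model weights to a volunteer node.
    params = 65e9                  # assumption: a LLaMA-65B-class model
    model_bytes = params * 2       # fp16 -> ~130 GB

    link_bytes_per_s = 100e6 / 8   # assumption: a 100 Mbit/s home connection

    hours = model_bytes / link_bytes_per_s / 3600
    print(f"~{hours:.1f} hours to download the weights once")  # ~2.9 hours

And that's a one-off download; training would have to ship updated weights every round.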


I mean, yeah: the fact that we're even having this discussion shows that plain inference is borderline impossible on a normal machine. Training is completely infeasible.


Up = a bigger machine

Out = lots of machines connected over a network


The more you split the work outwards (across more nodes), the more communication between nodes is required, and regular Internet connections handle that poorly. That's why these workloads prefer to scale upwards, with more GPU/CPU/memory capacity per node.
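
To put a number on "more communication" (a sketch with assumed model sizes and link speeds, not a benchmark): data-parallel training has to all-reduce the full gradient every step, so per-step network traffic scales with model size.

    # Per-step gradient sync cost for data-parallel training (illustrative numbers).
    # A ring all-reduce moves roughly 2 * (N-1)/N * grad_bytes per node per step.
    params = 7e9                    # assumption: a 7B-parameter model
    grad_bytes = params * 2         # fp16 gradients, ~14 GB per step

    def step_seconds(link_gbps, nodes=8):
        traffic = 2 * (nodes - 1) / nodes * grad_bytes
        return traffic / (link_gbps * 1e9 / 8)

    print(f"InfiniBand, 400 Gb/s: {step_seconds(400):.2f} s of comms per step")
    print(f"Home uplink, 20 Mb/s: {step_seconds(0.02) / 3600:.1f} h of comms per step")

Roughly 0.5 s versus 2.7 h per step is the whole scale-up-versus-scale-out argument in one line.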



I haven't looked into it or tried it yet, but there is https://petals.ml/
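
For reference, Petals splits a model's layers across volunteer servers and streams activations between them; its README advertises an interface roughly like this (class and model names quoted from memory, so treat them as assumptions that may have changed):

    # Sketch of Petals-style distributed inference, adapted from memory of the
    # project's README -- exact class/model names may differ by version.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "bigscience/bloom-petals"  # assumption: the public swarm's model id
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("A BOINC-style grid for LLMs would", return_tensors="pt")
    outputs = model.generate(inputs["input_ids"], max_new_tokens=16)
    print(tokenizer.decode(outputs[0]))

Note that this works for inference precisely because only small per-layer activations cross the network, not the weights themselves.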



