Is anyone familiar with the BOINC-style grid computing scene for ML and, specifically, LLMs? Is there something interesting going on, or is it infeasible? Will things like OpenLLaMA help it?
In a typical fully connected hidden layer, each neuron needs the output values of all the neurons in the previous layer, so you need all the data in one place. Obviously you can distribute the actual calculations, which is what a GPU does, but distributing that over networked CPUs would be incredibly slow and would require the whole thing to be loaded into memory on every instance.
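A minimal numpy sketch of that dependency: even if you split a dense layer's output neurons across two hypothetical "nodes", each node still needs the entire previous-layer activation vector before it can compute anything.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b):
    # y_j = sum_i W[j, i] * x[i] + b[j]  -- every output y_j reads ALL of x
    return W @ x + b

n_in, n_out = 8, 4
x = rng.standard_normal(n_in)          # previous layer's activations
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal(n_out)

full = dense(x, W, b)

# Split the layer by rows of W across two "nodes": each node computes
# half the outputs, but BOTH still need the complete vector x shipped in.
half_a = dense(x, W[:2], b[:2])        # node A: outputs 0-1, needs all of x
half_b = dense(x, W[2:], b[2:])        # node B: outputs 2-3, needs all of x
assert np.allclose(np.concatenate([half_a, half_b]), full)
```

So the split saves compute per node but not communication: the full activation vector has to reach every node on every layer, every forward pass.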
My bet is on some kind of light-based or analog electric accelerator PCIe card being the next best thing for this sort of inference, since it should be able to calculate multiple layers at once. FPGAs also work, but only for fixed weights.
Beyond that, with big models and training rounds that potentially update every weight, you can't even split the work by saying "evaluate the fitness of this model against this cost function and get back to me in however much time your CPU needs", because shipping the model and data around is impractical.
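Some back-of-envelope arithmetic on why shipping the model around is impractical, assuming a hypothetical 7B-parameter model (the sizes here are illustrative, not measurements of any specific system):

```python
# Rough per-replica sizes for "just send volunteers the model".
params = 7e9                 # hypothetical 7B-parameter model

# Inference: fp16 weights alone, 2 bytes per parameter.
fp16_gb = params * 2 / 1e9

# Training with mixed-precision Adam: fp16 weights (2 B) + fp32 master
# weights (4 B) + two fp32 optimizer moments (4 B each) = ~14 B/param.
adam_gb = params * (2 + 4 + 4 + 4) / 1e9

print(f"inference download: ~{fp16_gb:.0f} GB")
print(f"training state:     ~{adam_gb:.0f} GB per full replica")
```

That's roughly 14 GB just to download the weights once, and on the order of 100 GB of state per replica if a node were to hold a full training copy, before any gradients or activations move over the wire.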
I mean yeah, even just doing regular inference is borderline impossible on a normal machine, given that we're even having this discussion. Training is completely infeasible.
The more you split the work outwards (across more nodes), the more inter-node communication is required, which doesn't lend itself well to regular Internet connections. That means it tends to scale upwards instead, with more GPU/CPU/memory capacity per node.
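A rough estimate of that communication cost for tensor-parallel inference. The model shape and the assumption of one ring all-reduce of the hidden activations per layer are illustrative (real transformer blocks typically need more than one collective per layer, so this is a lower bound):

```python
# Back-of-envelope: per-token network traffic for tensor-parallel inference.
hidden = 4096          # hidden size (LLaMA-7B-like, assumed)
layers = 32            # transformer layers (assumed)
nodes = 8              # tensor-parallel workers
bytes_per_val = 2      # fp16 activations

# A ring all-reduce moves ~2*(n-1)/n of the tensor per worker.
per_layer = 2 * (nodes - 1) / nodes * hidden * bytes_per_val
per_token = layers * per_layer

print(f"~{per_token / 1024:.0f} KiB per worker per generated token")
```

That lands in the hundreds-of-KiB-per-token range, which is trivial bandwidth for NVLink or a datacenter LAN, but generation is latency-bound: every layer's collective is a synchronous round trip, so 32+ round trips per token over residential Internet latencies (tens of ms each) kills throughput regardless of bandwidth. Hence the pressure to scale up per node rather than out.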