Who cares. It's viable so long as llama.cpp runs and does 15 tok/s at under 500 W or so. Whether the device hits that figure with an 8B Q1 or a 1T BF16 weight file isn't a fundamental boolean limiting factor; there will probably be some uses for such an instrument as a proto-AGI device.
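To put the "15 tok/s at under 500 W" figure in perspective, a quick back-of-envelope (only the 15 tok/s and 500 W numbers come from the comment; the rest is arithmetic):

```python
# Rough viability math for a standalone LLM appliance.
tok_per_s = 15
watts = 500

# W = J/s, so energy cost per generated token:
joules_per_token = watts / tok_per_s
print(f"{joules_per_token:.1f} J/token")  # 33.3 J/token

# Running flat-out for a day:
tokens_per_day = tok_per_s * 86_400
kwh_per_day = watts * 24 / 1000
print(f"{tokens_per_day} tokens on {kwh_per_day} kWh")  # 1296000 tokens on 12.0 kWh
```

About 1.3M tokens a day on 12 kWh, which is in the range where "just leave it running" uses start to make sense.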
There is a type of research called traffic surveys, which involves hiring a few people with adequate education to sit or stand at an intersection for a whole day and count passing entities by type. YOLO wasn't accurate enough. My gut feeling is that a vision-enabled LLM would be. That doesn't require constant updates or upgrades to the latest NN innovations, so there's no need for full CUDA support, so long as one known-good weight file works.
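The survey side of that is trivial once the model returns one label per sighting; a minimal sketch (the classify_frame stub and the category names are hypothetical stand-ins for a vision-LLM call, not any particular API):

```python
from collections import Counter

def classify_frame(frame):
    # Hypothetical stand-in: a real setup would send the frame to a local
    # multimodal model and parse its answer into a category string.
    return frame["label"]  # e.g. "car", "truck", "bicycle", "pedestrian"

def run_survey(frames):
    # Tally passing entities by type, exactly as a human surveyor
    # would with a clipboard and tick marks.
    counts = Counter()
    for frame in frames:
        counts[classify_frame(frame)] += 1
    return counts

frames = [{"label": l} for l in ["car", "car", "truck", "bicycle", "car"]]
print(run_survey(frames))  # Counter({'car': 3, 'truck': 1, 'bicycle': 1})
```

Accuracy lives entirely in the classifier; the aggregation never changes, which is why a single known-good weight file is enough.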