Wait... there is one binary that executes just fine on half a dozen platforms? What wizardry is this?
edit: Their default LLM worked great on Windows, with fast inference on my 2080 Ti. You have to pass "-ngl 9999" to offload onto the GPU, or it runs on the CPU. It is multi-modal too.
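For anyone trying this, a sketch of the invocation ("-ngl" is short for number of GPU layers; the binary name here is just a placeholder):

```shell
# Hypothetical binary name -- substitute whatever file you downloaded.
# -ngl 9999 requests up to 9999 model layers be offloaded to the GPU,
# i.e. effectively all of them; without it, inference stays on the CPU.
./model-binary -ngl 9999
```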