Take a look at ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp

CPU performance is much better than mainline llama.cpp, and it offers additional quantization types that aren't available upstream.
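For anyone who wants to try it, a minimal sketch of building the fork and quantizing a model might look like the following. It assumes the fork keeps upstream llama.cpp's CMake layout and tool names (`llama-quantize`, `llama-cli`), and the model filenames and the `IQ4_K` quant type (one of the fork's extra quantization formats) are illustrative:

```shell
# Clone and build the fork (assumes upstream llama.cpp's CMake workflow)
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j

# Quantize an f16 GGUF model to IQ4_K, a quant type specific to this fork
# (model filenames here are placeholders)
./build/bin/llama-quantize model-f16.gguf model-iq4_k.gguf IQ4_K

# Run inference on CPU with the quantized model
./build/bin/llama-cli -m model-iq4_k.gguf -p "Hello" -n 32
```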


