Take a look at ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp

CPU performance is much better than mainline llama.cpp, and it offers additional quantization types that aren't available upstream.
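For anyone who wants to try it, a minimal sketch of building the fork and quantizing a model might look like the following. It assumes the fork keeps upstream llama.cpp's CMake layout and tool names (`llama-quantize`, `llama-cli`), and the model filenames and the `IQ4_K` quant type (one of the fork's extra quantization formats) are illustrative:

```shell
# Clone and build the fork (assumes upstream llama.cpp's CMake workflow)
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j

# Quantize an f16 GGUF model to IQ4_K, a quant type specific to this fork
# (model filenames here are placeholders)
./build/bin/llama-quantize model-f16.gguf model-iq4_k.gguf IQ4_K

# Run inference on CPU with the quantized model
./build/bin/llama-cli -m model-iq4_k.gguf -p "Hello" -n 32
```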


