Deploying Llama3 70B on AWS – GPU Requirement, Cost and Step-by-Step Guide (slashml.com)
3 points by JJneid on June 13, 2024 | hide | past | favorite | 5 comments


Note that quantized versions of llama3 70B can be run on CPU on a much cheaper server. I am personally using it via llama.cpp on a bare-metal 6-core Xeon with 128 GB RAM for ~€50/month.
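A back-of-the-envelope check on why 128 GB of RAM can fit a quantized 70B model: llama.cpp loads the weights into memory, and the footprint is roughly parameters × bits-per-weight. A minimal sketch; the bits-per-weight figures are approximations for common GGUF quantization schemes, and real file sizes vary slightly:

```python
# Rough RAM estimate for holding quantized 70B weights in memory (llama.cpp).
# Bits-per-weight values are approximate effective rates for GGUF schemes
# (assumption: ignores KV cache and runtime overhead, which add several GB).

def model_ram_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

PARAMS_70B = 70e9

for scheme, bits in [("F16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{scheme}: ~{model_ram_gb(PARAMS_70B, bits):.0f} GB")
```

By this estimate a 4-bit-class quantization lands around 40-45 GB of weights, comfortably inside 128 GB, while the unquantized F16 model (~140 GB) would not fit.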


Is inference speed an issue for you?


Sufficient for fluent conversation.


Usually performance takes a hit with quantization. Are you getting quality responses?


Since llama3, yes, quite satisfying.




