Hacker News
Deploying Llama3 70B on AWS – GPU Requirement, Cost and Step-by-Step Guide (slashml.com)
3 points by JJneid on June 13, 2024 | 5 comments
rini17 on June 13, 2024:
Note that quantized versions of llama3 70B can be run on CPU on a much cheaper server. I personally use it via llama.cpp on a bare-metal 6-core Xeon with 128 GB RAM for ~50 euro monthly.
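A minimal sketch of what such a CPU-only llama.cpp setup might look like (the exact model file, quantization level, and flags are assumptions for illustration, not details from the comment):

```shell
# Build llama.cpp from source (CPU-only build; no GPU toolkit required)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Run a 4-bit-quantized GGUF of Llama 3 70B on 6 CPU threads.
# The model filename below is a hypothetical example; you would first
# download or convert a quantized GGUF into the models/ directory.
./llama-cli \
  -m models/Meta-Llama-3-70B-Instruct.Q4_K_M.gguf \
  -t 6 \
  -c 4096 \
  -p "Explain quantization in one sentence."
```

A Q4_K_M quantization of a 70B model is roughly 40 GB on disk, which is why 128 GB of RAM comfortably fits the weights plus context cache.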
JJneid on June 14, 2024:
Is inference speed an issue for you?
rini17 on June 14, 2024:
Sufficient for fluent conversation.
JJneid on June 14, 2024:
Usually performance takes a hit with quantization. Are you getting quality responses?
rini17 on June 14, 2024:
Since llama3, yes, quite satisfying.