airocker on March 15, 2024 | on: Ollama now supports AMD graphics cards
Yes, not sure you can do better than that. You still cannot have one instance of an LLM in (GPU) memory answer two queries at the same time.
eclectic29 on March 15, 2024
Of course you can support concurrent requests; a single GPU-resident model instance can batch multiple queries together. Ollama just doesn't support that, and it isn't meant for that purpose, which is perfectly OK. That's not the point, though. For fast/high-performance scenarios, you're better off with vLLM.
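For context, here's a minimal sketch of what this looks like with vLLM's offline batching API: one model instance loaded into GPU memory handles several prompts in a single generate call. The model name and prompts are just examples; vLLM's server mode does the same thing for concurrent network requests via continuous batching.

    # Minimal sketch: one GPU-resident model instance serving
    # several prompts at once via vLLM's batching API.
    from vllm import LLM, SamplingParams

    prompts = [
        "What is continuous batching?",
        "Why can one model instance serve many queries?",
    ]
    params = SamplingParams(temperature=0.8, max_tokens=64)

    llm = LLM(model="facebook/opt-125m")  # example model; loaded once into GPU memory
    outputs = llm.generate(prompts, params)  # both prompts processed in one batch

    for out in outputs:
        print(out.prompt, "->", out.outputs[0].text)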
airocker on March 15, 2024
Thanks! This is great to know.