
Yes, not sure you can do better than that. You still cannot have a single instance of an LLM loaded in (GPU) memory answer two queries at the same time.


Of course, concurrent requests can be supported in general. But Ollama doesn't support them; it isn't meant for that purpose, and that's perfectly fine. That's not the point, though. For fast/high-performance serving scenarios, you're better off with vLLM.
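If you want to check this yourself, here's a rough sketch (the port, endpoint path, and model name below are just placeholder assumptions for a local vLLM OpenAI-compatible server, not anything from this thread): fire N requests concurrently and compare the wall-clock time against N times the single-request latency. With a server that batches concurrent requests, total time grows much more slowly than linearly; with a server that serializes them, it won't.

```python
# Sketch: send several completions concurrently to an assumed local
# OpenAI-compatible endpoint and time them. URL/port/model are placeholders.
import asyncio
import time

import httpx

URL = "http://localhost:8000/v1/completions"   # assumed vLLM default port
PAYLOAD = {
    "model": "my-model",      # placeholder model name
    "prompt": "Say hello.",
    "max_tokens": 64,
}

async def one_request(client: httpx.AsyncClient) -> float:
    # Time a single completion request.
    start = time.perf_counter()
    resp = await client.post(URL, json=PAYLOAD, timeout=120)
    resp.raise_for_status()
    return time.perf_counter() - start

async def main(n: int = 8) -> None:
    async with httpx.AsyncClient() as client:
        start = time.perf_counter()
        latencies = await asyncio.gather(*(one_request(client) for _ in range(n)))
        total = time.perf_counter() - start
    print(f"{n} concurrent requests took {total:.1f}s total; "
          f"per-request latencies: {[round(l, 1) for l in latencies]}")

if __name__ == "__main__":
    asyncio.run(main())
```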


Thanks! This is great to know.



