
Does LLM inference require significant bandwidth to the card? You have to get the model into VRAM, but that's a fixed startup cost, not a per-output-token cost.
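For a rough sense of the two costs the question separates, here is a back-of-envelope sketch. All numbers (model size, PCIe bandwidth, per-token traffic, token rate) are illustrative assumptions, not measurements:

    # Back-of-envelope sketch; every constant below is an assumption.
    # Assumes a 7B-parameter model in fp16 and ~32 GB/s usable PCIe 4.0 x16 bandwidth.
    model_bytes = 7e9 * 2        # ~14 GB of weights (fp16)
    pcie_bytes_per_sec = 32e9    # assumed host-to-GPU bandwidth

    # One-time cost: uploading the weights into VRAM at startup.
    load_seconds = model_bytes / pcie_bytes_per_sec
    print(f"one-time weight upload: ~{load_seconds:.1f} s")

    # Steady-state cost: per generated token, only token ids / sampled outputs
    # cross the bus; the weights stay resident in VRAM.
    per_token_bytes = 16e3       # generous assumption: a few KB per token
    tokens_per_sec = 50          # assumed generation rate
    utilization = per_token_bytes * tokens_per_sec / pcie_bytes_per_sec
    print(f"steady-state PCIe utilization: ~{utilization:.1e} of capacity")

Under these assumptions the weight upload takes well under a second and per-token traffic uses a tiny fraction of the link, which is the distinction the question is drawing: bus bandwidth matters once at load time, while per-token throughput is bounded by VRAM bandwidth and compute on the card itself.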

