
Does LLM inference require significant bandwidth to the card? You have to get the model into VRAM, but that's a fixed startup cost, not a per-output-token cost.
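For a rough sense of the two costs the question separates, here is a back-of-envelope sketch. All numbers (model size, PCIe bandwidth, per-token traffic, token rate) are illustrative assumptions, not measurements:

    # Back-of-envelope sketch; every constant below is an assumption.
    # Assumes a 7B-parameter model in fp16 and ~32 GB/s usable PCIe 4.0 x16 bandwidth.
    model_bytes = 7e9 * 2        # ~14 GB of weights (fp16)
    pcie_bytes_per_sec = 32e9    # assumed host-to-GPU bandwidth

    # One-time cost: uploading the weights into VRAM at startup.
    load_seconds = model_bytes / pcie_bytes_per_sec
    print(f"one-time weight upload: ~{load_seconds:.1f} s")

    # Steady-state cost: per generated token, only token ids / sampled outputs
    # cross the bus; the weights stay resident in VRAM.
    per_token_bytes = 16e3       # generous assumption: a few KB per token
    tokens_per_sec = 50          # assumed generation rate
    utilization = per_token_bytes * tokens_per_sec / pcie_bytes_per_sec
    print(f"steady-state PCIe utilization: ~{utilization:.1e} of capacity")

Under these assumptions the weight upload takes well under a second and per-token traffic uses a tiny fraction of the link, which is the distinction the question is drawing: bus bandwidth matters once at load time, while per-token throughput is bounded by VRAM bandwidth and compute on the card itself.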

