
You could instruct the LLM to classify messages with high-level tags: for coffee, drinks, etc., always include "beverage".
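A rough sketch of the kind of prompt I mean (the tag hierarchy is just an example, and call_llm is a placeholder for whatever client you actually use):

    # Sketch: ask the model for high-level tags plus a fixed parent tag.
    # call_llm() is a stand-in for your real LLM client call.
    import json

    TAG_PROMPT = """Classify the message below with high-level tags.
    If the message mentions coffee, drinks, tea, etc., always include "beverage".
    Return a JSON array of tags and nothing else.

    Message:
    {message}"""

    def tag_message(message: str, call_llm) -> list[str]:
        raw = call_llm(TAG_PROMPT.format(message=message))
        return json.loads(raw)  # e.g. ["coffee", "beverage"]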

Given how fast inference has become and the context window sizes most SOTA models now support, I think summarizing and having the LLM decide what is relevant is not that fragile at all for most use cases. This is what I do with my analyzers, which I talk about at https://github.com/gitsense/chat/blob/main/packages/chat/wid...
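Simplified, the flow looks something like this (a sketch only, not the actual analyzer code; call_llm is again a stand-in for your client):

    # Sketch: summarize each item, then let the LLM pick which summaries
    # are relevant to the question.
    import json

    def summarize(item: str, call_llm) -> str:
        return call_llm(f"Summarize in one sentence:\n{item}")

    def pick_relevant(question: str, items: list[str], call_llm) -> list[int]:
        summaries = [summarize(i, call_llm) for i in items]
        listing = "\n".join(f"{n}: {s}" for n, s in enumerate(summaries))
        prompt = (
            "Return a JSON array with the numbers of the summaries relevant "
            "to the question, and nothing else.\n\n"
            f"Question: {question}\n\n{listing}"
        )
        return json.loads(call_llm(prompt))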



Inference is not fast by any metric. It is many, MANY orders of magnitude slower than alternatives.


Honestly, Gemini Flash Lite and models on Cerebras are extremely fast. I know what you are saying: if the goal is to get a lot of results that may or may not be relevant, then yes, it is an order of magnitude slower.

If you take into consideration the post-analysis process, which is what inference is trying to solve, is it still an order of magnitude slower?


More like 6-8 orders of magnitude slower. That’s a very nontrivial difference in performance!
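Back-of-envelope, with assumed numbers: a substring or regex check on a short message is on the order of a microsecond on one CPU core, while a hosted LLM call is on the order of a second end to end, which is already about six orders of magnitude before you even look at cost per call.

    # Rough comparison with assumed numbers; adjust for your setup.
    keyword_check_s = 1e-6  # ~1 microsecond for a substring/regex match
    llm_call_s = 1.0        # ~1 second for a hosted LLM round trip
    print(f"~{llm_call_s / keyword_check_s:.0e}x slower")  # ~1e+06x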


How are you quantifying the speed at which results are reviewed?


It’s not about speed, but about the cost to compute.


It has become fast enough that another call isn't going to overwhelm your pipeline. If you needed this kind of functionality for high-performance computing, perhaps it wouldn't be feasible, but here the output is being fed back into an LLM. The user will never notice.



