
  "Inference starts at a comfortable 30 t/s
Is this including the context? With a 1000-token context and a 20-token instruction, does that take 1020/30 s, or 20/30 s?
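The two readings of the question differ by a lot. A back-of-envelope sketch, using the hypothetical numbers from the comment (1000-token context, 20-token instruction) plus an assumed prefill speed, since prompt processing (prefill) is usually much faster per token than generation (decode):

```python
# Hypothetical numbers from the comment above.
context_tokens = 1000
instruction_tokens = 20
decode_rate = 30.0  # tokens/s, the quoted "30 t/s"

# Reading A: 30 t/s applies to every token, prompt included.
naive_seconds = (context_tokens + instruction_tokens) / decode_rate
print(naive_seconds)  # 34.0 s

# Reading B: the prompt is processed in a fast, compute-bound prefill pass,
# and 30 t/s only measures decode. The prefill rate here is an assumption
# for illustration, not a figure from the article.
prefill_rate = 500.0  # tokens/s (assumed)
prefill_seconds = (context_tokens + instruction_tokens) / prefill_rate
print(prefill_seconds)  # 2.04 s before the first generated token
```

Under reading B the quoted 30 t/s would describe only the generated output, with the 1020 prompt tokens absorbed much faster during prefill.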

  "Second, LLMs have goldfish-sized working memory. ... In practice, an LLM can hold several book chapters worth of comprehension “in its head” at a time. For code it’s 2k or 3k lines (code is token-dense).
That's not exactly goldfish-sized, and in fact already very useful.
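The "2k or 3k lines" figure checks out roughly. A quick sketch, assuming a 32k-token context window and ~12 tokens per line of source code (both illustrative numbers, not from the article; real tokenizers and models vary):

```python
# Rough estimate of how many lines of code fit in an LLM's context window.
context_window = 32_000   # tokens (assumed window size)
tokens_per_line = 12      # rough average for tokenized source code (assumed)

lines_that_fit = context_window // tokens_per_line
print(lines_that_fit)  # 2666, squarely in the quoted 2k-3k range
```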

  "Third, LLMs are poor programmers. At best they write code at maybe an undergraduate student level who’s read a lot of documentation.
Exactly what I want for local code generation.

I think he's anti-hyping a little by insisting LLMs are in fact _not_ super-intelligent and whatnot. Sure, some people believe that, but come on ... we're not at a McKinsey workshop here.

---

Any good German language models out there?


