> You don’t have to worry about the exact tokens that Foundation Models operates with, the API nicely abstracts that away for you [1]
I have the same question. Their "Deep dive into the Foundation Models framework" video is nice for seeing code that uses the new `FoundationModels` library, but for a "deep dive" I'd like to learn more about tokenization. Hopefully those details are eventually disclosed, unless someone here already knows?
I guess I'd say "mu": from a dev perspective, you shouldn't care about tokens, ever - if your inference framework isn't abstracting that away for you, your first task would be to patch it to do so.
To parent: yes, this is for local models, so insofar as worrying about tokens implies financial cost, yes.
I maintain a llama.cpp wrapper on everything from web to Android, and I can't quite see what more info you'd gain by getting individual token IDs from the API, beyond what you'd get from wall-clock time and checking their vocab.
Interesting point. My first reaction was "why do you need logprobs? We use constrained decoding for tool calls and don't need them"...which is actually false! Constrained decoding throws out the log probs of tokens that violate the constraints, then picks the highest log prob among the tokens that do meet them.
Haha yeah. I've seen you mention the llama.cpp wrapper elsewhere, it sounds cool! I've worked enough with vLLM and sglang to get angry at xgrammar, which I believe has some common ancestry with the GGML stack (GBNF if I'm not mistaken, which I may be). The constrained decoding part is as simple as you'd expect: it just applies a bitmask to the logprobs during the "logit processing" step and continues as normal. Rough sketch below.
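A minimal sketch of that masking step, in plain NumPy rather than xgrammar's actual API (the toy vocab, logits, and mask are made up just to show the mechanism; in a real stack the grammar engine produces the mask each step):

```python
import numpy as np

# Toy vocabulary and raw logits from the model for one decoding step.
vocab = ["{", "}", "\"", "hello", "world", ":", ","]
logits = np.array([1.2, -0.3, 2.5, 0.7, 0.1, -1.0, 0.4])

# Bitmask of tokens the grammar allows at this step (True = allowed).
allowed = np.array([1, 0, 1, 0, 0, 1, 0], dtype=bool)

# "Logit processing": disallowed tokens get -inf, so they can never win.
masked_logits = np.where(allowed, logits, -np.inf)

# Greedy pick: highest log prob among tokens that satisfy the constraint.
# (Sampling works the same way -- softmax over masked_logits.)
next_token = vocab[int(np.argmax(masked_logits))]
print(next_token)  # '"'
```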
Does that explain why you don't have to worry about token usage? The models run locally?