> You don’t have to worry about the exact tokens that Foundation Models operates with, the API nicely abstracts that away for you [1]
I have the same question. Their "Deep dive into the Foundation Models framework" video is nice for seeing code that uses the new `FoundationModels` library, but for a "deep dive" I'd like to learn more about tokenization. Hopefully those details are eventually disclosed, unless someone here already knows?
I guess I'd say "mu": from a dev perspective, you shouldn't care about tokens, ever - if your inference framework isn't abstracting that away for you, your first task would be to patch it to do so.
To parent: yes, this is for local models, so insofar as worrying about tokens implies financial cost, yes.
I maintain a llama.cpp wrapper on everything from web to Android, and I can't quite see what more info you'd gain by getting individual token IDs from the API, beyond what you'd get from wall-clock time and checking their vocab.
Interesting point. My first reaction was "why do you need logprobs? We use constrained decoding for tool calls and don't need them"...which is actually false! Constrained decoding throws out the log probs of tokens that violate the constraints, then picks the highest log prob among the tokens that do meet them.
Haha yeah. I've seen you mention the llama.cpp wrapper elsewhere, it sounds cool! I've worked enough with vLLM and sglang to get angry at xgrammar, which I believe has some common ancestry with the GGML stack (GBNF if I'm not mistaken, which I may be). The constrained decoding part is as simple as you'd expect: it just applies a bitmask to the logprobs during the "logit processing" step and continues as normal. Rough sketch below.
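A minimal sketch of that masking step, in plain NumPy rather than xgrammar's actual API (the toy vocab, logits, and mask are made up just to show the mechanism; in a real stack the grammar engine produces the mask each step):

```python
import numpy as np

# Toy vocabulary and raw logits from the model for one decoding step.
vocab = ["{", "}", "\"", "hello", "world", ":", ","]
logits = np.array([1.2, -0.3, 2.5, 0.7, 0.1, -1.0, 0.4])

# Bitmask of tokens the grammar allows at this step (True = allowed).
allowed = np.array([1, 0, 1, 0, 0, 1, 0], dtype=bool)

# "Logit processing": disallowed tokens get -inf, so they can never win.
masked_logits = np.where(allowed, logits, -np.inf)

# Greedy pick: highest log prob among tokens that satisfy the constraint.
# (Sampling works the same way -- softmax over masked_logits.)
next_token = vocab[int(np.argmax(masked_logits))]
print(next_token)  # '"'
```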
Does that explain why you don't have to worry about token usage? The models run locally?