I’m a fan of the approach. I normally wouldn’t care if this were just another LLM library taking inspiration, but if you’re going to go out of your way to put a paper on arXiv, it feels like doing a literature review is a good step?
Both papers use the phrase "regular expressions", and there the resemblance ends. The linked manuscript uses a regular expression to realize a grammar and then memoizes logit masks. I want to know why FlashText failed to cite:
Baeza-Yates, Ricardo A., and Gaston H. Gonnet. "Fast text searching for regular expressions or automaton searching on tries." Journal of the ACM (JACM) 43.6 (1996): 915-936.
Eltabakh, Mohamed Y., Ramy Eltarras, and Walid G. Aref. "To trie or not to trie? realizing space-partitioning trees inside postgresql: Challenges, experiences and performance." (2005).
Zhang, Yijun, and Lizhen Xu. "An algorithm for url routing based on trie structure." 2015 12th Web Information System and Application Conference (WISA). IEEE, 2015.
I'm serious that the similarities between the papers are superficial.
I don't think it's fair of you to criticize the authors for not citing some obscure preprint, when that manuscript itself neglected to cite decades of prior, relevant work.
I have some other comment on this thread where I point out why I don’t think it’s superficial. Would love to get your feedback on that if you feel like spending more time on this thread.
But it’s not obscure? FlashText was a somewhat popular paper at the time (2017) with a popular repo (https://github.com/vi3k6i5/flashtext). Their paper was pretty derivative of Aho-Corasick, which they cited. If you think they genuinely fucked up, leave an issue on their repo (I’m, maybe to your surprise lol, not the author).
Anyway, I’m not a fan of the whataboutery here. I don’t think OG’s paper is up to snuff on its lit review - do you?
> I don’t think OG’s paper is up to snuff on its lit review - do you?
Not in the slightest. Caching the logit masks and applying the right one based on where you are in your grammar is obvious. This is what I'd expect some bright undergrads to come up with for a class project. This manuscript could've been a blog post.
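To be concrete about what I mean by "obvious": here's a minimal sketch of the idea, assuming a toy grammar FSM over string tokens (all names here are illustrative, not from the paper). You precompute, for each grammar state, the set of token ids that keep the output valid, then just look the mask up during decoding instead of re-checking the vocabulary.

```python
# Hypothetical sketch: precompute a token mask per FSM state.
# fsm_transitions maps state -> {token_string: next_state}.

def build_state_masks(fsm_transitions, vocab):
    """Map each FSM state to the set of allowed token ids."""
    masks = {}
    for state, edges in fsm_transitions.items():
        masks[state] = {tok_id for tok_id, tok in enumerate(vocab)
                        if tok in edges}
    return masks

# Toy grammar: an 'a' followed by one or more 'b's.
fsm = {
    0: {"a": 1},
    1: {"b": 1},
}
vocab = ["a", "b", "c"]
masks = build_state_masks(fsm, vocab)
# masks[0] allows only 'a'; masks[1] allows only 'b'.
```

At generation time you'd apply `masks[current_state]` to the logits and advance the state with the sampled token - a dictionary lookup per step, which is why I'd expect undergrads to land on it.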
Although arXiv is displacing some traditional publishing, I think it's a little silly to try to hold it to the same standards.
I saw your argument for why you think it's relevant and I think you're overstating the case. There are a _heap_ of papers they could've cited.
As an aside, when can we stop citing _Attention is All You Need_?
Sure! So it’s hopefully clear that the notion of constrained grammar is not novel (see every comment on here of people name-dropping their implementation from two months ago).
The novelty here is that instead of checking whether every token is allowed at each step, they create a finite state machine that defines which tokens are allowable at each generation step. This lets them avoid checking every token at every step.
The trick of building an FSM to efficiently check the next token against a grammar is what allowed FlashText to run circles around standard regex approaches. Even the FlashText guy acknowledged the shoulders he stood on, etc.
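For anyone who hasn't read it, the FlashText trick is roughly this (a rough sketch only - the real library handles case folding, spans, replacement, etc.): build a trie of keywords once, then scan the text word by word, so matching cost doesn't grow with the number of keywords the way running one regex per keyword does.

```python
# Illustrative FlashText-style sketch: one trie, one pass over the text.

def build_trie(keywords):
    """Build a word-level trie; '__end__' marks a complete keyword."""
    root = {}
    for kw in keywords:
        node = root
        for word in kw.split():
            node = node.setdefault(word, {})
        node["__end__"] = kw
    return root

def extract(text, trie):
    """Single left-to-right pass, greedily taking the longest match."""
    words = text.split()
    found, i = [], 0
    while i < len(words):
        node, j, match = trie, i, None
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if "__end__" in node:
                match = (node["__end__"], j)
        if match:
            found.append(match[0])
            i = match[1]  # skip past the matched phrase
        else:
            i += 1
    return found

print(extract("new york is big", build_trie(["new york", "big"])))
# → ['new york', 'big']
```

Same shape of idea as the paper's FSM-indexed masks: pay once to build the automaton, then each step is a constant-time transition rather than a scan over all patterns.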
Let’s be super clear here: none of these standards apply when you’re building good ole libraries. But putting out a paper really elevates what you’re on the hook for. Most folks who write papers are dying to acknowledge the shoulders they stand on - it’s part of the toxic humility we all engage in.
Yep! But that’s sort of my point, and maybe this is just some misplaced academic shit of mine, but if you’re going to write a paper, then “easily had this idea and never discovered the paper” just doesn’t fly.
Almost all academic work is derivative tweaks of yesterday’s work, yet we still fall over ourselves to cite this stuff.
If you run your own models as a part of it, surely you could hook up your models as a backend to whatever abstractions you’re copying here.