Hacker News | lefttoreader's comments

Wait. So why not just contribute to an existing open source project if you’re going to implement an identical API?

If you run your own models as a part of it, surely you could hook up your models as a backend to whatever abstractions you’re copying here.


Wait, someone made a comment similar to this elsewhere in the thread. So why don't you just upvote that?

If you have your own thoughts, surely you could just think them to yourself while upvoting.


yikes.

I was responding to them sidestepping the first commenter’s question.


yikes.


The “trick” seems to blatantly rip off FlashText without citing it?

https://arxiv.org/pdf/1711.00046.pdf

I’m a fan of the approach. I normally wouldn’t care if this was just another LLM library taking inspiration, but if you’re going to go out of your way to put a paper on arXiv, it feels like doing a literature review is a good step?


Care to explain how a string replacement algorithm relates to nudging the logits of an ML model?

I don't see the "rip off": the paper you cite requires a complete document to work on, while this work guides the generation of tokens.


Both papers use the phrase "regular expressions", and there the resemblance ends. The linked manuscript uses regular expressions to realize a grammar and then memoizes logit masks. I want to know why FlashText failed to cite:

Baeza-Yates, Ricardo A., and Gaston H. Gonnet. "Fast text searching for regular expressions or automaton searching on tries." Journal of the ACM (JACM) 43.6 (1996): 915-936.

Eltabakh, Mohamed Y., Ramy Eltarras, and Walid G. Aref. "To trie or not to trie? realizing space-partitioning trees inside postgresql: Challenges, experiences and performance." (2005).

Zhang, Yijun, and Lizhen Xu. "An algorithm for url routing based on trie structure." 2015 12th Web Information System and Application Conference (WISA). IEEE, 2015.


Your comment here doesn’t feel like it’s in good faith, but there’s a good chance I’m misreading it.


I'm serious that the similarities between the papers are superficial.

I don't think it's fair of you to criticize the authors for not citing some obscure preprint, when that manuscript itself neglected to cite decades of prior, relevant work.


I have some other comment on this thread where I point out why I don’t think it’s superficial. Would love to get your feedback on that if you feel like spending more time on this thread.

But it’s not obscure? FlashText was a somewhat popular paper at the time (2017) with a popular repo (https://github.com/vi3k6i5/flashtext). Their paper was pretty derivative of Aho-Corasick, which they cited. If you think they genuinely fucked up, leave an issue on their repo (I’m, maybe to your surprise lol, not the author).

Anyway, I’m not a fan of the whataboutery here. I don’t think OG’s paper is up to snuff on its lit review - do you?


> I don’t think OG’s paper is up to snuff on its lit review - do you?

Not in the slightest. Caching the logit masks and applying the right one based on where you are in your grammar is obvious. This is what I'd expect some bright undergrads to come up with for a class project. This manuscript could've been a blog post.
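
For concreteness, the cached-mask idea is roughly this (a toy sketch, not the paper's actual implementation; the grammar-state keys and vocabulary are made up):

```python
import math

# Hypothetical cache: grammar state -> boolean mask over a toy
# 8-token vocabulary, computed once ahead of time.
MASK_CACHE = {
    "start": [True, False, False, True, False, False, False, True],
}

def constrained_logits(logits, grammar_state):
    # Set disallowed tokens to -inf so softmax gives them zero
    # probability; applying the mask is just a cache lookup.
    mask = MASK_CACHE[grammar_state]
    return [x if ok else -math.inf for x, ok in zip(logits, mask)]

logits = [0.0] * 8
out = constrained_logits(logits, "start")
print(sum(math.isfinite(x) for x in out))  # → 3 allowed tokens survive
```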

Although arXiv is displacing some traditional publishing, I think it's a little silly to try to hold it to the same standards.

I saw your argument for why you think it's relevant and I think you're overstating the case. There are a _heap_ of papers they could've cited.

As an aside, when can we stop citing _Attention is All You Need_?


Sure! So it’s hopefully clear that the notion of constrained grammar is not novel (see every comment on here of people name-dropping their implementation from two months ago).

The novelty here is that, instead of checking whether every token is allowed at each step, they create a finite state machine that defines which tokens are allowable at each generation step. This lets them avoid checking every token at every step.

The trick of creating an FSM to efficiently check next-token grammar is what allowed FlashText to run circles around standard regex stuff. Even the FlashText guy acknowledged the shoulders he stood on, etc.
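
To make the "index the vocabulary by FSM state" idea concrete, here's a toy sketch (a hypothetical DFA for `ab*c` and a made-up vocabulary, not anything from either paper):

```python
# Toy character-level DFA for the regex "ab*c"; state 2 is accepting.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}

VOCAB = ["a", "b", "c", "ab", "bc", "bb", "x"]  # toy token vocabulary

def advance(state, token):
    """Run the DFA over a token's characters; None if it dies."""
    for ch in token:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return None
    return state

# Offline: build state -> {allowed token: next state} once.
index = {}
for state in (0, 1, 2):
    index[state] = {
        tok: nxt for tok in VOCAB
        if (nxt := advance(state, tok)) is not None
    }

# Online: per-step masking is a dict lookup, not a vocabulary scan.
print(sorted(index[0]))  # → ['a', 'ab']
```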

Let’s be super clear here, none of these standards apply when you’re building good ole libraries. But putting out a paper really elevates what you’re on the hook for. Most folks that write papers are dying to acknowledge the shoulders they stand on - it’s part of the toxic humility we all engage in.

Again - shill OSS all day - I’ll upvote it.


By "standard regex" stuff I take it you mean the standard regex stuff Python standard library comes with?

I mean, going from standard regex to NFA to DFA is already more sophisticated than that, and it's _quite_ old-school, giving you linear-time matching: https://en.wikipedia.org/wiki/Thompson%27s_construction https://en.wikipedia.org/wiki/Powerset_construction
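
The powerset construction in that second link fits in a few lines; here's a toy sketch (the NFA for `a(b|c)` and its alphabet are made up for illustration):

```python
# Toy NFA for "a(b|c)": (state, symbol) -> set of next states.
NFA = {
    (0, "a"): {1, 2},  # nondeterministic: two branches after 'a'
    (1, "b"): {3},
    (2, "c"): {3},
}
NFA_ACCEPT = {3}
ALPHABET = "abc"

def subset_construction(start):
    """Powerset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset({start})
    dfa, seen, stack = {}, {start_set}, [start_set]
    while stack:
        S = stack.pop()
        for sym in ALPHABET:
            T = frozenset(t for s in S for t in NFA.get((s, sym), ()))
            if not T:
                continue
            dfa[(S, sym)] = T
            if T not in seen:
                seen.add(T)
                stack.append(T)
    return start_set, dfa

def matches(text):
    state, dfa = subset_construction(0)
    for ch in text:  # one transition per character: linear time
        state = dfa.get((state, ch))
        if state is None:
            return False
    return bool(state & NFA_ACCEPT)

print(matches("ab"), matches("ac"), matches("ax"))  # → True True False
```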

And what I mean to say by this is that they could easily have had this idea and never have discovered the paper you referenced.


Yep! But that’s sort of my point, and maybe this is just some misplaced academic shit of mine but if you’re going to write a paper then “easily had this idea and never discovered the paper” just doesn’t fly.

Almost all academic work is derivative tweaks of yesterday’s work, yet we still fall over ourselves to cite this stuff.

