I adore ripgrep and use it dozens of times a day, and have for years. It's extremely fast, does the right thing most of the time, and has a useful feature set.
Ack is also nice, I've used that quite a bit too. It has the advantage of being in Perl, so if you're on a "secure" computer (no compiler), you can still use a fast + featureful search tool.
I'm glad you appreciate that you can install ack anywhere. It's exactly for that use case that I've kept ack distributable as a single text-file download. Also, it only requires Perl 5.10.1, so it's OK if you're using an old Perl.
Hmm, there is a project that lets you compile a single binary that is cross-platform across Linux, Mac, and Windows. I wonder whether ripgrep can be compiled that way; that would bring it very close to the portability you have with ack.
That still might run afoul of locked down networks like I've seen at banks. Users couldn't install any binaries at all, but with something like ack it's just cut & paste some text into a text file. ack can literally be a select-all, Ctrl-C, switch windows, "cat > ack", Ctrl-V.
I feel the same way about ripgrep, and also fzf, especially the two in combination. I only started using them a few months ago, yet they already feel like a fundamental part of how I do computing.
fzf is wonderful. I feel like we're only scratching the surface of its utility. Using it with git add is so fast: just "ga" (an alias), hit Tab for each file, then Enter.
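The alias itself isn't shown here, but a minimal sketch of that kind of setup (assuming bash/zsh, fzf's multi-select mode, and GNU xargs; the "ga" name is the commenter's) might look like:

    # hypothetical "ga" function: pick modified/untracked files with fzf,
    # Tab to mark several, Enter to stage them
    ga() {
      git ls-files --modified --others --exclude-standard \
        | fzf --multi --preview 'git diff --color=always -- {}' \
        | xargs -r git add --
    }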
Figuring out how to integrate that with FZF would be really something. Being able to easily go up and down the list, and visualize the whole thing, would make things a lot smoother.
Some of these line items are really obtuse and in some cases just not right. "Don't search in binary files" and "Treat binary files as if they were text", for example, caught my eye: GNU grep has the `--binary-files` option, which supports both of these features. Others, like "can pipe output to a pager", seem like a half-hearted attempt to give a +1 to a specific tool while ignoring that you can... pipe the output of any of them using... a pipe.
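For reference, the two behaviors called out above map onto GNU grep like this:

    # "Don't search in binary files" (also available as -I)
    grep -r --binary-files=without-match pattern .

    # "Treat binary files as if they were text" (also available as -a)
    grep -r --binary-files=text pattern .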
If you follow the "If you have updates to the chart, please submit as a GitHub issue." link, you can see that there are a few dozen open issues and the page was last updated 2 years ago.
Ag (the_silver_searcher) has performance closer to ripgrep and a similar feature set. But it’s tough to beat ripgrep both for performance and reliability. Rust is great in this application.
I occasionally find myself wanting to search a data stream using a large-ish set (a few tens or hundreds of thousands) of regexes. This is very slow with a backtracking engine like PCRE, but ought to be pretty fast with a DFA-based engine like re2.
So far, I have been unsuccessful in finding a grep replacement that can read patterns from a file, and which also uses a DFA engine. Does one exist? From the table, it looks like ripgrep might be suitable. Is it?
Precisely speaking, no, I don't know of any grep tools that use a pure DFA engine. However, both ripgrep and GNU grep use a hybrid NFA/DFA engine (also known as a "lazy DFA") for some subset of regexes. I'm not too familiar with all of GNU grep's strategies, but for ripgrep, when a regex is too big for the lazy DFA, it will fall back to an NFA engine. (And I don't mean Friedl's bastardization of the term "NFA engine.") For ripgrep, see the --dfa-size-limit flag to try to let it use the hybrid NFA/DFA engine for bigger regexes. Whether it helps or not depends on your situation.
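For the "read patterns from a file" part of the question, the invocation would look something like this (the 1G limit is just an illustrative value to tune for your pattern set):

    # search with a large set of patterns, raising the lazy-DFA cache limit
    rg --dfa-size-limit 1G -f patterns.txt some-big-file.log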
Now, this will do much better than a backtracking engine, but if you get up into the tens of thousands or hundreds of thousands of regexes, it's going to get pretty painful. Finite automata just don't scale that well. At that point, you really start wanting a more specialized solution. Probably the best answer to that that I know of is Hyperscan. And you're in luck; someone maintains a fork of ripgrep with support for Hyperscan: https://sr.ht/~pierrenn/ripgrep/
(A special case is tens of thousands of literal patterns. ripgrep will notice that and should use Aho-Corasick. It doesn't help so much with search time since it's just an NFA or a DFA like with regexes, but the machine itself is constructed much more quickly.)
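If your patterns really are plain strings, you can make that explicit rather than relying on literal detection (a small sketch; words.txt is a hypothetical one-pattern-per-line file):

    # force fixed-string matching so the whole set is handled as literals
    rg -F -f words.txt some-big-file.log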
It sounds like either plain ripgrep, or ripgrep+hyperscan, is pretty much exactly what I'm looking for. Next time I have this problem, I'll certainly be reaching for it.
Too much of a weighty dependency and too much of a niche IMO. For example, the last time I tried to build Hyperscan, I failed and gave up after 15 minutes of trying.
I would have thought you just need to include the rust-hyperscan crate[1] which would take care of that for you (but that crate probably didn't exist when you looked at it). I don't have a sense on the impact it has on overall binary size.
I don't think the existence of a crate or not really impacts anything I said. More to the point, it would put a reliance on someone else to maintain a crate for critical functionality in ripgrep. (And if that fell through, I would invariably need to pick up that burden. Removing functionality is a lot harder than adding it.)
It makes a lot more sense to me for something like Hyperscan to be maintained out of tree. I did work with the patch author a bit, and in particular, made some changes to ripgrep to make maintaining such a fork easier: https://github.com/BurntSushi/ripgrep/issues/1488
Bottom line is, a lot of people think that adding a dependency has nearly zero cost. But it doesn't. Not by a long shot.
I've written this code (in C++) for an employer. RE2 scaled fine to hundreds of thousands of regexes. You'll want to use RE2::Set, which compiles multiple regexes into a single DFA, and probably the "Filter" functionality (whose name I don't precisely remember and am too lazy to look up) which uses an Aho-Corasick tree to subset the potential matches. One thing you'll have to watch out for is RE2's maximum DFA size; if compilation of your RE2::Set fails, just split your set of regexes in half and compile again.
You could probably do some fun optimizations by grouping the regexes which depend on the same literals into their own sets, but I never needed to.
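A minimal sketch of the RE2::Set approach described above (C++, linking against re2; the pattern strings and input are just placeholders):

    #include <re2/re2.h>
    #include <re2/set.h>
    #include <string>
    #include <vector>

    int main() {
      // compile many patterns into a single set (one underlying automaton)
      RE2::Set set(RE2::DefaultOptions, RE2::UNANCHORED);
      std::string err;
      set.Add("foo\\d+", &err);
      set.Add("bar(baz)?", &err);
      if (!set.Compile()) {
        // too big: split the pattern list in half and compile two sets
        return 1;
      }
      std::vector<int> matched;
      if (set.Match("a line with foo123 in it", &matched)) {
        // "matched" holds the indices of the patterns that matched
      }
      return 0;
    }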
This is basically what ripgrep will do for you automatically. (ripgrep uses Rust's regex engine, which is a descendant of RE2.) But when you get up into hundreds of thousands of regexes, the NFA (and the resulting DFA) get really big. And things generally don't scale that well. Here's a good example: http://web.archive.org/web/20210302010420/https://01.org/hyp...
The problem is that for a big enough NFA, you'll wind up spending most of your search doing powerset construction to build the DFA.
> One thing you'll have to watch out for is RE2's maximum DFA size
You can configure this in ripgrep with the --dfa-size-limit flag. (See also --regex-size-limit.)
I originally had it as a "phrasebook" of how to do the same thing in the different tools, but it was really ugly and took up a lot of horizontal space, and I figured it was more useful as a chart of yes/no. Also, there were cases where two tools had pretty much the same feature, but not exactly, so just listing flags didn't make sense.
I've still got a lot of the data about the switches in the JSON file that I build the chart from. https://github.com/beyondgrep/website/blob/dev/features.json If you've got ideas on how to bring back the phrasebook format, either integrated into this page, or as a separate standalone page, I'd love to hear them. Maybe the phrasebook isn't best done as a table like this, for example. Open a ticket in GitHub and let me know your thoughts.
I use ripgrep, most typically from within Emacs thanks to "counsel-rg". I configured counsel-rg (as suggested) not to display very long matching lines (Emacs doesn't like lines that are too long).
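The long-line suppression ultimately happens in ripgrep itself; a sketch of the underlying flags that such a configuration typically passes (the exact counsel-rg settings vary):

    # omit matching lines longer than 150 columns, showing a preview instead
    rg --max-columns 150 --max-columns-preview pattern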
I wonder how long the documentation for GNU grep will continue to say this:
>PCRE support is here to stay, but consider this option experimental when combined with the -z (--null-data) option, and note that ‘grep -P’ may warn of unimplemented features.
I did come across a few issues mentioned with -z on unix.stackexchange a few years back but they have been fixed as far as I know.
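For anyone wondering what the combination looks like in practice, this is the kind of multi-line PCRE search the warning is about (a small sketch):

    # -z reads NUL-separated records (so the whole input is one "line"),
    # -P enables PCRE, -o prints just the match
    printf 'foo\nbar\n' | grep -Pzo 'foo\nbar'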
"Print lines by number" is a vague thing to say, particularly since the comparison later includes, "Print specific lines by number".
However, the grep -Hn feature is described in this comparison as, "Prefix the line number to matching lines"
One thing that can help people to compare this sort of tool is to pair technical descriptions like command line parameters with the natural language explanation. If tool foo has a feature you're describing as "Prevent cheesecake" then I have no idea if my tool bar can do that, whereas if you say this is -Xqm then I can read the documentation and discover that I call this "disable refrigerated dessert" and it's -VQb so yes, my tool does this too.
I spent some time recently reading P2214, one of the proposals to fix/extend C++ ranges, and because this general idea is so common it often discusses Haskell, Rust, or even Python. If you're experienced in a language, you already know whether it would spell something FlatMap, flat_map, or flatMap, but you might not guess that C++ people would call your filter_map by the name transform_maybe. Likewise, as a C++ programmer who has barely dipped their toe in Haskell, you wouldn't know that Haskell doesn't use the word "transform" in this context, and without being told what it's called you won't find the relevant documentation, let alone be able to try it for yourself and appreciate what it's for.
The "print lines by number" was there because earlier versions of ack had a `--line=N` feature, where you could say "ack --line=15-18" and print those four lines. I dropped it because it was hardly any better than using sed.
If you've got suggestions on improvements, please submit an issue. I'd love to hear them.
I agree. I use grep in conjunction with find and parallel to achieve a number of features not native to grep that are built-in to these other tools.
I prefer tools designed with the "do one thing well" philosophy. It lets me scale my knowledge. I can solve many problems with find and parallel that these grep clones don't support.
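A small sketch of that kind of composition (assuming GNU find and GNU parallel; the glob is just an example):

    # grep many files in parallel, letting find do the traversal and filtering
    find . -type f -name '*.log' -print0 \
      | parallel -0 grep -Hn 'pattern' {}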
ripgrep degrades just fine to a normal grep tool. And you can use it in `find` pipelines.
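For example (a minimal sketch; rg accepts explicit file arguments just like grep does):

    # let find pick the files and hand them to ripgrep
    find . -name '*.toml' -exec rg -n 'pattern' {} +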
> I prefer tools designed with the "do one thing well" philosophy
This is pretty unlikely. For example, you probably use a grep tool that will also do recursive directory traversal for you. It probably even has flags for defining filters on that traversal. Why use such a tool when `find` already does recursive directory traversal for you?
"grep | head" doesn't limit the length of output lines, but "grep | cut" would.
However, ack's and ripgrep's default unpiped output is grouped by file, and when you pipe the output, that grouping goes away.
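If you want the grouped output even through a pipe, ripgrep can be told to keep it (a sketch; ack has a similar --group flag):

    # force the grouped-by-file output (and colors) even through a pipe
    rg --heading --color=always pattern | less -R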
The idea of "supposed to be used" is also different for ack than it is for grep. ack is specifically less of a general-use tool than grep. It's meant for searching source code. This is also why I have never said that ack is a replacement for grep.