Feature comparison of ack, ag, Git-grep, grep and ripgrep (beyondgrep.com)
130 points by Amorymeltzer on Oct 24, 2021 | 48 comments


I adore Ripgrep and use it dozens of times a day, and have for years. It's extremely fast, does the right thing most of the time, and has a useful featureset.

Ack is also nice, I've used that quite a bit too. It has the advantage of being in Perl, so if you're on a "secure" computer (no compiler), you can still use a fast + featureful search tool.


I'm glad you appreciate that you can install ack anywhere. It's exactly for that use case that I have kept ack able to be a single text file download. Also, it only requires Perl 5.10.1, so it's OK if you're using an old Perl.


Hmm, there is a project that lets you compile a single binary that is cross-platform across Linux, Mac and Windows. I wonder whether ripgrep could be compiled that way, which would bring it very close to the portability you have with ack.


That still might run afoul of locked down networks like I've seen at banks. Users couldn't install any binaries at all, but with something like ack it's just cut & paste some text into a text file. ack can literally be a select-all, Ctrl-C, switch windows, "cat > ack", Ctrl-V.


I feel the same way about ripgrep, and also fzf - especially both in combination. I only started using them both a few months ago, yet it feels like a fundamental way of doing computing.


fzf is wonderful. I feel like we're only scratching the surface of its utility. Using it with git add is so fast, just "ga" (an alias), hit tab (for each file) then enter.


I find interactive git add more useful most of the time, 'git add -i' then select what's needed from the menu


Figuring out how to integrate that with FZF would be really something. Being able to easily go up and down the list, and visualize the whole thing, would make things a lot smoother.


Any examples of using them in combination? I've only recently started using ripgrep despite being aware of it for a while.


There are a ton of articles out there about it:

https://duckduckgo.com/?q=ripgrep+%2B+fzf


Some of these line items are really obtuse and in some cases just not right, "Don't search in binary files" and "Treat binary files as if they were text" for example caught my eye - GNU grep has the `--binary-files` option which supports both of these features. Others like "can pipe output to a pager" seem like a half-hearted attempt to give a +1 to a specific tool while ignoring that you can... pipe the output of any of them using... a pipe.
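
For anyone who wants to check this, here is a minimal sketch of both `--binary-files` modes. It assumes GNU grep, and the two sample files are made up for the illustration.

```shell
# Assumes GNU grep; notes.txt and blob.bin are hypothetical sample files.
dir=$(mktemp -d)
printf 'needle in text\n' > "$dir/notes.txt"
printf 'needle\000in binary\n' > "$dir/blob.bin"   # the NUL byte makes grep treat it as binary

# "Don't search in binary files": binary files are assumed not to match (same as -I).
skip_binary=$(grep -r --binary-files=without-match -l needle "$dir" | sort)

# "Treat binary files as if they were text" (same as -a).
as_text=$(grep -r --binary-files=text -l needle "$dir" | sort)

echo "$skip_binary"
echo "$as_text"
rm -rf "$dir"
```

The first listing should contain only the text file, the second both files.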


If you follow the "If you have updates to the chart, please submit as a GitHub issue." link, you can see that there are a few dozen open issues and the page was last updated 2 years ago.


Ahhh, I did not (my bad) - I don't track grep features, not a hobby :) this chart could have been more truthy 2 years ago for all I know. Thanks.


If there are things you think are confusing or inaccurate, please do make GitHub issues for them. Thanks.


Maybe we need a (2019) in the title.


Or maybe I just need to move it up in my priority stack.


Here's another site of mine y'all may be interested in: https://altbox.dev/

It's a collection of improved shell tools, organized by the tool they supplement.

As with this feature comparison chart, patches and suggestions are welcome: https://github.com/petdance/altbox


You should submit that to HN!


When I need a faster grep, I love ugrep (https://github.com/Genivia/ugrep - especially when I am searching compressed logs for debugging).


What do you love about ugrep?


This is preferred over ripgrep? I've not used ugrep before


I would absolutely love to see examples of each tool's syntax for each use case.

In particular "Show proximity of matches to other matches" would be a huge boon to replace `grep -C 5 foo | grep bar`.
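
For reference, the pipeline being replaced behaves roughly like this. It is only a sketch with a made-up log file, and it assumes a grep that supports `-C` (GNU and BSD grep both do).

```shell
# The log contents are made up for this sketch.
log=$(mktemp)
{
  echo "foo starts"       # line 1
  echo "filler"
  echo "bar nearby"       # line 3, within 5 lines of the foo match
  for i in 1 2 3 4 5 6 7 8 9 10; do echo "filler $i"; done
  echo "bar far away"     # line 14, outside the context window
} > "$log"

# Print 5 lines of context around each "foo", then keep only the lines mentioning "bar".
near=$(grep -C 5 foo "$log" | grep bar)
echo "$near"
rm -f "$log"
```

Only the nearby "bar" line survives. The approach is crude: the second grep cannot tell a context line from a matching line, which is part of why a built-in proximity feature would be nicer.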


ripgrep supports the most common features while still being much faster than ack.

So it's no comparison for me.


Ag (the_silver_searcher) has performance closer to ripgrep and a similar feature set. But it’s tough to beat ripgrep both for performance and reliability. Rust is great in this application.


I occasionally find myself wanting to search a data stream using a large-ish set (a few tens or hundreds of thousands) of regexes. This is very slow with a backtracking engine like PCRE, but ought to be pretty fast with a DFA-based engine like re2.

So far, I have been unsuccessful in finding a grep replacement that can read patterns from a file, and which also uses a DFA engine. Does one exist? From the table, it looks like ripgrep might be suitable. Is it?


Precisely speaking, no, I don't know of any grep tools that use a DFA engine. However, both ripgrep and GNU grep use a hybrid NFA/DFA engine (also known as a "lazy DFA") for some subset of regexes. I'm not too familiar with all of GNU grep's strategies, but for ripgrep, it will fall back to an NFA engine. (And I don't mean Friedl's bastardization of the term "NFA engine.") For ripgrep, see the --dfa-size-limit flag to try to let it use the hybrid NFA/DFA engine for bigger regexes. Whether it helps or not depends on your situation.

Now, this will do much better than a backtracking engine, but if you get up into the tens of thousands or hundreds of thousands of regexes, it's going to get pretty painful. Finite automata just don't scale that well. At that point, you really start wanting a more specialized solution. Probably the best answer to that that I know of is Hyperscan. And you're in luck; someone maintains a fork of ripgrep with support for Hyperscan: https://sr.ht/~pierrenn/ripgrep/

(A special case is tens of thousands of literal patterns. ripgrep will notice that and should use Aho-Corasick. It doesn't help so much with search time since it's just an NFA or a DFA like with regexes, but the machine itself is constructed much more quickly.)
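
As a rough illustration of the literal-pattern case, reading patterns from a file looks like the sketch below. It uses GNU grep so that it runs anywhere; ripgrep accepts the same `-F` and `-f` flags. The pattern list and input are made up.

```shell
# Assumes GNU grep; the pattern file and input file are hypothetical samples.
patterns=$(mktemp)
input=$(mktemp)
printf '%s\n' 'ERR-1001' 'ERR-2002' 'a.b' > "$patterns"
printf '%s\n' 'ok' 'saw ERR-2002 here' 'axb is not a literal match' 'a.b is' > "$input"

# -f reads one pattern per line from the file; -F treats each as a fixed string,
# so the "." in "a.b" matches only a literal dot.
hits=$(grep -F -f "$patterns" "$input")
echo "$hits"
rm -f "$patterns" "$input"
```

Dropping `-F` turns every line of the pattern file into a regex, which is where engine choice and size limits start to matter.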


What an incredibly helpful reply. Thank you!

It sounds like either plain ripgrep, or ripgrep+hyperscan, is pretty much exactly what I'm looking for. Next time I have this problem, I'll certainly be reaching for it.


Out of curiosity, what prevents the hyperscan support from being mainlined?


Too much of a weighty dependency and too much of a niche IMO. For example, the last time I tried to build Hyperscan, I failed and gave up after 15 minutes of trying.


I would have thought you just need to include the rust-hyperscan crate[1] which would take care of that for you (but that crate probably didn't exist when you looked at it). I don't have a sense on the impact it has on overall binary size.

[1] https://crates.io/crates/hyperscan


I don't think the existence of a crate or not really impacts anything I said. More to the point, it would put a reliance on someone else to maintain a crate for critical functionality in ripgrep. (And if that fell through, I would invariably need to pick up that burden. Removing functionality is a lot harder than adding it.)

It makes a lot more sense to me for something like Hyperscan to be maintained out of tree. I did work with the patch author a bit, and in particular, made some changes to ripgrep to make maintaining such a fork easier: https://github.com/BurntSushi/ripgrep/issues/1488

Bottom line is, a lot of people think that adding a dependency has nearly zero cost. But it doesn't. Not by a long shot.


I've written this code (in C++) for an employer. RE2 scaled fine to hundreds of thousands of regexes. You'll want to use RE2::Set, which compiles multiple regexes into a single DFA, and probably the "Filter" functionality (whose name I don't precisely remember and am too lazy to look up) which uses an Aho-Corasick tree to subset the potential matches. One thing you'll have to watch out for is RE2's maximum DFA size; if compilation of your RE2::Set fails, just split your set of regexes in half and compile again.

You could probably do some fun optimizations by grouping the regexes which depend on the same literals into their own sets, but I never needed to.


This is basically what ripgrep will do for you automatically. (ripgrep uses Rust's regex engine, which is a descendant of RE2.) But when you get up into hundreds of thousands of regexes, the NFA (and the resulting DFA) get really big. And things generally don't scale that well. Here's a good example: http://web.archive.org/web/20210302010420/https://01.org/hyp...

The problem is that for a big enough NFA, you'll wind up spending most of your search doing powerset construction to build the DFA.

> One thing you'll have to watch out for is RE2's maximum DFA size

You can configure this in ripgrep with the --dfa-size-limit flag. (See also --regex-size-limit.)


Nice. It would be better if there were examples of each feature, to show how each tool's CLI flags map to the others'.


That's how I had it at first.

I originally had it as a "phrasebook" of how to do the same thing in the different tools, but it was really ugly and took up a lot of horizontal space, and I figured it was more useful as a chart of yes/no. Also, there were cases where two tools had pretty much the same feature, but not exactly, so just listing flags didn't make sense.

I've still got a lot of the data of the switches in the JSON file that I build the chart from. https://github.com/beyondgrep/website/blob/dev/features.json If you've got ideas on how to bring back the phrasebook format, either integrated into this page, or as a separate standalone page, I'd love to hear them. Maybe the phrasebook isn't best done as a table like this, for example. Open a ticket in GitHub and let me know your thoughts.


I use ripgrep, and more typically ripgrep from within Emacs, thanks to "counsel-rg". I configured counsel-rg (as suggested) to not display very long matching lines (for Emacs doesn't like lines that are too long).

It is really very fast.


Doom emacs uses it if it’s installed


The next step is _Extended_ regular expressions that characterise Regular Relations (RR) and that specify Finite State Transducers (FSTs). See:

http://users.itk.ppke.hu/~sikbo/nytech/gyak/05_morfo/xfst/bo...

Advances:

- symmetry input:output (reversible)

- readable/maintainable expressions due to _naming_ of sub-expressions

Implementations:

- Xerox XRCE XFST/lexc/twolc compilers

- FOMA - https://fomafst.github.io/


I wonder how long the documentation for GNU grep will continue to say this:

>PCRE support is here to stay, but consider this option experimental when combined with the -z (--null-data) option, and note that ‘grep -P’ may warn of unimplemented features.

I did come across a few issues mentioned with -z on unix.stackexchange a few years back but they have been fixed as far as I know.


Apparently,

> Print lines by number

is not supported by GNU grep???

  $ grep --version
  grep (GNU grep) 2.25
  $ cat world
  hello
  $ grep -Hn hello world
  world:1:hello


"Print lines by number" is a vague thing to say, particularly since the comparison later includes, "Print specific lines by number".

However, the grep -Hn feature is described in this comparison as, "Prefix the line number to matching lines"

One thing that can help people to compare this sort of tool is to pair technical descriptions like command line parameters with the natural language explanation. If tool foo has a feature you're describing as "Prevent cheesecake" then I have no idea if my tool bar can do that, whereas if you say this is -Xqm then I can read the documentation and discover that I call this "disable refrigerated dessert" and it's -VQb so yes, my tool does this too.

I recently spent some time reading the proposals to fix and extend C++ ranges (P2214). Because the general idea is so common, they often discuss Haskell, Rust or even Python. If you're experienced in a language, you already know whether it would spell something FlatMap, flat_map, or flatMap, but you might not guess that C++ people would call your filter_map by the name transform_maybe. Likewise, a C++ programmer who has barely dipped their toe in Haskell wouldn't know that Haskell doesn't use the word "transform" in this context, and without being told what it's called, you won't find the relevant documentation, let alone be able to try it for yourself and appreciate what it's for.


The "print lines by number" was there because earlier versions of ack had a `--line=N` feature, where you could say "ack --line=15-18" and print those four lines. I dropped it because it was hardly any better than using sed.
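
For comparison, the sed equivalent of that range is short. A minimal sketch, using a made-up 20-line file:

```shell
# The 20-line sample file is made up for this sketch.
file=$(mktemp)
i=1
while [ "$i" -le 20 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$file"

# -n suppresses sed's default output; 15,18p prints only lines 15 through 18.
picked=$(sed -n '15,18p' "$file")
echo "$picked"
rm -f "$file"
```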

If you've got suggestions on improvements, please submit an issue. I'd love to hear them.

https://github.com/beyondgrep/website/issues


Or "Limit length of output lines"... how about

  $ grep ... | head -n ...

I think this feature comparison misses how grep is supposed to be used. (See also "Pipe output through a pager or other command")


I agree. I use grep in conjunction with find and parallel to achieve a number of features not native to grep that are built-in to these other tools.

I prefer tools designed with a "does one thing well" philosophy. It lets me scale my knowledge. I can solve many problems with find and parallel that these grep clones don't support.


ripgrep degrades just fine to a normal grep tool. And you can use it in `find` pipelines.

> I prefer tools designed with a "does one thing well" philosophy

This is pretty unlikely. For example, you probably use a grep tool that will also do recursive directory traversal for you. It probably even has flags for defining filters on that traversal. Why use such a tool when `find` already does recursive directory traversal for you?
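
To make the comparison concrete, here is a sketch of the same search done both ways: once with `find` doing the traversal and filtering, once with grep's own recursion. The directory layout is made up, and the second command assumes GNU grep's `--include` and `--exclude-dir` flags.

```shell
# The directory tree is a hypothetical sample for this sketch.
dir=$(mktemp -d)
mkdir -p "$dir/src" "$dir/build"
printf 'int main(void) { return 0; }\n' > "$dir/src/main.c"
printf 'main is mentioned here too\n' > "$dir/src/notes.txt"
printf 'int main(void) { return 1; }\n' > "$dir/build/gen.c"

# find traverses and filters (only *.c, skipping build/); grep only matches.
# "-exec ... {} +" batches file names much like xargs would.
found=$(find "$dir" -path "$dir/build" -prune -o -type f -name '*.c' -exec grep -l 'main' {} +)

# Roughly the same search with grep doing its own traversal (GNU grep flags).
builtin_traversal=$(grep -rl --include='*.c' --exclude-dir=build 'main' "$dir")

echo "$found"
echo "$builtin_traversal"
rm -rf "$dir"
```

Both commands should list only `src/main.c`.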


"grep | head" doesn't limit the length of output lines, but "grep | cut" would.
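
A small sketch of the difference, with a made-up input file:

```shell
# The sample file is made up for this sketch.
file=$(mktemp)
printf '%s\n' 'short match' 'match followed by a very long tail of text' 'no hit' > "$file"

# head limits how many lines come out; cut limits how wide each line is.
first_only=$(grep match "$file" | head -n 1)
truncated=$(grep match "$file" | cut -c 1-11)

echo "$first_only"
echo "$truncated"
rm -f "$file"
```

`head` keeps the first matching line whole and drops the rest, while `cut` keeps every matching line but clips each one to 11 characters.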

However, ack and ripgrep's default unpiped output is grouped by file, and if you pipe the output, it doesn't do the grouped output.

The idea of "supposed to be used" is also different for ack than it is for grep. ack is specifically less of a general-use tool than grep. It's meant for searching source code. This is also why I have never said that ack is a replacement for grep.


That limits the number of output lines, not the length.


You can update the page with pull requests to this repo: https://github.com/beyondgrep/website



