Graph theory and page rank are a good place to start.

https://blogs.cornell.edu/info2040/2011/09/20/pagerank-backb...

Google's algorithm is indecipherably complex today, but in the early days search engines more or less worked like this: they crawled the web and ranked each page by how many other pages linked to it. Page rank refined that count by weighting each link according to the rank of the page it came from, so a link from a well-linked page counts for more.
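The early "count the inbound links" idea fits in a few lines of Python. This is a minimal sketch, assuming the crawler has already produced a dict mapping each crawled page to the URLs found on it. The `*.example` page names are made up. Each linking page is counted at most once per target, and self-links are ignored.

```python
from collections import Counter


def rank_by_inbound_links(crawl: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank pages by how many *other* crawled pages link to them.

    `crawl` maps each crawled page to the URLs found on it (hypothetical format).
    """
    counts: Counter[str] = Counter()
    for page, outlinks in crawl.items():
        # Count each target once per linking page, and ignore self-links.
        for target in set(outlinks):
            if target != page:
                counts[target] += 1
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


crawl = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "d.example"],
}
print(rank_by_inbound_links(crawl))
# -> [('c.example', 3), ('a.example', 1), ('b.example', 1)]
```

`d.example` doesn't appear in the result because nothing else links to it. This raw count is also what made early rankings easy to inflate: every link counts the same, no matter who it comes from.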

You can still apply this idea today in private (or, I suppose, public) search engines, with interesting results.

For example, a search engine for scientific papers might use page rank to rank papers by how many other papers cite them.
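Here is a minimal PageRank sketch over a citation graph, computed by power iteration. It assumes the commonly used damping factor of 0.85. Papers that cite nothing spread their score evenly across all papers. The paper names are hypothetical. Unlike a plain citation count, a citation from a highly ranked paper is worth more than one from an obscure paper.

```python
def pagerank(cites: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """PageRank by power iteration; an edge p -> q means paper p cites paper q."""
    nodes = set(cites) | {q for refs in cites.values() for q in refs}
    # Deduplicate each reference list and drop self-citations up front.
    out = {p: set(cites.get(p, [])) - {p} for p in nodes}
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Papers that cite nothing ("dangling" nodes) spread their rank evenly.
        dangling = sum(rank[p] for p in nodes if not out[p])
        new = {p: (1.0 - damping + damping * dangling) / n for p in nodes}
        # Every other paper splits its rank equally among the papers it cites.
        for p in nodes:
            for q in out[p]:
                new[q] += damping * rank[p] / len(out[p])
        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:
            break
    return rank


citations = {  # hypothetical papers: paper -> papers it cites
    "survey": ["p1", "p2", "p3"],
    "p1": ["classic"],
    "p2": ["classic"],
    "p3": ["classic", "p1"],
    "classic": [],
}
ranks = pagerank(citations)
for paper, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{paper:8s} {score:.3f}")
```

In this toy graph `classic` comes out on top, and `p1` outranks `p2` even though both are cited by `survey`, because `p1` also picks up a citation from `p3`. The scores always sum to 1, so they can be compared across queries over the same corpus.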

Or, if you were going to make a search engine for open source projects, you could build a page rank algorithm on which projects depend on which other projects.
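A dependency points the same way as a hyperlink or a citation: from the project that uses something to the project being used. This sketch uses a simpler signal than full PageRank. For each project, it counts how many distinct projects depend on it, either directly or through a chain of dependencies. The project names and the shape of the `deps` dict are hypothetical stand-ins for parsed package manifests.

```python
from collections import deque


def transitive_dependents(deps: dict[str, list[str]]) -> dict[str, int]:
    """Count, for each project, the distinct projects that rely on it
    directly or through a chain of dependencies.

    `deps` maps a project to the projects it declares as dependencies.
    """
    # Reverse the edges so we can walk from a library to its users.
    users: dict[str, set[str]] = {}
    for project, needs in deps.items():
        users.setdefault(project, set())
        for dep in needs:
            users.setdefault(dep, set()).add(project)

    counts: dict[str, int] = {}
    for project in users:
        seen: set[str] = set()
        queue = deque(users[project])
        while queue:
            user = queue.popleft()
            # Skip repeats, and stop if a dependency cycle leads back here.
            if user in seen or user == project:
                continue
            seen.add(user)
            queue.extend(users[user])
        counts[project] = len(seen)
    return counts


deps = {  # hypothetical manifests: project -> its declared dependencies
    "webapp": ["httplib", "jsonlib"],
    "cli": ["httplib"],
    "httplib": ["tls", "jsonlib"],
    "jsonlib": [],
    "tls": [],
}
for name, n in sorted(transitive_dependents(deps).items(), key=lambda kv: -kv[1]):
    print(name, n)
```

Low-level libraries like `tls` and `jsonlib` rise to the top even though few projects name them directly, which is usually what you want from a dependency-based ranking. Feeding the same project-to-dependency edges into a PageRank computation would additionally weight each dependent by its own importance.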

Part of why Google's algorithm is more complex today is that people try to game whatever algorithm search engines commonly use. You may remember that back in the '90s and 2000s people would do stuff like put backlinks to other websites in their page source to try to game page rank. Today that kind of behavior has grown into a whole cottage industry (unfortunately).

What's interesting, though, is that with a lot of more limited data sets you have less of that SEO-type problem.

Whichever kind you're building, good luck! Search engines usually wind up being pretty cool (and profitable).