
To index the web, you generally do make a copy of it.

Google has a huge number of books scanned, too.



“Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them.”

https://www.theatlantic.com/technology/archive/2017/04/the-t...



Yeah, I hadn't thought in a while about their abandoned effort to scan every book and archived newspaper in the world, but I bet they're regretting now that they didn't finish. A non-trivial amount of that physical media has been tossed or degraded by underfunded libraries since then. And it's more valuable to them now than it ever was.



