Good question! The queries go directly to Twitter's API (via a third-party provider).
No local index or stored tweets; everything is fetched in real time when you search or view a profile. We do cache responses temporarily to avoid hitting rate limits, but the cache expires quickly.
The benefits of this approach:
• Always up-to-date (real-time data)
• No storage costs or maintenance
• Simpler architecture
The downsides:
• Dependent on API availability
• Rate limit considerations
I considered building a local index but decided against it for now; I wanted to ship fast and keep things simple. I might revisit that if the API approach becomes problematic.
I don’t know exactly what’s going on, but what you describe fits my experience as well. This article describes events since 2015 that fit the same pattern: https://archive.org/details/search-timeline
Google is supposed to have decent competition, but for some reason it doesn’t.
Yup, pretty cool, right? This article mentions the GSA (Google Search Appliance) and covers some other interesting things in the search industry that also went missing around the same time: https://archive.org/details/search-timeline
I think "scrapers vs. site owners" is a false dichotomy. Scrapers will always need to exist as long as we want search engines and archival services. Small versions of these services will need to keep popping up every now and then to keep the big guys on their toes, and the smaller guys need advice on scraping politely.
That's fair, though are we in an isolated bout of "every now and then", or has AI created a new normal of abuse (e.g. of robots.txt)? Hopefully we're at a local maximum and the scrapers perpetrating harmful behaviours will soon pull their heads in.