
We need a crowdsourced list like AdGuard's, but for bots. I'd love to block all those IPs at the firewall.


The only way you can block these "AI" scrapers is a combination of IP filtering (https://spur.us/) and fingerprinting (https://abrahamjuliot.github.io/creepjs/).

Things like Browserbase are easy to block with this. It's a losing battle, though; I personally moved entirely to real environments for https://browser.cash/developers


So that would be at least: GCP, Azure, Alibaba, AWS, Huawei, AT&T, BT, Cox... it's a long list.

User Agents then? No, because that would be: Chrome and Safari.

It's an uphill battle, because the bot authors do not give a shit. You can now buy bot networks from actual companies that embed proxies in free phone games. Anthropic was caught hiding behind Browserbase, and neither company seems to see a problem with that.


Block GCP, AWS, Azure, and various datacenter prefixen, and you're pretty much golden. There are scant few legitimate reasons a human being's traffic would originate from those hosts.
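
For anyone wanting to try prefix blocking, here is a minimal sketch of the matching step, using Python's standard ipaddress module. The prefixes below are reserved documentation ranges standing in for real datacenter ranges, and the names DATACENTER_PREFIXES and is_datacenter_ip are made up for illustration; a real list would have to come from each provider's published ranges.

```python
import ipaddress

# Placeholder prefixes (reserved documentation ranges), NOT real cloud ranges.
# A real deployment would load each provider's published list instead.
DATACENTER_PREFIXES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_datacenter_ip(addr: str) -> bool:
    """Return True if addr falls inside any listed prefix."""
    ip = ipaddress.ip_address(addr)
    # Compare only against networks of the same IP version.
    return any(
        ip.version == net.version and ip in net
        for net in DATACENTER_PREFIXES
    )


if __name__ == "__main__":
    for candidate in ("203.0.113.7", "192.0.2.1", "2001:db8::1"):
        verdict = "block" if is_datacenter_ip(candidate) else "allow"
        print(candidate, verdict)
```

A linear scan like this is fine for a handful of prefixes; with the tens of thousands of prefixes the big providers publish, a firewall-level set or a prefix tree would likely be a better fit.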


You can run virtual desktops in the cloud, like AWS's Workspaces, sold as a business rather than developer offering. AWS does publish the IP range those clients use, and I assume other similar offerings out there do the same.
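
A rough sketch of how a published range could be carved out of a block list. The sample mimics the layout of AWS's ip-ranges.json (a "prefixes" array whose entries carry "ip_prefix" and "service" fields), but the addresses are reserved documentation ranges, not real AWS ones. The "WORKSPACES_GATEWAYS" service tag is my assumption about how those ranges are labeled, the function names are hypothetical, and whether the published range matches the addresses the desktops' outbound traffic actually uses would need to be verified.

```python
import ipaddress
import json

# Made-up sample in the shape of AWS's ip-ranges.json.
# The prefixes are documentation ranges, not real AWS ranges.
SAMPLE = """
{
  "prefixes": [
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1",
     "service": "WORKSPACES_GATEWAYS"},
    {"ip_prefix": "203.0.113.0/24", "region": "us-east-1",
     "service": "EC2"}
  ]
}
"""


def load_prefixes(doc, service=None):
    """Return IPv4 networks from doc, optionally limited to one service tag."""
    return [
        ipaddress.ip_network(entry["ip_prefix"])
        for entry in doc.get("prefixes", [])
        if service is None or entry.get("service") == service
    ]


def should_block(addr, blocked, exempt):
    """Block addr if it is in a blocked prefix and not in an exempt one."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in exempt):
        return False
    return any(ip in net for net in blocked)


doc = json.loads(SAMPLE)
blocked = load_prefixes(doc)  # every listed prefix
exempt = load_prefixes(doc, service="WORKSPACES_GATEWAYS")  # assumed tag

for addr in ("198.51.100.10", "203.0.113.5", "192.0.2.1"):
    print(addr, "block" if should_block(addr, blocked, exempt) else "allow")
```

Checking the exemption first keeps the rule easy to reason about: an address inside an exempt prefix is never blocked, no matter how broad the block list gets.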


I am working from a cloud desktop, but I only visit corporate-approved resources from it, and I believe that is the case for most cloud desktop users, since the whole point is to have a clear separation of duties.


Correct, but I don't think it's a safe assumption that approved resources wouldn't have a reason to block requests from the cloud.


I'm sure people who can afford to run virtual desktops in the cloud can also afford a phone/laptop/desktop to access sites that block those virtual desktops in the cloud.


I'm thinking more along the lines of people using virtual desktops assigned by their job, and those sites are part of their work. I don't feel like punting to BYOD is a good solution.



A large portion of those addresses will be valid residential IP addresses running malware on compromised Windows machines.



