
Tell that to Googlebot, Bingbot, Petalbot, SemrushBot, MJ12bot, MojeekBot, DotBot, YandexBot, SeznamBot, Barkrowler, AhrefsBot, DuckDuckBot, AcademicBotRTU, Bytespider, Applebot, ZoominfoBot, TelegramBot, TwitterBot, SemanticScholarBot, redditbot, Pinterestbot... From a quick peek at my access log, all of them include either a link (most) or an email address (ZoominfoBot, Bytespider from TikTok/ByteDance, DotBot, and that academic bot) in their user-agent string.

Very few individual bots fail to follow this good practice. Most of the IP ranges used by the violating bots are owned by Huawei (a few belong to Huawei Cloud, so those could be anyone, but the majority seem to be Huawei itself), and the remainder are all small beans as far as I remember (for example, a few thousand requests in a day and then they disappear forever).
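As a rough illustration of the access-log check described above, here is a minimal sketch that pulls a contact URL or email address out of a user-agent string. The two regexes are simplifying assumptions and will miss or mis-match some real-world UA formats; the `ExampleBot` string and the `contact_info` helper are hypothetical.

```python
import re
from typing import Optional

# Simplified patterns (assumptions): an http(s) URL, or something shaped like an email.
URL_RE = re.compile(r"https?://[^\s;)]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def contact_info(user_agent: str) -> Optional[str]:
    """Return the first contact URL (preferred) or email in a UA string, else None."""
    match = URL_RE.search(user_agent) or EMAIL_RE.search(user_agent)
    return match.group(0) if match else None


samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "ExampleBot/1.0 (contact: bot@example.com)",  # hypothetical bot
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # browser-style UA, no contact
]
for ua in samples:
    print(contact_info(ua), "<-", ua)
```

Run over the user-agent column of a real log, anything that returns `None` while making crawler-like request volumes is a candidate for the "violating" bucket.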



None of the institutional market intelligence products I’ve worked on in nearly a decade of doing this do. Why? Because they wouldn’t work otherwise.

Many APIs require specific user agents, and tools like curl-impersonate depend on sending specific user agents too.
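For context on what "requiring a specific user agent" means on the client side, here is a minimal sketch using Python's standard `urllib.request` that attaches a custom `User-Agent` header to a request. The URL and the UA value are placeholders, not what any particular API expects; without an explicit header, urllib sends its own default `Python-urllib/3.x` identifier, which some services reject.

```python
import urllib.request

# Placeholder identifier (assumption); a real API documents the value it wants.
USER_AGENT = "example-client/1.0 (+https://example.com/contact)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://api.example.com/v1/items")  # hypothetical endpoint
# urllib stores header names via str.capitalize(), hence "User-agent" here.
print(req.get_header("User-agent"))
# Actually sending it would be: urllib.request.urlopen(req)
```

The request is only constructed here so the example runs offline; the header is what the server sees once `urlopen` sends it.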



