> Scraping things that don't want to be scraped If all else fails, no website ca...

elorant · on Oct 11, 2021

Assuming that you eventually manage to load the page somehow, which in some edge cases may entail simulating mouse movements and random delays.
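For what it's worth, here is a rough sketch of what "mouse movements and random delays" can look like in practice. It only generates the timings and coordinates; the names `jittered_delay` and `mouse_path` are made up for illustration, and the smoothstep easing and wobble values are assumptions, not anything a particular site is known to check for.

```python
import random


def jittered_delay(base=1.0, jitter=0.5, rng=random):
    """Return a delay in seconds drawn uniformly from
    [base - jitter, base + jitter], clamped so it is never negative."""
    return max(0.0, rng.uniform(base - jitter, base + jitter))


def mouse_path(start, end, steps=20, wobble=3.0, rng=random):
    """Return steps + 1 (x, y) points from start to end.

    Progress along the line follows a smoothstep curve (slow, fast, slow),
    and every intermediate point gets a small random offset so the path
    is not perfectly straight. The endpoints are left exact.
    """
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(steps + 1):
        t = i / steps
        eased = t * t * (3 - 2 * t)  # smoothstep easing, 0 at t=0 and 1 at t=1
        x = x0 + (x1 - x0) * eased
        y = y0 + (y1 - y0) * eased
        if 0 < i < steps:  # only wobble the intermediate points
            x += rng.uniform(-wobble, wobble)
            y += rng.uniform(-wobble, wobble)
        points.append((x, y))
    return points
```

You would then hand each point to whatever automation tool you are using to move the pointer, sleeping for a short `jittered_delay(...)` between steps and a longer one between page loads.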

eastendguy · on Oct 12, 2021

Agreed. I use the ui.vision extension to simulate native mouse movements.

timwis · on Oct 11, 2021

Have you tried it on a page protected by a Cloudflare CAPTCHA?

1vuio0pswjnm7 · on Oct 11, 2021

It's funny, I never seem to hit these infamous Cloudflare CAPTCHAs. The only impediment I encounter with Cloudflare is that they require plaintext SNI to read their blog, https://blog.cloudflare.com. Unlike almost all other Cloudflare-fronted sites, ESNI will not work there.

dec0dedab0de · on Oct 12, 2021

I have not had to deal with that, but I have idly thought that it might be easier to pipe the audio version of the CAPTCHA into Google Assistant or something, and see what it comes up with.

eastendguy · on Oct 12, 2021

It seems to be no problem if you automate a real browser as opposed to a headless browser. I think they test for that.

mkl · on Oct 11, 2021

A browser extension is probably an easier way to extract text than OCR (unless you're targeting a wide range of sites, I suppose).