
What kind of stuff are people needing to scrape?


I would expect it's roughly the same answers, just varying in the specifics:

* those which don't offer a _reasonable_ API, or (I would guess a larger subset) those which don't expose all the same information over their API

* those things which one wishes to preserve (yes, I'm aware that submitting them to the Internet Archive might achieve that goal)

* and then the subset of projects where it's just a fun challenge or the ubiquitous $other

As an example answer to your question, some sites are even offering bounties for scraped data, so one could scratch a technical itch and help data science at the same time:

https://www.dolthub.com/repositories/pdap/datasets/bounties


I have a side project that displays the daily schedule of 100+ French radio stations, the way a TV guide does for TV channels.

Scraping works great to get the data.

I don't like Node/JS, but I use it for the scraping. I treat the scraping code as throwaway: it is full of edge cases and has to cope with unreliable data and types. A dynamic scripting language is a good fit for that, so I can't complain.
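To illustrate the "unreliable data / types" point, here is a minimal sketch of tolerant parsing for a scraped schedule row. It is written in Python rather than the Node/JS the commenter uses, and everything in it is an assumption for illustration: the field names `title` and `start`, the list of time formats, and the helper names are hypothetical, not taken from the actual project.

```python
from datetime import datetime, time
from typing import Any, Optional

# Hypothetical formats a station page might use for start times.
TIME_FORMATS = ("%H:%M", "%Hh%M", "%H.%M")


def parse_start_time(raw: Any) -> Optional[time]:
    """Return a time for a scraped value, or None if no known format matches."""
    if raw is None:
        return None
    text = str(raw).strip()
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt).time()
        except ValueError:
            continue  # try the next format
    return None


def clean_entry(entry: dict) -> Optional[dict]:
    """Normalize one scraped row; drop it if the title or start time is unusable."""
    title = str(entry.get("title") or "").strip()
    start = parse_start_time(entry.get("start"))
    if not title or start is None:
        return None
    return {"title": title, "start": start}
```

Rows that fail to parse are dropped rather than raising, which is one way to keep a single malformed page from breaking the whole run.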


I scrape multiple government sites to fill all the data for https://www.quienmerepresenta.com.mx/

It tells you who your governor, local and federal representatives, senator and municipal president are. Each representative's data lives on a different website, so I wrote a scraper for each one.
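A rough sketch of the "one scraper per site" pattern, in Python: each source gets its own small parser, and all of them emit the same record type. The source names, field names (`nombre`, `estado`, `full_name`, `state`) and the `Representative` shape are made up for illustration; the real sites and the real project's code will differ. The sketch assumes the rows have already been extracted from each page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Representative:
    office: str
    name: str
    district: str


def parse_senate(rows: List[dict]) -> List[Representative]:
    # Hypothetical field names for one source.
    return [Representative("senator", r["nombre"].strip(), r["estado"].strip()) for r in rows]


def parse_governors(rows: List[dict]) -> List[Representative]:
    # A different source with different (also hypothetical) field names.
    return [Representative("governor", r["full_name"].strip(), r["state"].strip()) for r in rows]


Parser = Callable[[List[dict]], List[Representative]]
SCRAPERS: Dict[str, Parser] = {"senate": parse_senate, "governors": parse_governors}


def collect(raw_by_source: Dict[str, List[dict]]) -> Tuple[List[Representative], List[str]]:
    """Run each source's parser; return the records plus the names of sources that failed."""
    records: List[Representative] = []
    failed: List[str] = []
    for source, rows in raw_by_source.items():
        parser = SCRAPERS.get(source)
        if parser is None:
            failed.append(source)
            continue
        try:
            records.extend(parser(rows))
        except (KeyError, AttributeError):
            # A site changed its layout; note it and keep going with the others.
            failed.append(source)
    return records, failed
```

Keeping the failures separate means one site changing its markup shows up as a named source to fix instead of an empty result for everything.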


Scraping saved untold lives this past spring when large healthcare providers (i.e. Walgreens & CVS) opted to hide their vaccination appointments behind redundant survey questions. This made it more difficult to quickly ascertain when an appointment slot would become available. The elderly were less likely to look more than once a day, delaying vaccines for those who needed them most.

GoodRx built a scraping system that tapped into all the major providers. That's what a group of vaccine hunters in my state used to get appointments for people who had tried to book one and couldn't.


I'm building a side project that uses Python's Scrapy to scrape podcast shows. I use it to search by title, description, etc. to find interesting podcasts, and also as a way to learn different tools and frameworks.


Websites which change over time and don’t provide a simpler way of getting an update (e.g. an RSS feed or a JSON API).
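For pages like that, one common approach is to fingerprint the fetched content and compare it with the previous fingerprint. A minimal sketch in Python, assuming the HTML has already been downloaded; the function names and the in-memory `seen` dict are illustrative only, and a real setup would persist the fingerprints somewhere.

```python
import hashlib
import re
from typing import Dict


def fingerprint(html: str) -> str:
    """Hash the page after collapsing whitespace, so trivial reformatting is ignored."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_change(url: str, html: str, seen: Dict[str, str]) -> bool:
    """Return True if the page differs from the stored fingerprint (or is new), then update it."""
    new = fingerprint(html)
    changed = seen.get(url) != new
    seen[url] = new
    return changed
```

Hashing the whole page will also fire on things like rotating ads or timestamps, so in practice it usually helps to extract only the part of the page you care about before fingerprinting it.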




