Does anyone know of a scraper that uses LLMs/natural language to build a determi...

throwup238 · 2025-05-12T00:37:26 1747010246

llm-scraper [1] does a decent job but it's still a bit fragile. The biggest problem I have is all the React CSS-in-JS libraries that use hashes in their class names, which the LLM isn't smart enough to ignore.

[1] https://github.com/mishushakov/llm-scraper
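
One possible workaround for the hashed class names is to strip them from the HTML before the LLM sees it, so that any selectors it writes have to rely on stable hooks. The following is a minimal sketch, not part of llm-scraper: the function names are hypothetical, and the patterns are assumptions about what common CSS-in-JS libraries emit, so expect both misses and false positives.

```typescript
// Heuristic patterns for generated class names. These are assumptions about
// common CSS-in-JS output, not an exhaustive or authoritative list. A
// hand-written class such as "css-grid" would also match the first pattern.
const GENERATED_CLASS_PATTERNS: RegExp[] = [
  /^css-[a-z0-9]+(-[\w-]+)?$/i, // Emotion-style: css-1a2b3c or css-1a2b3c-Label
  /^sc-[a-zA-Z0-9]+$/,          // styled-components-style component id
  /^jss\d+$/,                   // JSS-style counter classes
];

// True when a single class name matches one of the heuristic patterns.
function looksGenerated(className: string): boolean {
  return GENERATED_CLASS_PATTERNS.some((p) => p.test(className));
}

// Split a class attribute value and keep only the names that look hand-written.
function stableClasses(classAttr: string): string[] {
  return classAttr
    .split(/\s+/)
    .filter((c) => c.length > 0 && !looksGenerated(c));
}

// Rewrite double-quoted class attributes in an HTML string, dropping
// generated names. An attribute left with no names is removed entirely.
// This is a regex pass, not a real HTML parser, so it is only a rough filter.
function stripGeneratedClasses(html: string): string {
  return html.replace(/(\s)class="([^"]*)"/g, (_match, ws: string, value: string) => {
    const kept = stableClasses(value);
    return kept.length > 0 ? `${ws}class="${kept.join(" ")}"` : "";
  });
}
```

Running the page through `stripGeneratedClasses` before prompting leaves the model with only tag names, hand-written classes, and attributes to anchor on.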

cdolan · 2025-05-12T02:01:35 1747015295

What have you had success doing with this? Curious to test it

throwup238 · 2025-05-12T02:23:59 1747016639

I mostly use it to aggregate event calendars for all the concert/sport/etc. venues, meetups, and clubs in my area, plus some other scraping tasks. I host a little wrapper around llm-scraper on a DigitalOcean droplet that I call from Val.town scripts.

I only check most places once a week, so I use the LLM to do the scraping. In the few cases where I have to scrape thousands of pages very frequently, I use the more deterministic script it generates instead.
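
The split described above can be sketched as a simple volume check. This is an illustration only: the `ScrapeJob` shape, the `chooseStrategy` name, and the 1000-pages-per-week cutoff are all hypothetical, not numbers from my setup.

```typescript
type Strategy = "llm" | "generated-script";

interface ScrapeJob {
  pages: number;       // pages fetched per run
  runsPerWeek: number; // how often the job runs
}

// Hypothetical cutoff: weekly page volume above which per-page LLM calls
// are assumed to be too slow or too costly.
const WEEKLY_PAGE_BUDGET = 1000;

// Low-volume jobs go through the LLM; high-volume jobs use the
// deterministic script that the LLM generated earlier.
function chooseStrategy(job: ScrapeJob): Strategy {
  const weeklyPages = job.pages * job.runsPerWeek;
  return weeklyPages > WEEKLY_PAGE_BUDGET ? "generated-script" : "llm";
}
```

A weekly venue calendar with a couple dozen pages stays on the LLM path, while a job that fetches thousands of pages a day falls through to the generated script.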

dddw · 2025-05-12T19:39:51 1747078791

Oh, I'm interested in doing something similar. Is it hard to do?

cdolan · 2025-05-12T03:00:07 1747018807

Great, thanks!

TheTaytay · 2025-05-12T01:21:59 1747012919

Nice! Thanks!

cdolan · 2025-05-12T01:59:51 1747015191

We’ve built one internally using browser-use to generate Playwright code.

Works OK. Not as automated as I’d like.

nicman23 · 2025-05-12T06:02:00 1747029720

They are all quite bad.