Hi HN! I'm building Agora, an AI search engine for e-commerce that returns results in under 300ms. We've indexed 30M products from 100k stores and made them easy to purchase using AI agents.
After launching here on HN, a large enterprise reached out to pay for access to the raw data. We serviced the contract manually to learn the exact workflow and then decided to productize the "Data Connector" to help us scale to more customers.
The Data Connector enables developers to select any of our 100k stores in the index, view sample data, format the output, and export the up-to-date data. Data can be exported as CSV or JSON.
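As a rough illustration of the "format the output, then export" step (a minimal sketch, not Agora's actual code; `export_products` and the sample field names are hypothetical), selecting fields and emitting CSV or JSON could look like this:

```python
import csv
import io
import json


def export_products(products, fields, fmt="json"):
    """Keep only the requested fields and serialize as JSON or CSV text."""
    # Missing fields become None (JSON null / empty CSV cell).
    rows = [{f: p.get(f) for f in fields} for p in products]
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical sample record, for illustration only.
sample = [{"title": "Mug", "price": 12.5, "store": "example.com", "stock": 3}]
csv_out = export_products(sample, ["title", "price"], fmt="csv")
json_out = export_products(sample, ["title", "price"], fmt="json")
```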
We've built crawlers for Shopify, WooCommerce, Squarespace, Wix, and custom-built stores to index store information, product data, stock, reviews, and more. The primary technical challenge is recrawling the entire dataset every 24 hours. We do this with a series of servers that recrawl different store types through rotating local proxies and then add changes to a queue to be applied to our search index. Our primary database is MongoDB, and our search runs on self-hosted Meilisearch on high-RAM servers.
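To make the "add changes to a queue" idea concrete, here is a minimal sketch of one way it could work, assuming changes are detected by hashing each product record and comparing against the hash from the previous crawl. This is an assumption for illustration, not a description of Agora's pipeline; `fingerprint`, `enqueue_changes`, and the in-memory `deque` are hypothetical stand-ins for whatever the real system uses.

```python
import hashlib
import json
from collections import deque


def fingerprint(product):
    """Stable hash of a product record (keys sorted so key order is irrelevant)."""
    payload = json.dumps(product, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def enqueue_changes(crawled, seen_hashes, update_queue):
    """Queue only new or changed products; return how many were queued."""
    queued = 0
    for product in crawled:
        digest = fingerprint(product)
        if seen_hashes.get(product["id"]) != digest:
            seen_hashes[product["id"]] = digest
            update_queue.append(product)  # a search-index worker would consume this
            queued += 1
    return queued


seen = {}
queue = deque()
# Two hypothetical crawl passes over the same store; only a1's price changes.
first = [{"id": "a1", "price": 19.99, "in_stock": True},
         {"id": "b2", "price": 5.00, "in_stock": True}]
second = [{"id": "a1", "price": 17.99, "in_stock": True},
          {"id": "b2", "price": 5.00, "in_stock": True}]
first_count = enqueue_changes(first, seen, queue)
second_count = enqueue_changes(second, seen, queue)
```

On the first pass both products are queued because nothing has been seen yet; on the second pass only the product whose price changed is queued, which keeps index writes proportional to what actually changed rather than to the size of the catalog.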
My vision is to index the world's e-commerce data. I believe this will create market efficiencies for customers, developers, and merchants.
I'd love your feedback!
(1) What are the use cases you envision? I can see the value for a really large marketplace in having a ton of pricing data, or the value to a hedge fund, etc., in having raw data to analyze macro trends... but what is the use case for someone paying $200/month for the developer tier? (If I'm a retailer myself, I probably only need data on my direct competitors, unless there's something cool you're imagining that I've failed to see.)
(2) You've got some logos on the store splash that don't show up in store search (e.g., Nike). Is that a data error or a coding error?
(3) You should probably think about how you scrape and categorize marketplace data... the Walmart tab has a lot of products that are clearly third-party sellers transacting via walmart.com, which pollutes quite a bit of the data value if I primarily want to know what a big retailer is doing on products where they actually set the prices.
(4) Have you looked at grocery data? I've wished someone would build a grocery prices API for like a decade now... lots of cool consumer and hedge-fund monetization opportunities if you can show the price of strawberries in every store across the US (and graph the trendlines over time).