
Yes, it has a dark theme. Maybe I should advertise it on the homepage.

What would make you want to move away from MQTTX?


I tried to give a detailed comparison with PGSync in another comment. But in a nutshell, PG to Elastic is just one use case for PG-Capture. The goal of PG-Capture is to be a schema-based Change-Data-Capture utility for Postgres that integrates into your existing stack.

It has no opinion on what you should use to capture low-level events (Debezium, WAL-Listener...) or what you should do with the resulting high-level events (indexing, caching, an event bus...).

I am pitching it as a PG-to-Elastic tool simply because that is a widespread use case that everyone understands.
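
To make "schema-based" concrete, here is a rough illustration with hypothetical names (RowEvent, rootIdsFor, the books/reviews schema); this is not PG-Capture's actual API, just the shape of the idea: the schema ties child tables to a root table, so any table-level event can be resolved to the root entities it affects.

    // Hypothetical illustration of schema-based CDC (not PG-Capture's real API).

    // A raw, table-level event as a source (Debezium, WAL-Listener...) delivers it.
    interface RowEvent {
      table: string;
      op: "INSERT" | "UPDATE" | "DELETE";
      row: Record<string, any>;
    }

    // The schema declares how child tables roll up to one root table.
    const schema = {
      root: "books",
      children: {
        reviews: { foreignKey: "book_id" }, // reviews.book_id -> books.id
      },
    };

    // Step 1 of the pipeline: resolve a raw event to the root IDs it touches.
    function rootIdsFor(event: RowEvent): unknown[] {
      if (event.table === schema.root) return [event.row.id];
      const child = schema.children[event.table as keyof typeof schema.children];
      return child ? [event.row[child.foreignKey]] : [];
    }

    // An update on `reviews` surfaces as a change to book 42, so consumers get
    // a single high-level "book changed" event instead of raw table noise.
    rootIdsFor({ table: "reviews", op: "UPDATE", row: { id: 7, book_id: 42, rating: 5 } }); // -> [42]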


I believe it is very comparable to PGSync in the way it works (schema-based CDC); the main differences are:

- PGSync is a full-stack solution while PG-Capture is "just" a library. PGSync will work out of the box while PG-Capture will require more setup, but you'll get more flexibility

- PGSync does not let you choose where you get your data from; it handles everything for you. PG-Capture lets you source events from Debezium, WAL-Listener, PG directly...

- PGSync is only meant to move data from PG to Elastic or OpenSearch. That use case is perfectly feasible with PG-Capture, but you can also use it for many other things: populating a cache, indexing into Algolia, sending events to an event bus...

All in all, the main difference is that PG-Capture is agnostic about the stack you use as input and output, which lets you do pretty much anything, while PGSync is focused on indexing data from PG into Elastic. The sketch below shows what that source-agnostic part could look like. I hope that clears things up!
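
A minimal sketch of the source-agnostic design, with hypothetical adapter names and a simplified Debezium envelope; none of this is PG-Capture's real API. Each source gets a tiny adapter that normalizes its messages into one internal shape, so everything downstream never knows where the event came from.

    // Hypothetical sketch (not PG-Capture's real API): adapters normalize
    // events from any CDC source into one internal event shape.

    interface ChangeEvent {
      table: string;
      op: "INSERT" | "UPDATE" | "DELETE";
      row: Record<string, unknown>;
    }

    // Adapter for Debezium's JSON envelope (simplified: ignores deletes,
    // where `after` is null, and snapshot reads).
    function fromDebezium(msg: {
      payload: { source: { table: string }; op: "c" | "u" | "d"; after: Record<string, unknown> };
    }): ChangeEvent {
      const ops = { c: "INSERT", u: "UPDATE", d: "DELETE" } as const;
      return { table: msg.payload.source.table, op: ops[msg.payload.op], row: msg.payload.after };
    }

    // Adapter for a homegrown trigger/NOTIFY source that already emits the shape.
    function fromTriggerPayload(json: string): ChangeEvent {
      return JSON.parse(json) as ChangeEvent;
    }

    // Either source feeds the same pipeline: resolve root IDs, build objects,
    // publish high-level events to whatever destination you configured.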


Thanks for the extremely detailed feedback. I'll try to address your (very valid) concerns:

- I don't know if you had a look at the "How does it work?" page, where I try to explain with sequence diagrams how the process is split in two steps: first aggregating events into root IDs, then building the final objects from those root IDs.

- Each of those two steps hits the DB, but: (i) it should not be the production database but a read-only replica, and (ii) the two queries are independent and can be run separately. So instead of rebuilding extraction from scratch, I decided to rely on existing replication strategies, which in essence do exactly what you suggest.

- This library is not concerned with transformation at all; that step should indeed be separate. In our production environment, we transform the high-level events that PG-Capture emits with an async worker that does not hit the DB at all; it just transforms the data it receives.

- I agree that you should not index directly what is in the DB, which is why you should transform the data as I suggest in the previous point. But that data has to come from somewhere, and PG-Capture aims at making that part of the process smooth and robust.

- Regarding full re-indexation, it is actually pretty straightforward: push all the IDs of your root table into the store (this can be streamed), and your consumer, which should already be running, will build the objects and publish high-level events. The good part is that the consumer does not run one query per object; it can build many objects at once with a single query (sketched just below this list).
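
A minimal sketch of that re-indexation flow, assuming a `books` root table and a hypothetical `enqueue` function standing in for whatever store or queue you use; this is not PG-Capture's actual code.

    import { Client } from "pg";
    import Cursor from "pg-cursor";

    // Stream every root ID into the store in chunks, so the whole table never
    // sits in memory at once.
    async function pushAllRootIds(enqueue: (ids: number[]) => Promise<void>): Promise<void> {
      const client = new Client(); // connects via PG* environment variables
      await client.connect();
      const cursor = client.query(new Cursor("SELECT id FROM books"));
      for (;;) {
        const rows: { id: number }[] = await cursor.read(1000);
        if (rows.length === 0) break;
        await enqueue(rows.map((r) => r.id));
      }
      await cursor.close();
      await client.end();
    }

    // The regular consumer then picks the IDs up in batches and rebuilds all
    // the corresponding objects with a single query, e.g.:
    //   SELECT ... FROM books WHERE id = ANY($1)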

We have been using PG-Capture in production for half a year so far, but we are not yet at the scale of a few Gb per second.

Eager to hear your feedback on those points.


Today I use PG-Capture to index things in MeiliSearch and to populate a cache in production.


Documentation is indeed very early; the main focus so far has been pitching the idea correctly and helping people understand what it does. I will then work on integrating multiple sources (Debezium, WAL-Listener...) and multiple destinations (Elastic, Redis...).

Good catch on the footer, thanks!


Good catch, I just added an MIT license.


The package.json still says ISC.


Happy if it helps! Feel free to share your feedback here or on GitHub once you do!


It does, if you are using PG (support for other SQL databases will be added later). Under the hood, PG-Capture listens to raw Postgres events, so it does not matter whether the data was updated via an ORM (like Prisma), raw SQL, or even a developer's IDE...
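
To illustrate why the write path does not matter: every write ends up as the same low-level activity inside Postgres. Here is a minimal trigger-plus-NOTIFY capture sketch, one possible source among others; I am not claiming this is the mechanism PG-Capture itself uses, and the channel and table names are made up.

    import { Client } from "pg";

    // An AFTER trigger that emits NOTIFY payloads. WAL-based sources
    // (Debezium, WAL-Listener) are equivalent in spirit: the event fires
    // whether the write came from Prisma, raw SQL, or an IDE.
    const setupSql = `
      CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
      BEGIN
        PERFORM pg_notify('row_changes',
          json_build_object('table', TG_TABLE_NAME, 'op', TG_OP)::text);
        RETURN COALESCE(NEW, OLD);
      END;
      $$ LANGUAGE plpgsql;

      DROP TRIGGER IF EXISTS books_notify ON books;
      CREATE TRIGGER books_notify
        AFTER INSERT OR UPDATE OR DELETE ON books
        FOR EACH ROW EXECUTE FUNCTION notify_change();
    `;

    async function main(): Promise<void> {
      const client = new Client(); // connects via PG* environment variables
      await client.connect();
      await client.query(setupSql);
      await client.query("LISTEN row_changes");
      // Fires for any writer, no matter which client issued the statement.
      client.on("notification", (msg) => console.log("raw event:", msg.payload));
    }

    main().catch(console.error);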


Thanks! Syncing PG to another data store like Algolia or Elastic is just a very common use case that I use to pitch the idea. But Change-Data-Capture can be used for much more: emitting events when data changes, transforming data, caching data...

All of those use cases are really painful with raw table-level events.

