
Yes, it has a dark theme. Maybe I should advertise it on the homepage.

What would make you want to move away from MQTTX?


I tried to give a detailed comparison with PGSync in another comment. But in a nutshell, PG to Elastic is just one use case for PG-Capture. The goal of PG-Capture is to be a schema-based Change-Data-Capture utility for Postgres that integrates into your existing stack.

It has no opinion on what you should use to capture low-level events (Debezium, WAL-Listener...) or what you should do with the resulting high-level events (indexing, caching, an event bus...).

I am pitching it as a PG-to-Elastic tool simply because that is a widespread use case that everyone understands.
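
To make "schema-based" concrete, here is a rough illustration with hypothetical names (RowEvent, rootIdsFor, the books/reviews schema); this is not PG-Capture's actual API, just the shape of the idea: the schema ties child tables to a root table, so any table-level event can be resolved to the root entities it affects.

    // Hypothetical illustration of schema-based CDC (not PG-Capture's real API).

    // A raw, table-level event as a source (Debezium, WAL-Listener...) delivers it.
    interface RowEvent {
      table: string;
      op: "INSERT" | "UPDATE" | "DELETE";
      row: Record<string, any>;
    }

    // The schema declares how child tables roll up to one root table.
    const schema = {
      root: "books",
      children: {
        reviews: { foreignKey: "book_id" }, // reviews.book_id -> books.id
      },
    };

    // Step 1 of the pipeline: resolve a raw event to the root IDs it touches.
    function rootIdsFor(event: RowEvent): unknown[] {
      if (event.table === schema.root) return [event.row.id];
      const child = schema.children[event.table as keyof typeof schema.children];
      return child ? [event.row[child.foreignKey]] : [];
    }

    // An update on `reviews` surfaces as a change to book 42, so consumers get
    // a single high-level "book changed" event instead of raw table noise.
    rootIdsFor({ table: "reviews", op: "UPDATE", row: { id: 7, book_id: 42, rating: 5 } }); // -> [42]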


I believe it is very comparable to PGSync in the way it works (schema-based CDC); the main differences are:

- PGSync is a full-stack solution while PG-Capture is "just" a library. PGSync will work out of the box while PG-Capture will require more setup, but you'll get more flexibility

- PGSync does not let you choose where you get your data from; it handles everything for you. PG-Capture lets you source events from Debezium, WAL-Listener, PG directly...

- PGSync is only meant to move data from PG to Elastic or OpenSearch. That use case is perfectly feasible with PG-Capture, but you can also use it for many other things: populating a cache, indexing into Algolia, sending events to an event bus...

All in all, the main difference is that PG-Capture is agnostic about the stack you use as input and output, which lets you do pretty much anything, while PGSync is focused on indexing data from PG into Elastic. The sketch below shows what that source-agnostic part could look like. I hope that clears things up!
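
A minimal sketch of the source-agnostic design, with hypothetical adapter names and a simplified Debezium envelope; none of this is PG-Capture's real API. Each source gets a tiny adapter that normalizes its messages into one internal shape, so everything downstream never knows where the event came from.

    // Hypothetical sketch (not PG-Capture's real API): adapters normalize
    // events from any CDC source into one internal event shape.

    interface ChangeEvent {
      table: string;
      op: "INSERT" | "UPDATE" | "DELETE";
      row: Record<string, unknown>;
    }

    // Adapter for Debezium's JSON envelope (simplified: ignores deletes,
    // where `after` is null, and snapshot reads).
    function fromDebezium(msg: {
      payload: { source: { table: string }; op: "c" | "u" | "d"; after: Record<string, unknown> };
    }): ChangeEvent {
      const ops = { c: "INSERT", u: "UPDATE", d: "DELETE" } as const;
      return { table: msg.payload.source.table, op: ops[msg.payload.op], row: msg.payload.after };
    }

    // Adapter for a homegrown trigger/NOTIFY source that already emits the shape.
    function fromTriggerPayload(json: string): ChangeEvent {
      return JSON.parse(json) as ChangeEvent;
    }

    // Either source feeds the same pipeline: resolve root IDs, build objects,
    // publish high-level events to whatever destination you configured.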


Thanks for the extremely detailed feedback. I'll try to address your (very valid) concerns:

- I don't know if you had a look at the "How does it work?" page, where I try to explain with sequence diagrams how the process is split in two steps: first aggregating events into root IDs, then building the final objects from those root IDs.

- Each of those two steps hits the DB, but: (i) it should not be the production database but a read-only replica, and (ii) the two queries are independent and can be run separately. So instead of rebuilding extraction from scratch, I decided to rely on existing replication strategies, which in essence do exactly what you suggest.

- This library is not concerned with transformation at all; that step should indeed be separate. In our production environment, we transform the high-level events that PG-Capture emits with an async worker that does not hit the DB at all; it just transforms the data it receives.

- I agree that you should not index directly what is in the DB, which is why you should transform the data as I suggest in the previous point. But that data has to come from somewhere, and PG-Capture aims at making that part of the process smooth and robust.

- Regarding full re-indexation, it is actually pretty straightforward: push all the IDs of your root table into the store (this can be streamed), and your consumer, which should already be running, will build the objects and publish high-level events. The good part is that the consumer does not run one query per object; it can build many objects at once with a single query (sketched just below this list).
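
A minimal sketch of that re-indexation flow, assuming a `books` root table and a hypothetical `enqueue` function standing in for whatever store or queue you use; this is not PG-Capture's actual code.

    import { Client } from "pg";
    import Cursor from "pg-cursor";

    // Stream every root ID into the store in chunks, so the whole table never
    // sits in memory at once.
    async function pushAllRootIds(enqueue: (ids: number[]) => Promise<void>): Promise<void> {
      const client = new Client(); // connects via PG* environment variables
      await client.connect();
      const cursor = client.query(new Cursor("SELECT id FROM books"));
      for (;;) {
        const rows: { id: number }[] = await cursor.read(1000);
        if (rows.length === 0) break;
        await enqueue(rows.map((r) => r.id));
      }
      await cursor.close();
      await client.end();
    }

    // The regular consumer then picks the IDs up in batches and rebuilds all
    // the corresponding objects with a single query, e.g.:
    //   SELECT ... FROM books WHERE id = ANY($1)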

We have been using PG-Capture in production for half a year so far, but we are not yet at the scale of a few Gb per second.

Eager to hear your feedback on those points.


Today I use PG-Capture to index things in MeiliSearch and to populate a cache in production.


Documentation is indeed very early; the main focus so far has been pitching the idea correctly and helping people understand what it does. I will then work on integrating multiple sources (Debezium, WAL-Listener...) and multiple destinations (Elastic, Redis...).

Good catch on the footer, thanks!


Good catch, I just added an MIT license.


The package.json still says ISC.


Happy if it helps! Feel free to share your feedback here or on GitHub once you do!


It does, if you are using PG (support for other SQL databases will be added later). Under the hood, PG-Capture listens to raw Postgres events, so it does not matter whether the data was updated via an ORM (like Prisma), raw SQL, or even a developer's IDE...
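
To illustrate why the write path does not matter: every write ends up as the same low-level activity inside Postgres. Here is a minimal trigger-plus-NOTIFY capture sketch, one possible source among others; I am not claiming this is the mechanism PG-Capture itself uses, and the channel and table names are made up.

    import { Client } from "pg";

    // An AFTER trigger that emits NOTIFY payloads. WAL-based sources
    // (Debezium, WAL-Listener) are equivalent in spirit: the event fires
    // whether the write came from Prisma, raw SQL, or an IDE.
    const setupSql = `
      CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
      BEGIN
        PERFORM pg_notify('row_changes',
          json_build_object('table', TG_TABLE_NAME, 'op', TG_OP)::text);
        RETURN COALESCE(NEW, OLD);
      END;
      $$ LANGUAGE plpgsql;

      DROP TRIGGER IF EXISTS books_notify ON books;
      CREATE TRIGGER books_notify
        AFTER INSERT OR UPDATE OR DELETE ON books
        FOR EACH ROW EXECUTE FUNCTION notify_change();
    `;

    async function main(): Promise<void> {
      const client = new Client(); // connects via PG* environment variables
      await client.connect();
      await client.query(setupSql);
      await client.query("LISTEN row_changes");
      // Fires for any writer, no matter which client issued the statement.
      client.on("notification", (msg) => console.log("raw event:", msg.payload));
    }

    main().catch(console.error);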


Thanks! Syncing PG to another data store like Algolia or Elastic is just a very common use case that I use to pitch the idea. But Change-Data-Capture can be used for much more: emitting events when data changes, transforming data, caching data...

All of those use cases are really painful with raw table-level events.

