Lots of good improvements; my favorites are OAuth support, NOT NULL constraints with NOT VALID, uuidv7, and RETURNING with old/new values. And I think the async I/O will bring performance benefits, although maybe not so much immediately.
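A quick sketch of the last two, assuming the Postgres 18 syntax (the `items` table and its columns are made up for illustration):

```sql
-- uuidv7() generates time-ordered UUIDs, convenient as primary keys.
CREATE TABLE items (
    id    uuid PRIMARY KEY DEFAULT uuidv7(),
    price numeric
);
INSERT INTO items (price) VALUES (10), (20);

-- RETURNING can now reference both the old and the new row values.
UPDATE items
SET    price = price * 1.1
RETURNING old.price AS old_price, new.price AS new_price;
```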
Besides DDL changes, pgstream can also do on-the-fly data anonymization and data masking. It can stream to other stores, like Elasticsearch, but the current main focus is on PG to PG replication with DDL changes and anonymization.
We've been using it in Xata to power our `xata clone` functionality, which creates a "staging replica" on the Xata platform, with anonymized data that closely resembles production. Then one gets fast copy-on-write branching from this anonymized staging replica. This is great for creating dev branches and ephemeral environments.
For a tooling solution to this problem, and many others, pgroll (https://github.com/xataio/pgroll) automates the steps from the blog post in a single higher-level operation. It can, for example, add a hidden column, backfill it with data, then add the constraint, and only then expose it in the new schema.
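For reference, the manual sequence that this kind of tool automates looks roughly like the usual "safe NOT NULL" pattern (a hedged sketch; the `users` table, the column, and the constraint name are hypothetical, and the backfill would normally be batched):

```sql
-- 1. Add the column as nullable so no table rewrite is needed.
ALTER TABLE users ADD COLUMN email text;

-- 2. Backfill existing rows (in batches, to avoid long-running locks).
UPDATE users SET email = 'unknown@example.com' WHERE email IS NULL;

-- 3. Add the check as NOT VALID: existing rows are not scanned yet,
--    so only a brief lock is taken.
ALTER TABLE users
  ADD CONSTRAINT users_email_not_null CHECK (email IS NOT NULL) NOT VALID;

-- 4. Validate separately; this scans the table but holds a weaker lock.
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;
```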
That is correct: for non-volatile default values Postgres is quick, which makes it a generally safe operation.
Also interesting: `now()` is non-volatile because it's defined as the start of the transaction, so if you add a column with `DEFAULT now()`, all rows get the same value and the operation is fast. But `timeofday()` is volatile, so `DEFAULT timeofday()` forces a full table rewrite and is going to lock the table for a long time. A bit of a subtle gotcha.
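A quick illustration of the difference (the `events` table is hypothetical):

```sql
-- Fast: now() is stable, so the default is stored once in the catalog
-- and existing rows are not rewritten; they all see the same timestamp.
ALTER TABLE events ADD COLUMN created_at timestamptz DEFAULT now();

-- Slow: timeofday() is volatile, so every existing row has to be
-- rewritten with its own evaluated default (a full table rewrite).
ALTER TABLE events ADD COLUMN noted_at text DEFAULT timeofday();
```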
[author] My point was that using the default config for the other providers, while controlling your own, is a weakness in the methodology and might influence the results.
Note that when strictly limiting it to 4 CPUs, it's still faster.
Another way to mitigate this is to make the agents always work only with a copy of the data that is anonymized. Assuming the anonymization step removes or replaces all sensitive data, then whatever the AI agent does, the consequences won't be disastrous.
The anonymization can be done by pgstream or pg_anonymizer. In combination with copy-on-write branching, you can create safe environments on the fly for AI agents, giving them access to data that closely resembles production without being actual production data.
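For example, with the PostgreSQL Anonymizer extension, masking rules are declared per column via security labels. A minimal sketch (function and table names are illustrative, and the exact calls are from memory, so check the extension's docs):

```sql
-- Assumes the anon extension is installed and preloaded.
CREATE EXTENSION IF NOT EXISTS anon;
SELECT anon.init();  -- load the built-in fake data sets

-- Declare a masking rule: replace real emails with fake ones.
SECURITY LABEL FOR anon ON COLUMN users.email
  IS 'MASKED WITH FUNCTION anon.fake_email()';

-- Statically rewrite the data so the branch never contains real emails.
SELECT anon.anonymize_table('users');
```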
On the Xata platform we actually do CoW snapshots and branching at the block device level, which works great.
However, we are developing pgstream in order to bring in and sync data from other Postgres providers. pgstream can also do anonymization and, in the future, subsetting. Basically this means that no matter which Postgres service you are using (RDS, CloudSQL, etc.), you can still use Xata for staging and dev branches.