From the article: Having the data aligned ensures faster access time when retrie...

winternewt · on Oct 8, 2024

I think the idea is that padding results in lower storage efficiency, which means fewer rows per page and hence lower I/O throughput. By changing the column order you can reduce the amount of padding required.
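
To make the column-order point concrete, here is a rough sketch of how alignment padding accumulates. It assumes the usual Postgres sizes and alignments for fixed-width types (bool: 1 byte, smallint: 2, integer: 4, bigint: 8) and ignores the tuple header, the null bitmap, and any trailing padding. The column names and the helper functions are hypothetical, made up for illustration only.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def row_data_size(columns):
    """Bytes used by the column data, including padding between columns.

    Each column is (name, size_in_bytes, alignment_in_bytes).
    Simplified model: no tuple header, no null bitmap, no trailing padding.
    """
    offset = 0
    for _name, size, alignment in columns:
        offset = align_up(offset, alignment)  # padding inserted here
        offset += size
    return offset


# Hypothetical table, columns in the order the user declared them.
declared = [
    ("is_active", 1, 1),   # bool
    ("id", 8, 8),          # bigint
    ("flags", 2, 2),       # smallint
    ("created_at", 8, 8),  # bigint
    ("count", 4, 4),       # integer
]

# Same columns, widest alignment first.
reordered = sorted(declared, key=lambda col: col[2], reverse=True)

print(row_data_size(declared))   # 36
print(row_data_size(reordered))  # 23
```

In this simplified model the declared order wastes 13 bytes per row on padding, and the reordered layout wastes none, so more rows fit in each page.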

branko_d · on Oct 8, 2024

Sure, having less padding increases I/O efficiency. I was just commenting on the author's apparent confusion as to why the padding is there in the first place.

Here is the full(er) quote:

  Postgres will happily add padding to the underlying data in order to make sure it is properly aligned at the physical layer. Having the data aligned ensures faster access time when retrieving pages from disk.

This might be misunderstood as "Postgres adds padding to speed up disk I/O", which is the opposite of what actually happens. Padding slows down I/O but speeds up the CPU processing afterwards.

SQLite made the opposite tradeoff.

napsterbr · on Oct 8, 2024

You are absolutely correct: the current wording causes confusion as to where the speed-up happens. Over the weekend I'll add a note and a link to this thread. Thanks for pointing that out.

dspillett · on Oct 8, 2024

The wording implies to me that Postgres pads for alignment in order to reduce IO costs, when, as branko_d suggests, padding does the opposite. You are reading it as saying that a DBA who reorders columns to remove the padding will improve IO efficiency by fitting more rows into each page, and you are right that this would be beneficial.

Postgres pads for alignment to improve processing speed once data is in local memory – CPUs are usually much faster at reading & writing aligned data⁰. This trades memory use and IO efficiency for CPU gain, which is the right optimisation if you assume that your core working set fits nicely into RAM and that your CPU(s) have large enough caches that you don't create the same problem there¹. Other DBs don't do this padding at all, either because they didn't think of it or, more likely in the case of the big ones, because they are optimising more for IO than for CPU bottlenecks. Or perhaps they natively rearrange the fields where it makes a difference, instead of being beholden to the column ordering given by the user².

----

[0] in fact some architectures don't directly support unaligned access at all, though probably not any architectures Postgres supports

[1] causing extra cache evictions if cache segment width aligns badly with the padding such that less data fits in the available cache

[2] if the user needs to care about physical ordering like this, you have a leaky abstraction
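
As a small illustration of the space side of this tradeoff, Python's standard `struct` module can report the size of the same two fields with and without native alignment. This is only a sketch of aligned versus packed layout in general, not of how Postgres or any other database stores rows; the field choice (one 1-byte value followed by one 8-byte integer) is an arbitrary example.

```python
import struct

# One signed char (1 byte) followed by one long long (8 bytes).
FIELDS = "bq"

# "@": native sizes and native alignment, as a C compiler would lay it out.
aligned_size = struct.calcsize("@" + FIELDS)

# "=": standard sizes with no alignment padding (fields packed back to back).
packed_size = struct.calcsize("=" + FIELDS)

padding = aligned_size - packed_size
print(f"aligned: {aligned_size} bytes, packed: {packed_size} bytes, "
      f"padding: {padding} bytes")
```

The packed size is always 9 bytes. The aligned size depends on the platform; on typical 64-bit systems it is 16, meaning 7 bytes of padding buy an 8-byte field that starts on an 8-byte boundary.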