From the article:

  Having the data aligned ensures faster access time when retrieving pages from disk.
Byte-level alignment cannot possibly have anything to do with retrieving pages from disk, because the unit of retrieval is the whole page. From the hardware/OS perspective, a page is just an opaque blob of bytes (made up of one or more blocks on the physical drive).

Only after these bytes have reached RAM does byte-level alignment play a role, because the CPU works more slowly on misaligned data.

The article itself then goes on to illustrate the above (and seemingly contradicts itself):

  SQLite does not pad or align columns within a row. Everything is tightly packed together using minimal space. Two consequences of this design:

  SQLite has to work harder (use more CPU cycles) to access data within a row once it has that row in memory.
  SQLite uses fewer bytes on disk, less memory, and spends less time moving content around because there are fewer bytes to move.
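
To make the packed-versus-aligned distinction concrete, here is a rough sketch using Python's `struct` module, which can lay out the same fields either with the platform's native C alignment (`@` prefix) or back to back with no padding (`=` prefix). The four-column row is hypothetical, and this is only an analogy: SQLite's actual record format is a header of varint serial types followed by variable-length values, not fixed-width C fields.

```python
import struct

# Hypothetical row: int16, int64, int32, int8 (struct codes h, q, i, b).
ROW_FIELDS = "hqib"


def aligned_size(fields: str) -> int:
    """Bytes used with the platform's native C alignment ('@' prefix).

    Padding is inserted before each field so it starts at a multiple of
    its alignment. struct adds no trailing padding at the end.
    """
    return struct.calcsize("@" + fields)


def packed_size(fields: str) -> int:
    """Bytes used with standard sizes and no alignment ('=' prefix):
    fields are laid out back to back with no padding."""
    return struct.calcsize("=" + fields)


def packed_offsets(fields: str) -> list[int]:
    """Byte offset of each field in the packed layout."""
    offsets, pos = [], 0
    for code in fields:
        offsets.append(pos)
        pos += struct.calcsize("=" + code)
    return offsets


if __name__ == "__main__":
    print("aligned:", aligned_size(ROW_FIELDS))  # 21 on typical 64-bit platforms
    print("packed: ", packed_size(ROW_FIELDS))   # 15
    print("packed offsets:", packed_offsets(ROW_FIELDS))  # [0, 2, 10, 14]
```

The packed layout saves bytes, but the 8-byte integer then starts at offset 2, which is exactly the kind of misaligned read the quote says costs extra CPU work once the row is in memory.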


I think the idea is that padding lowers storage efficiency, which means fewer rows per page and hence lower I/O throughput. By changing the column order, you can reduce the amount of padding required.
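
A minimal sketch of why column order matters, assuming Postgres-style rules where each fixed-width column starts at a multiple of its type's alignment (1 for boolean, 2 for smallint, 4 for integer, 8 for bigint and timestamp on typical 64-bit builds). The table and column names are hypothetical, and the sketch ignores the tuple header, the null bitmap, variable-length types, and the padding at the end of the tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    size: int   # bytes on disk
    align: int  # required alignment in bytes


def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def data_width(columns: list[Column]) -> int:
    """Bytes used by the column data when each column must start at a
    multiple of its alignment, inserting padding where needed."""
    offset = 0
    for col in columns:
        offset = align_up(offset, col.align) + col.size
    return offset


def widest_alignment_first(columns: list[Column]) -> list[Column]:
    """Reorder columns by descending alignment (stable for ties)."""
    return sorted(columns, key=lambda c: c.align, reverse=True)


# Hypothetical table, in the order a user might declare it.
TABLE = [
    Column("is_active", 1, 1),   # boolean
    Column("id", 8, 8),          # bigint
    Column("flags", 2, 2),       # smallint
    Column("count", 4, 4),       # integer
    Column("deleted", 1, 1),     # boolean
    Column("created_at", 8, 8),  # timestamp
]

if __name__ == "__main__":
    print(data_width(TABLE))                          # 40
    print(data_width(widest_alignment_first(TABLE)))  # 24
```

Sorting from widest alignment to narrowest is a common heuristic. When every size is a multiple of its alignment, as with these fixed-width types, it removes all interior padding; mixes involving variable-length types need more care.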


Sure, having less padding increases I/O efficiency. I was just commenting on the author's apparent confusion as to why the padding is there in the first place.

Here is the full(er) quote:

  Postgres will happily add padding to the underlying data in order to make sure it is properly aligned at the physical layer. Having the data aligned ensures faster access time when retrieving pages from disk.
This might be misunderstood as "Postgres adds padding to speed up disk I/O", which is the opposite of what actually happens. Padding slows down I/O but speeds up the CPU processing afterwards.

SQLite made the opposite tradeoff.
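
To put rough numbers on "padding slows down I/O", here is a back-of-the-envelope sketch of how many heap tuples fit on one Postgres page. It assumes the default 8 kB page, a 24-byte page header, a 4-byte line pointer per tuple, the 23-byte tuple header padded to 24 (no null bitmap), and 8-byte MAXALIGN as on typical 64-bit builds; it ignores fillfactor, TOAST, and free-space fragmentation. The 40- and 24-byte data widths are hypothetical inputs for the same row with and without alignment padding.

```python
PAGE_SIZE = 8192     # default Postgres block size (BLCKSZ)
PAGE_HEADER = 24     # page header bytes
LINE_POINTER = 4     # item pointer stored per tuple
TUPLE_HEADER = 24    # 23-byte heap tuple header, padded (no null bitmap)
MAXALIGN = 8         # typical on 64-bit builds


def align_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def rows_per_page(data_width: int) -> int:
    """Approximate number of tuples that fit on one heap page."""
    tuple_size = align_up(TUPLE_HEADER + data_width, MAXALIGN)
    return (PAGE_SIZE - PAGE_HEADER) // (tuple_size + LINE_POINTER)


def pages_needed(n_rows: int, data_width: int) -> int:
    """Pages needed to hold n_rows tuples (ceiling division)."""
    return -(-n_rows // rows_per_page(data_width))


if __name__ == "__main__":
    for width in (40, 24):  # same hypothetical row, padded vs. unpadded
        print(width, rows_per_page(width), pages_needed(1_000_000, width))
    # 40 120 8334
    # 24 157 6370
```

Fewer tuples per page means more pages to read for a scan of the same table, which is the I/O cost; the padding only pays off in CPU time once the page is already in memory.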


You are absolutely correct: the current wording causes confusion about where the speed-up happens. Over the weekend I'll add a note and a link to this thread. Thanks for pointing that out.


The wording implies to me that Postgres pads for alignment to reduce I/O costs, which, as branko_d points out, would have the opposite effect. You are reading it as saying that the DBA can improve I/O efficiency by reordering columns to remove the padding, fitting more rows into each page, and you are right that this would help in that way.

Postgres pads for alignment to improve processing speed once the data is in local memory: CPUs are usually much faster at reading and writing aligned data⁰. This trades memory use and I/O efficiency for CPU gain, which is the right optimisation if you assume that your core working set fits comfortably in RAM and that your CPU(s) have enough cache that you don't create the same problem there¹. Other DBs don't pad at all, either because they didn't think of it or, more likely in the case of the big ones, because they optimise more for I/O than for CPU bottlenecks, or perhaps because they natively rearrange the fields where it makes a difference instead of being beholden to the column order given by the user².

----

[0] in fact, some architectures don't directly support unaligned access at all, though probably none that Postgres supports

[1] causing extra cache evictions if the cache line size aligns badly with the padding, so that less data fits in the available cache

[2] if the user needs to care about physical ordering like this, you have a leaky abstraction



