The report talks at length about this, but maybe it bears rephrasing!
The ANSI definitions are bad: they allow multiple interpretations with varying results. 25 years ago, Berenson, O'Neil, et al. published a paper showing the ANSI definitions had this ambiguity, and that what the spec meant to define should have been a broader class of anomalies. They literally say that the broad interpretation is the "correct" one, and the research community basically went "oh, yeah, you're right". Adya followed up with generalized isolation level definitions, and pretty much every paper I've read has gone with these versions since. That didn't make its way back into the SQL spec though: it's still ambiguous, which means you can interpret RR as allowing G2-item.
Why prevent G2-item in RR? Because then the difference between repeatable read and serializable is specifically phantoms, rather than phantoms plus... some other hard-to-describe anomalies. If you use the broad/generalized interpretation, you can trust that a program which only accesses data by primary key, running under repeatable read, is actually serializable. That's a powerful, intuitive constraint. If you use the strict interpretation, RR allows Other Weird Behaviors, and it's harder to prove an execution is correct.
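To make G2-item a bit more concrete, here's a rough sketch of the classic write-skew shape using only primary-key reads and writes. It's a toy model, not how the report's checker works: it assumes every transaction is concurrent and reads from the same initial snapshot, so any transaction that read a key has an item anti-dependency (rw edge) on any other transaction that wrote it. The names `anti_dependencies`, `has_cycle`, and the `txns` layout are all made up for illustration.

```python
from itertools import product


def anti_dependencies(txns):
    """Item rw edges under the toy assumption that all transactions
    read the initial snapshot: a -> b when a read a key that b wrote."""
    edges = set()
    for a, b in product(txns, repeat=2):
        if a != b and txns[a]["reads"] & txns[b]["writes"]:
            edges.add((a, b))
    return edges


def has_cycle(nodes, edges):
    """Depth-first search for a cycle in a directed graph."""
    graph = {n: [] for n in nodes}
    for src, dst in edges:
        graph[src].append(dst)
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = in progress, 2 = done

    def visit(n):
        state[n] = 1
        for m in graph[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)


# Write skew: both transactions read x and y by key, then each writes one.
write_skew = {
    "T1": {"reads": {"x", "y"}, "writes": {"x"}},
    "T2": {"reads": {"x", "y"}, "writes": {"y"}},
}

# Two transactions touching disjoint keys: no dependencies at all.
disjoint = {
    "T1": {"reads": {"x"}, "writes": {"x"}},
    "T2": {"reads": {"y"}, "writes": {"y"}},
}

skew_edges = anti_dependencies(write_skew)
skew_is_cyclic = has_cycle(write_skew, skew_edges)
disjoint_is_cyclic = has_cycle(disjoint, anti_dependencies(disjoint))
```

In the write-skew history, T1 read `y` before T2 overwrote it and T2 read `x` before T1 overwrote it, so the two rw edges form a cycle with no predicates involved. That cycle is what the broad interpretation of RR rules out and the strict one lets through.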
For a very thorough discussion of this, see Berenson's or Adya's papers, both linked throughout the report.
Thanks, I got the part about the spec being ambiguous; I'm more interested in the "why" aspect, since the current behaviour seems to fit the name "repeatable read" intuitively. But on closer inspection, I see PostgreSQL's repeatable read blocks phantom reads even though the ANSI spec permits them! I don't get why phantom reads would be acceptable under "repeatable read"... I should probably give those papers a read some time. But in the meantime, given the choice of phantom reads or G2-item, I think I'd pick blocking phantom reads. (It might be nice to have the option to choose though!)
In PostgreSQL's case, if they somehow made repeatable read prevent G2-item without sacrificing the phantom read protection, would that mean repeatable read is then "serializable" according to the ANSI definition?
> But in the meantime, given the choice of phantom reads or G2-item, I think I'd pick blocking phantom reads.
Well... it's not quite so straightforward. SI still allows some phantoms. It only prohibits some of them.
> In PostgreSQL's case, if they somehow made repeatable read prevent G2-item without sacrificing the phantom read protection, would that mean repeatable read is then "serializable" according to the ANSI definition?
I'm not quite sure I follow. If you're asking whether snapshot isolation (Postgres "Repeatable Read") plus preventing G2-item is serializable, the answer is no: that model would still allow G2 in general, specifically cycles involving non-adjacent rw dependencies with predicates.