Did it have near 100% coverage or did it make tradeoffs on what to test?
Did it include UI tests, which are notoriously difficult, and if so, how did it handle issues like timeouts and the async nature of UI?
Did it have a rigid separation between unit and integration tests etc., or was it more fluid?
Could you refactor internal code without changing the tests (the holy grail)?
(This is not feasible on every project, but it was on this one; the database interactions were simple.)
There were a small number (~5%) of slow tests that used a real LLM, database, infrastructure, etc., and a similarly small number (~5%) of very low-level unit tests around only the complex stateless functions with simple interfaces.
Refactoring could be done trivially without changing any test code 98% of the time.
Additionally, the (YAML) tests could rewrite their expected responses based on the actual outcome; e.g. when you added a new property to a REST API response, you just reran the test in update mode and eyeballed the updated test.
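The actual harness isn't reproduced here, but the idea was roughly this; the YAML shape, base URL, and --update flag in the sketch below are illustrative assumptions, not the real tooling:

```python
# Minimal sketch of declarative YAML API tests with an "update mode",
# assuming a case file shaped roughly like:
#
#   method: POST
#   path: /widgets
#   body: {name: "example"}
#   expect:
#     status: 201
#     json: {id: 1, name: "example"}
import sys
import yaml      # pip install pyyaml
import requests  # pip install requests

BASE_URL = "http://localhost:8000"  # hypothetical API under test

def run_case(path: str, update: bool) -> bool:
    with open(path) as f:
        case = yaml.safe_load(f)

    resp = requests.request(case["method"], BASE_URL + case["path"],
                            json=case.get("body"))
    actual = {"status": resp.status_code, "json": resp.json()}

    if update:
        # Update mode: rewrite the expectation from the actual outcome,
        # then eyeball the diff in version control before committing.
        case["expect"] = actual
        with open(path, "w") as f:
            yaml.safe_dump(case, f, sort_keys=False)
        return True

    return actual == case["expect"]

if __name__ == "__main__":
    update = "--update" in sys.argv
    paths = [a for a in sys.argv[1:] if not a.startswith("--")]
    sys.exit(0 if all(run_case(p, update) for p in paths) else 1)
```

Because the tests only describe requests and expected responses, internals can be refactored freely without touching them.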
There was also a template used to generate how-to markdown docs from the YAML.
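The real template isn't shown here either, but a rough sketch of the idea, with made-up field names and wording, could look like:

```python
# Sketch of rendering a how-to markdown doc from the same YAML case file.
import json
import sys
import yaml
from jinja2 import Template  # pip install jinja2

DOC_TEMPLATE = Template(
    "## {{ title }}\n\n"
    "Send `{{ method }} {{ path }}` with this body:\n\n"
    "    {{ body_json }}\n\n"
    "A successful call returns status {{ status }}:\n\n"
    "    {{ expect_json }}\n"
)

def render_doc(case_path: str) -> str:
    with open(case_path) as f:
        case = yaml.safe_load(f)
    return DOC_TEMPLATE.render(
        title=case.get("title", case["path"]),
        method=case["method"],
        path=case["path"],
        body_json=json.dumps(case.get("body")),
        status=case["expect"]["status"],
        expect_json=json.dumps(case["expect"]["json"]),
    )

if __name__ == "__main__":
    print(render_doc(sys.argv[1]))
```

Since the docs and the tests come from the same YAML, the how-to examples can never drift out of sync with what the API actually returns.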
Test coverage was probably 100%, but I never measured it; writing all new features with TDD/documentation-driven development probably guaranteed it.