if claude generates the tests, runs those tests, applies the fixes without any o...

vidarh · 2025-10-14T12:51:18 1760446278

That is true, so don't give it entirely free rein. I let Claude generate as many additional tests as it likes, but I either write the high-level tests myself or review a set generated by Claude first, before I let it fill in the blanks. It's instructed very firmly to treat a specific set of test cases as critical, and it is then increasingly "boxed in" with more validated test cases as we go along.
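As a rough illustration of the "boxed in" idea (a minimal sketch, not the setup described above): one could fingerprint the reviewed, critical tests and refuse any agent run that changes or deletes them, while leaving the agent free to add new ones. The names `fingerprint` and `tampered` are hypothetical, and representing tests as a name-to-source mapping is an assumption made to keep the example self-contained.

```python
import hashlib


def fingerprint(test_sources: dict[str, str]) -> dict[str, str]:
    """Map each reviewed test name to a SHA-256 digest of its source text."""
    return {
        name: hashlib.sha256(source.encode("utf-8")).hexdigest()
        for name, source in test_sources.items()
    }


def tampered(baseline: dict[str, str], current_sources: dict[str, str]) -> list[str]:
    """Return the critical tests that were modified or deleted since review.

    Tests that exist only in current_sources are ignored: the agent may add
    as many extra tests as it likes, it just may not touch the reviewed ones.
    """
    current = fingerprint(current_sources)
    return sorted(
        name for name, digest in baseline.items() if current.get(name) != digest
    )
```

In use, the baseline would be recorded once after human review, and `tampered(baseline, sources_after_agent_run)` would be checked before accepting the agent's changes. Each newly validated test gets added to the baseline, which tightens the box over time.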

E.g. for my compiler, I had it build scaffolding to make it possible to run rubyspecs. Once the test suite ran, I had it systematically attack the crashes and failures, mostly by itself.

ErikBjare · 2025-10-14T09:23:14 1760433794

If you generate the tests, run those tests, and apply fixes without any oversight, it is the very same situation. In reality, we have PR reviews.

skydhash · 2025-10-14T11:47:05 1760442425

Is it? Stuff like ripgrep, msmtp,… are very much one-man projects. And most distro packages are maintained by only one person. Expertise is a thing, and getting reliable results is what differentiates experts from amateurs.

fragmede · 2025-10-14T09:13:21 1760433201

Gemini?

gmb_uk · 2025-10-14T09:27:08 1760434028

Good lord, that would be like the blind leading the daft.