
If Claude generates the tests, runs those tests, and applies the fixes without any oversight, it's very much a "who watches the watchmen" situation.


That's true, so don't give it entirely free rein there. I let Claude generate as many additional tests as it likes, but either I write the high-level tests myself, or I review a set Claude generated before letting it fill in the blanks. It's instructed very firmly to treat a specific set of test cases as critical, and as we go along it gets increasingly "boxed in" by more validated test cases.
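A minimal sketch of that kind of gate, assuming test results arrive as simple name/pass records. `SpecResult`, `Verdict`, `gate` and `promote` are hypothetical names for illustration, not any real tool's API. Only the human-validated "critical" set decides whether a change is accepted; extra model-generated tests are reported but can't override it, and the critical set grows as more generated tests get reviewed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecResult:
    # Hypothetical record: one test's name and whether it passed.
    name: str
    passed: bool


@dataclass(frozen=True)
class Verdict:
    accepted: bool
    missing_critical: list[str]
    failed_critical: list[str]
    failed_extra: list[str]


def gate(results: list[SpecResult], critical: set[str]) -> Verdict:
    """Hypothetical gate: accept only if every human-validated critical test ran and passed."""
    by_name = {r.name: r for r in results}
    # A critical test that never ran counts against the change (it can't be silently dropped).
    missing = sorted(critical - set(by_name))
    failed_critical = sorted(n for n in critical if n in by_name and not by_name[n].passed)
    # Model-generated extras are reported, but they never decide acceptance on their own.
    failed_extra = sorted(r.name for r in results if r.name not in critical and not r.passed)
    return Verdict(not missing and not failed_critical, missing, failed_critical, failed_extra)


def promote(critical: set[str], reviewed: set[str]) -> set[str]:
    # "Boxing in": once a human has reviewed a generated test, it joins the critical set.
    return critical | reviewed


if __name__ == "__main__":
    results = [SpecResult("parse_int", True), SpecResult("parse_float", False)]
    print(gate(results, {"parse_int"}))
    print(gate(results, promote({"parse_int"}, {"parse_float"})))
```

The point of the design is that the model can add tests freely, but it can't weaken the bar: deleting or skipping a critical test shows up as `missing_critical` rather than as a pass.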

E.g. for my compiler, I had it build the scaffolding needed to run rubyspecs. Once the test suite ran, I had it systematically attack the crashes and failures, mostly by itself.


If you generate the tests, run those tests, and apply fixes without any oversight, it's the very same situation. In reality, we have PR reviews.


Is it? Projects like ripgrep, msmtp,… are very much one-man projects. And most packages in distros are maintained by a single person. Expertise is a thing, and getting reliable results is what differentiates experts from amateurs.


Gemini?


Good lord, that would be like the blind leading the daft.

