we tried a few different variations but tbh had universally bad results. for example, we use the `ward` test runner in our python codebase, and claude sonnet (both 3.7 and 4) keeps trying to force-switch it to pytest lol. every. single. time.
maybe we could either try this with opus 4 and hope that cheaper models catch up, or just drink the kool-aid and switch to pytest...