I'm still pretty early in figuring out whether this approach holds up for larger projects, but I can confidently say it works well on small ones: for small changes to small files, the AST approach works great. You can say something like "add a click listener to the button that calls a function to tally the user's score" to a simple game and it will do it. Less than a minute after typing the prompt, you'll see the code update in the editor and your preview webview refresh with the change applied.
However, I've noticed that the quality of the generated AST code depends heavily on how well the AST library is represented in the training set. I'll probably have to feed in library documentation through RAG or something similar, because the OpenAI models I'm using seem to have limited experience writing esprima code for JavaScript, for example.
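To make that concrete, here's a rough sketch of the kind of esprima code the model gets asked to generate for a change like the click-listener example above. The escodegen dependency, the `setup` function, and the `tallyScore` name are my own illustrative assumptions here, not anything specific to the product:

```javascript
// Sketch of model-generated AST manipulation: add a click listener.
// Assumes esprima for parsing and escodegen for printing the modified
// tree back to source (the post only names esprima).
const esprima = require('esprima');
const escodegen = require('escodegen');

const source = `
function setup() {
  const button = document.getElementById('score-btn');
}
`;

// Parse the file into an ESTree-compatible AST.
const ast = esprima.parseScript(source);

// Build the statement to insert by parsing it as a tiny program:
// button.addEventListener('click', tallyScore);
const listenerStmt = esprima.parseScript(
  "button.addEventListener('click', tallyScore);"
).body[0];

// Find the setup() declaration and append the new statement to its body.
for (const node of ast.body) {
  if (node.type === 'FunctionDeclaration' && node.id.name === 'setup') {
    node.body.body.push(listenerStmt);
  }
}

// Regenerate source code from the modified AST.
console.log(escodegen.generate(ast));
```

The point is that the model isn't editing text; it's writing a small program that edits the tree, which is exactly the kind of code that's sparse in training data.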
So it's hit or miss. In some cases I feel like I'm throwing absurd amounts of compute at small problems unnecessarily - but as I work on the project, it keeps getting better at making the modifications successfully. Some of that is me improving the prompts, some is OpenAI improving the models themselves, and some is the infrastructure I'm building around the project.
I did notice a huge improvement when o1-mini was released. It is dramatically better at writing the AST code than GPT-4o or 4o-mini. I haven't tried Claude 3.5 yet, but I've been hearing it does an exceptional job at writing code - not sure about my AST requirements, though!
My initial goal is to let users make webapp prototypes and iterate on them by writing tickets for the AI to complete.
I for some reason call it Code+=AI: https://codeplusequalsai.com