I'm still pretty early in figuring out whether this approach holds up for larger projects, but I can confidently say it works well on small ones: for small changes to small files, the AST approach works great. You can say something like "add a click listener to the button that calls a function to tally the user's score" to a simple game and it will do it. Less than a minute after typing the prompt, you'll see the code update in the editor and your preview webview refresh with the change applied.
However, I've noticed that the quality of the generated AST code depends heavily on how well the AST library is represented in the training set. I'll probably have to feed in library documentation through RAG or something similar, because the OpenAI models I'm using seem to have limited experience writing esprima code for JavaScript, for example.
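To make that concrete, here's a rough sketch of the kind of esprima code the model gets asked to generate for a change like the click-listener example above. The escodegen dependency, the `setup` function, and the `tallyScore` name are my own illustrative assumptions here, not anything specific to the product:

```javascript
// Sketch of model-generated AST manipulation: add a click listener.
// Assumes esprima for parsing and escodegen for printing the modified
// tree back to source (the post only names esprima).
const esprima = require('esprima');
const escodegen = require('escodegen');

const source = `
function setup() {
  const button = document.getElementById('score-btn');
}
`;

// Parse the file into an ESTree-compatible AST.
const ast = esprima.parseScript(source);

// Build the statement to insert by parsing it as a tiny program:
// button.addEventListener('click', tallyScore);
const listenerStmt = esprima.parseScript(
  "button.addEventListener('click', tallyScore);"
).body[0];

// Find the setup() declaration and append the new statement to its body.
for (const node of ast.body) {
  if (node.type === 'FunctionDeclaration' && node.id.name === 'setup') {
    node.body.body.push(listenerStmt);
  }
}

// Regenerate source code from the modified AST.
console.log(escodegen.generate(ast));
```

The point is that the model isn't editing text; it's writing a small program that edits the tree, which is exactly the kind of code that's sparse in training data.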
So it's hit or miss. In some cases I feel like I'm throwing absurd amounts of compute at small problems unnecessarily - but as I work on the project, it keeps getting better at making the modifications successfully. Some of that is me improving the prompts, some is OpenAI improving the models themselves, and some is the infrastructure I'm building around the project.
I did notice a huge improvement when o1-mini was released. It is dramatically better at writing the AST code than GPT-4o or 4o-mini. I haven't tried Claude 3.5 yet, but I've been hearing it does an exceptional job at writing code - not sure about my AST requirements, though!
My initial goal is to let users make webapp prototypes and iterate on them by writing tickets for the AI to complete.
I for some reason call it Code+=AI: https://codeplusequalsai.com