First, skilled engineers using LLMs to code also think and discuss and stare off into space before the source code starts getting laid down. In fact: I do a lot, lot more thinking and balancing different designs and getting a macro sense of where I'm going, because that's usually what it takes to get an LLM agent to build something decent. But now that pondering and planning gets recorded and distilled into a design document, something I definitely didn't have the discipline to deliver dependably before LLM agents.
Most of my initial prompts to agents start with "DO NOT WRITE ANY CODE YET."
Second, this idea that LLMs are like junior developers that can't learn anything. First, no they're not. Early-career developers are human beings. LLMs are tools. But the more general argument here is that there's compounding value to working with an early-career developer and there isn't with an LLM. That seems false: the LLM may not be learning anything, but I am. I use these tools much more effectively now than I did 3 months ago. I think we're in the very early stages of figuring out how to get good product out of them. That's obvious compounding value.
Regardless of that, personally I'd really like it if they could actually learn from my interactions with them. From a user's perspective, what I'd like is to be able to "save" the discussion/session/chat/whatever, with everything the LLM has learned so far, to a file. Then later be able to restore it and have the LLM "relearn" whatever is in it. Now, you can already do this with various frontend UIs, but the important parts of what I want are that a) this "relearn" should not affect the current context window (TBH I'd like that entire concept to be gone, but that is another aspect) and b) it should not be some sort of lossy relearning that loses information.
There are some solutions, but they are all band-aids over fundamental issues. For example, you can occasionally summarize whatever has been discussed so far and restart the discussion. But obviously that is just a sort of lossy memory compression (I do not care that humans can do the same; LLMs are software running on computers, not humans). Or you could use some sort of RAG, but AFAIK this works via "prompt triggering" - i.e. only via your "current" interaction - so even if the knowledge is in there, if whatever you are doing now doesn't trigger its index, the LLM will be oblivious to it.
What I want is, e.g., if I tell the LLM that there is some function `foo` used to barfize moo objects, then go on and tell it other stuff way beyond whatever context length it has, save the discussion or whatever, restore it the next day, go on and tell it other stuff, then ask it about joining splarfers, it should be able to tell me that I can join splarfers by converting them to barfized moo objects, even if I haven't mentioned anything about moo objects or barfization since my previous session yesterday.
(Also, as a sidenote, this sort of memory save/load should be explicit, since I'd want to be able to start from a clean slate - but that clean slate should be because I want to, not as a workaround to the technology's limitations.)
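To put it in interface terms, this is roughly the shape I mean. Purely hypothetical, of course - no current model or SDK offers this, and every name below is made up for illustration:

```python
# Hypothetical interface only - nothing like this exists today.
class Session:
    def learn(self, statement: str) -> None:
        """Ingest a fact losslessly, not bounded by any context window."""
        ...

    def ask(self, question: str) -> str:
        """Answer using everything learned so far, however long ago."""
        ...

    def save(self, path: str) -> None:
        """Persist everything learned, with no lossy compression."""
        ...

    @classmethod
    def restore(cls, path: str) -> "Session":
        """Reload a saved session and pick up exactly where it left off."""
        ...


# Day 1
s = Session()
s.learn("function `foo` is used to barfize moo objects")
# ...lots more, far beyond any context length...
s.save("project.session")

# Day 2
s = Session.restore("project.session")
s.ask("how do I join splarfers?")
# should come back with: convert them to barfized moo objects
```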
You want something that requires an engineering breakthrough.
Models don't have memory, and they don't have understanding or intelligence beyond what they learned in training.
You give them some text (as context), and they predict what should come after (as the answer).
They’re trained to predict over some context size, and what makes them good is that they learn to model relationships across that context in many dimensions. A word in the middle can affect the probability of a word at the end.
If you insanely scale the training and inference to handle massive contexts - which is currently far too expensive - you run into another problem: the model can't reliably tell which parts of that huge context are relevant. Irrelevant or weakly related tokens dilute the signal and bias it in the wrong direction; the distribution flattens out or just ends up in the wrong place.
That's why you have to make sure you give it relevant, well-attended context - aka context engineering.
It won't be able to look at a 100 kLOC codebase and figure out what's relevant to the problem at hand and what is irrelevant. You have to do that part yourself.
Or, as some people do, you can try to automate that part a little as well by using another model to go research and build that context. That's what people mean when they talk about the research->plan->build loop (roughly the shape sketched below). And it's best to keep to small tasks, otherwise the context needed for a big task will be too big.
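A minimal sketch of that loop, assuming a hypothetical `llm(instructions, context)` helper that wraps whatever model or agent you use - none of this is any particular tool's API:

```python
# Rough sketch of the research -> plan -> build loop. llm() is a placeholder
# for whatever model/agent call you actually use; it is not a real API.

def llm(instructions: str, context: str = "") -> str:
    """Placeholder for a call to your model or agent of choice."""
    raise NotImplementedError

def run_task(task: str, repo_overview: str) -> str:
    # 1. Research: have one pass pull out only the relevant files/functions,
    #    so the later steps are not handed the whole 100 kLOC codebase.
    research = llm(
        "List the files, functions and constraints relevant to this task. "
        "DO NOT WRITE ANY CODE YET.",
        context=f"Task: {task}\nRepo overview:\n{repo_overview}",
    )

    # 2. Plan: turn that curated context into small, reviewable steps.
    plan = llm(
        "Write a step-by-step plan. Keep each step small. Still no code.",
        context=f"Task: {task}\nRelevant context:\n{research}",
    )

    # (A human reviews and edits the plan here.)

    # 3. Build: only now let it write code, with just the curated context.
    return llm(
        "Implement the plan.",
        context=f"Plan:\n{plan}\n\nRelevant context:\n{research}",
    )
```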
> You want something that requires an engineering breakthrough.
Basically, yes. I know the way LLMs currently work wouldn't be able to provide what I want, but what I want is a different way that does :-P (perhaps not even using LLMs).
I'm using a "memory" MCP server which basically just stores facts to a big JSON file and makes a search available (roughly the idea sketched below). There's a directive in my system prompt that tells the LLM to store facts and search for them when it starts up.
It seems to work quite well and I'll often be pleasantly surprised when Claude retrieves some useful background I've stored, and seems to magically "know what I'm talking about".
Not perfect by any means and I think what you're describing is maybe a little more fundamental than bolting on a janky database to the model - but it does seem better than nothing.
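The whole thing really is tiny. A simplified sketch, assuming the official Python MCP SDK's FastMCP helper, with a naive substring search standing in for anything smarter (my actual setup differs in the details):

```python
# Simplified "memory" MCP server: append facts to a JSON file and expose
# store/search as tools. Assumes the official Python MCP SDK (pip install mcp);
# the naive substring search is just a stand-in.
import json
from pathlib import Path

from mcp.server.fastmcp import FastMCP

MEMORY_FILE = Path("memory.json")
mcp = FastMCP("memory")

def _load() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

@mcp.tool()
def store_fact(fact: str) -> str:
    """Store a fact so it can be retrieved in later sessions."""
    facts = _load()
    facts.append(fact)
    MEMORY_FILE.write_text(json.dumps(facts, indent=2))
    return f"Stored. {len(facts)} facts total."

@mcp.tool()
def search_facts(query: str) -> list[str]:
    """Return stored facts that contain the query string."""
    q = query.lower()
    return [f for f in _load() if q in f.lower()]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```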
I routinely ask the LLM to summarise the high-level points as guidance and add them to AGENTS.md / CONVENTIONS.md etc. It is limited due to context bloat, but it's quite effective at getting it to persist important things that need to carry over between sessions.
haha I always do that. I think it's a good way to have some control and understand what it is doing before the regurgitation. I don't like to write code but I love the problem solving/logic/integrations part.
A gentle warning to people who are overly trusting: Claude code can and will modify files in plan mode.
Before I switched to a different agent, I routinely ran into situations where I would say "write a plan to do x", it would start planning, and I would steer it by saying something like "update such and such a file, instead of other-file" and it would immediately update it, even though it was in plan mode.
Then I would point out "you're in plan mode, don't update files", and it would go absolutely ham undoing changes and furiously apologizing "You're right! I'm in plan mode, let me undo those changes I wasn't supposed to make!" - meaning that now it's broken the rules twice.
Plan mode does not disable any writing tools; it just changes the system prompt, judging by my experience anyway.
>First, skilled engineers using LLMs to code also think and discuss and stare off into space before the source code starts getting laid down
Yes, and the thinking time is a significant part of overall software delivery, which is why accelerating the coding part doesn't dramatically change overall productivity or labor requirements.
I don't like the artificial distinction between thinking and coding. I think they are intimately interwoven. Which is actually one thing I really like about the LLM, because it takes away the pain of iterating on several different approaches to see how they pan out. Often it's only when I see code for something that I know I want to do it a different way. Reducing that iteration time is huge and makes me more likely to actually go for the right design rather than settling for something less good because I don't want to throw out all the "typing" I did.
Yeah these days I often give it a zero shot attempt, see where things go wrong, reset the state via git and try again. Being able to try 2-3 prototypes of varying levels of sophistication and scope is something I've done in the past manually, but doing it in an hour instead of a day is truly significant, even if they're half or a quarter of the fidelity I'd get out of a manual attempt.
Honestly, even if I did it that way and then threw it all away and wrote the whole thing manually it'd be worth using. Obviously I don't, because once I've figured out how to scope and coach to get the right result it'd be silly to throw it away, but the same value derives from that step regardless of how you follow it up.
This harkens back to the waterfall vs agile debates. Ideally there would be a plan of all of the architecture with all the pitfalls found out before any code is laid out.
In practice this can’t happen because 30 minutes into coding you will find something that nobody thought about.
In the micro, sure. In the macro, if you are finding architecture problems after 30 minutes, then I’m afraid you aren’t really doing architecture planning up front.
Depends on what you're building. If it's another CRUD app, sure, but if it's something remotely novel you just can't understand the landscape without walking through it at least once.
> if it's something remotely novel you just can't understand the landscape without walking through it at least once
Sure you can. Mapping out the unknowns (and then having a plan to make each one knowable) is the single most important function of whoever you have designing your architecture.
Up-front architecture isn't about some all-knowing deity proclaiming the perfect architecture from on high. It's an exercise in risk management, just like any other engineering task.
> there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don't know we don't know.
if you're spending anywhere near as many engineering hours "getting code to work" as you're spending "thinking" then something is wrong in your process
I’m not following. It seems straightforward enough, and consistent with both charts, that a dramatic speedup in coding yields a more modest improvement in overall productivity because typing code is a minority of the work. Is your contention here that the LLM not only documents, but accelerates the thinking part too?
It does, for sure, and I said that in my comment, but no, the point I'm making is that this article isn't premised on thinking being an order of magnitude more work than coding. See: first chart in article.
> Most of my initial prompts to agents start with "DO NOT WRITE ANY CODE YET."
Copilot has Ask mode, and GPT-5 Codex has Plan/Chat mode for this specific task. They won't change any files. I've been using Codex for a couple of days and it's very good if you give it plenty of guidance.
I use YOLO mode all the time with Claude Code. Start on a new branch, put it in plan mode (shift + tab twice), get a solid plan broken up in logical steps, then tell it to execute that plan and commit in sensible steps. I run that last part in "YOLO mode" with commit and test commands whitelisted (roughly the settings sketched below).
This makes it move with much less scattered interactions from me, which allows focus time on other tasks. And the committing parts make it easier for me to review what it did just like I would review a feature branch created by a junior colleague.
If it's done and tests pass I'll create a pull request (assigned to myself) from the feature branch. Then I thoroughly review it; this really requires discipline. And then I let Claude fetch the pull request comments from the GitHub API and fix them. Again as a longer run that allows me to do other things.
YOLO-mode is helpful for me, because it allows Claude to run for 30 minutes with no oversight which allows me to have a meeting or work on something else. If it requires input or approval every 2 minutes you're not async but essentially spending all your time watching it run.
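For reference, the whitelisting lives in the project's `.claude/settings.json`; from memory it looks roughly like this (double-check the current Claude Code docs for the exact permission syntax, and swap in your own test command):

```json
{
  "permissions": {
    "allow": [
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Bash(npm test:*)"
    ]
  }
}
```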
It's more about having the LLM give you a plan of what it wants to do and how it wants to do it, rather than code. Then you can mold the plan to fit what you really want. Then you ask it to actually start writing code.
Even Claude Code lets you approve each change, but it's already writing code according to a plan that you reviewed and approved.
I've also had success writing documentation ahead of time (keeping it in a separate repo as docs), and then referencing it for various stages. The doc will have quasi-code examples of various features, and then I can have models stubbed in one pass, failing tests in the next, etc. (roughly as sketched below).
But there’s a guiding light that both the LLM and I can reference.
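To make that concrete, a doc entry and the passes it drives might look something like this - names and behavior entirely invented for illustration:

```python
# Illustration only: a "quasi-code" doc snippet (kept as comments here) plus
# the stub and failing test that separate passes produce from it.

# --- from docs/orders.md (quasi-code, not runnable as written) --------------
#   order = Order(items=[...])
#   order.apply_discount("WELCOME10")   # 10% off, applied only once per order
#   order.total()                        # -> price after discount

# --- pass 1: stub the model --------------------------------------------------
from dataclasses import dataclass, field

@dataclass
class Order:
    items: list = field(default_factory=list)

    def apply_discount(self, code: str) -> None:
        raise NotImplementedError  # stubbed; behavior defined in docs/orders.md

    def total(self) -> float:
        raise NotImplementedError

# --- pass 2: failing test, written against the doc, not the implementation ---
def test_discount_applied_once():
    order = Order(items=[("widget", 100.0)])
    order.apply_discount("WELCOME10")
    order.apply_discount("WELCOME10")  # second application should be ignored
    assert order.total() == 90.0
```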
Sometimes I wonder if pseudocode could be better for prompting than expressive human language, because it can follow a structure and be expressive but constrained -- have you seen research on this and whether this is an effective technique?
With tools you know ahead of time that they will do the job you expect them to do with very high probability, or fail (with low probability) in some obvious way. With LLMs, there are few tasks you can trust them to do, and you also don't know their failure mode. They can fail yet report success. They work like neither humans nor tools.
An LLM behaves like a highly buggy compiler that too frequently reports success while emitting incorrect code. Not knowing where the bugs are, the only thing you can try to do is write the program in some equivalent way but with different syntax, hoping you won't trigger a bug. That is not a tool programmers often use. Learning to work with such a compiler is a skill, but it's unclear how transferable or lasting that skill is.
If LLMs advance as significantly and as quickly as some believe they will, it may be better to just wait for the buggy compiler to be fixed (or largely fixed). Presumably, much less skill will be required to achieve the same result that requires more skill today.