First, skilled engineers using LLMs to code also think and discuss and stare off into space before the source code starts getting laid down. In fact: I do a lot, lot more thinking and balancing different designs and getting a macro sense of where I'm going, because that's usually what it takes to get an LLM agent to build something decent. But now that pondering and planning gets recorded and distilled into a design document, something I definitely didn't have the discipline to deliver dependably before LLM agents.
Most of my initial prompts to agents start with "DO NOT WRITE ANY CODE YET."
Second, this idea that LLMs are like junior developers that can't learn anything. First, no they're not. Early-career developers are human beings. LLMs are tools. But the more general argument here is that there's compounding value to working with an early-career developer and there isn't with an LLM. That seems false: the LLM may not be learning anything, but I am. I use these tools much more effectively now than I did 3 months ago. I think we're in the very early stages of figuring out how to get good product out of them. That's obvious compounding value.
Regardless of that, personally I'd really like it if they could actually learn from my interactions with them. From a user's perspective, what I'd like is to be able to "save" the discussion/session/chat/whatever, with everything the LLM has learned so far, to a file. Then later be able to restore it and have the LLM "relearn" whatever is in it. Now, you can already do this with various frontend UIs, but the important parts of what I want are that a) this "relearn" should not affect the current context window (TBH I'd like that entire concept to be gone, but that is another aspect) and b) it should not be some sort of lossy relearning that loses information.
There are some solutions, but they are all band-aids over fundamental issues. For example, you can occasionally summarize whatever has been discussed so far and restart the discussion. But obviously that is just a sort of lossy memory compression (I do not care that humans can do the same; LLMs are software running on computers, not humans). Or you could use some sort of RAG, but AFAIK this works via "prompt triggering" - i.e. only via your "current" interaction - so even if the knowledge is in there, if whatever you are doing now doesn't trigger its index, the LLM will be oblivious to it.
What I want is, e.g., if I tell the LLM that there is some function `foo` used to barfize moo objects, then go on and tell it other stuff way beyond whatever context length it has, save the discussion or whatever, restore it the next day, go on and tell it other stuff, then ask it about joining splarfers, it should be able to tell me that I can join splarfers by converting them to barfized moo objects, even if I haven't mentioned anything about moo objects or barfization since my previous session yesterday.
(Also, as a sidenote, this sort of memory save/load should be explicit, since I'd want to be able to start from a clean slate - but that clean slate should be because I want to, not as a workaround to the technology's limitations.)
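To put it in interface terms, this is roughly the shape I mean. Purely hypothetical, of course - no current model or SDK offers this, and every name below is made up for illustration:

```python
# Hypothetical interface only - nothing like this exists today.
class Session:
    def learn(self, statement: str) -> None:
        """Ingest a fact losslessly, not bounded by any context window."""
        ...

    def ask(self, question: str) -> str:
        """Answer using everything learned so far, however long ago."""
        ...

    def save(self, path: str) -> None:
        """Persist everything learned, with no lossy compression."""
        ...

    @classmethod
    def restore(cls, path: str) -> "Session":
        """Reload a saved session and pick up exactly where it left off."""
        ...


# Day 1
s = Session()
s.learn("function `foo` is used to barfize moo objects")
# ...lots more, far beyond any context length...
s.save("project.session")

# Day 2
s = Session.restore("project.session")
s.ask("how do I join splarfers?")
# should come back with: convert them to barfized moo objects
```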
You want something that requires an engineering breakthrough.
Models don't have memory, and they don't have understanding or intelligence beyond what they learned in training.
You give them some text (as context), and they predict what should come after (as the answer).
They’re trained to predict over some context size, and what makes them good is that they learn to model relationships across that context in many dimensions. A word in the middle can affect the probability of a word at the end.
If you insanely scale the training and inference to handle massive contexts - which is currently far too expensive - you run into another problem: the model can't reliably tell which parts of that huge context are relevant. Irrelevant or weakly related tokens dilute the signal and bias it in the wrong direction; the distribution flattens out or just ends up in the wrong place.
That's why you have to make sure you give it relevant, well-attended context - aka context engineering.
It won't be able to look at a 100 kLOC codebase and figure out what's relevant to the problem at hand and what is irrelevant. You have to do that part yourself.
Or, as some people do, you can try to automate that part a little as well by using another model to go research and build that context. That's what people mean when they talk about the research->plan->build loop (roughly the shape sketched below). And it's best to keep to small tasks, otherwise the context needed for a big task will be too big.
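A minimal sketch of that loop, assuming a hypothetical `llm(instructions, context)` helper that wraps whatever model or agent you use - none of this is any particular tool's API:

```python
# Rough sketch of the research -> plan -> build loop. llm() is a placeholder
# for whatever model/agent call you actually use; it is not a real API.

def llm(instructions: str, context: str = "") -> str:
    """Placeholder for a call to your model or agent of choice."""
    raise NotImplementedError

def run_task(task: str, repo_overview: str) -> str:
    # 1. Research: have one pass pull out only the relevant files/functions,
    #    so the later steps are not handed the whole 100 kLOC codebase.
    research = llm(
        "List the files, functions and constraints relevant to this task. "
        "DO NOT WRITE ANY CODE YET.",
        context=f"Task: {task}\nRepo overview:\n{repo_overview}",
    )

    # 2. Plan: turn that curated context into small, reviewable steps.
    plan = llm(
        "Write a step-by-step plan. Keep each step small. Still no code.",
        context=f"Task: {task}\nRelevant context:\n{research}",
    )

    # (A human reviews and edits the plan here.)

    # 3. Build: only now let it write code, with just the curated context.
    return llm(
        "Implement the plan.",
        context=f"Plan:\n{plan}\n\nRelevant context:\n{research}",
    )
```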
> You want something that requires an engineering breakthrough.
Basically, yes. I know the way LLMs currently work wouldn't be able to provide what I want, but what I want is a different way that does :-P (perhaps not even using LLMs).
I'm using a "memory" MCP server which basically just stores facts to a big JSON file and makes a search available (roughly the idea sketched below). There's a directive in my system prompt that tells the LLM to store facts and search for them when it starts up.
It seems to work quite well and I'll often be pleasantly surprised when Claude retrieves some useful background I've stored, and seems to magically "know what I'm talking about".
Not perfect by any means and I think what you're describing is maybe a little more fundamental than bolting on a janky database to the model - but it does seem better than nothing.
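The whole thing really is tiny. A simplified sketch, assuming the official Python MCP SDK's FastMCP helper, with a naive substring search standing in for anything smarter (my actual setup differs in the details):

```python
# Simplified "memory" MCP server: append facts to a JSON file and expose
# store/search as tools. Assumes the official Python MCP SDK (pip install mcp);
# the naive substring search is just a stand-in.
import json
from pathlib import Path

from mcp.server.fastmcp import FastMCP

MEMORY_FILE = Path("memory.json")
mcp = FastMCP("memory")

def _load() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

@mcp.tool()
def store_fact(fact: str) -> str:
    """Store a fact so it can be retrieved in later sessions."""
    facts = _load()
    facts.append(fact)
    MEMORY_FILE.write_text(json.dumps(facts, indent=2))
    return f"Stored. {len(facts)} facts total."

@mcp.tool()
def search_facts(query: str) -> list[str]:
    """Return stored facts that contain the query string."""
    q = query.lower()
    return [f for f in _load() if q in f.lower()]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```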
I routinely ask the LLM to summarise the high-level points as guidance and add them to AGENTS.md / CONVENTIONS.md etc. It is limited due to context bloat, but it's quite effective at getting it to persist important things that need to carry over between sessions.
haha I always do that. I think it's a good way to have some control and understand what it is doing before the regurgitation. I don't like to write code but I love the problem solving/logic/integrations part.
A gentle warning to people who are overly trusting: Claude code can and will modify files in plan mode.
Before I switched to a different agent, I routinely ran into situations where I would say "write a plan to do x", it would start planning, and I would steer it by saying something like "update such and such a file, instead of other-file" and it would immediately update it, even though it was in plan mode.
Then I would point out "you're in plan mode, don't update files", and it would go absolutely ham undoing changes and furiously apologizing "You're right! I'm in plan mode, let me undo those changes I wasn't supposed to make!" - meaning that now it's broken the rules twice.
Plan mode does not disable any writing tools; it just changes the system prompt, judging by my experience anyway.
>First, skilled engineers using LLMs to code also think and discuss and stare off into space before the source code starts getting laid down
Yes, and the thinking time is a significant part of overall software delivery, which is why accelerating the coding part doesn't dramatically change overall productivity or labor requirements.
I don't like the artificial distinction between thinking and coding. I think they are intimately interwoven. Which is actually one thing I really like about the LLM, because it takes away the pain of iterating on several different approaches to see how they pan out. Often it's only when I see code for something that I know I want to do it a different way. Reducing that iteration time is huge and makes me more likely to actually go for the right design rather than settling for something less good because I don't want to throw out all the "typing" I did.
Yeah these days I often give it a zero shot attempt, see where things go wrong, reset the state via git and try again. Being able to try 2-3 prototypes of varying levels of sophistication and scope is something I've done in the past manually, but doing it in an hour instead of a day is truly significant, even if they're half or a quarter of the fidelity I'd get out of a manual attempt.
Honestly, even if I did it that way and then threw it all away and wrote the whole thing manually it'd be worth using. Obviously I don't, because once I've figured out how to scope and coach to get the right result it'd be silly to throw it away, but the same value derives from that step regardless of how you follow it up.
This harkens back to the waterfall vs agile debates. Ideally there would be a plan of all of the architecture with all the pitfalls found out before any code is laid out.
In practice this can’t happen because 30 minutes into coding you will find something that nobody thought about.
In the micro, sure. In the macro, if you are finding architecture problems after 30 minutes, then I’m afraid you aren’t really doing architecture planning up front.
Depends on what you're building. If it's another CRUD app, sure, but if it's something remotely novel you just can't understand the landscape without walking through it at least once.
> if it's something remotely novel you just can't understand the landscape without walking through it at least once
Sure you can. Mapping out the unknowns (and then having a plan to make each one knowable) is the single most important function of whoever you have designing your architecture.
Up-front architecture isn't about some all-knowing deity proclaiming the perfect architecture from on high. It's an exercise in risk management, just like any other engineering task.
> there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don't know we don't know.
if you're spending anywhere near as many engineering hours "getting code to work" as you're spending "thinking" then something is wrong in your process
I’m not following. It seems straightforward enough, and consistent with both charts, that a dramatic speedup in coding yields a more modest improvement in overall productivity because typing code is a minority of the work. Is your contention here that the LLM not only documents, but accelerates the thinking part too?
It does, for sure, and I said that in my comment, but no, the point I'm making is that this article isn't premised on thinking being an order of magnitude more work than coding. See: first chart in article.
> Most of my initial prompts to agents start with "DO NOT WRITE ANY CODE YET."
Copilot has Ask mode, and GPT-5 Codex has Plan/Chat mode for this specific task. They won't change any files. I've been using Codex for a couple of days and it's very good if you give it plenty of guidance.
I use YOLO mode all the time with Claude Code. Start on a new branch, put it in plan mode (shift + tab twice), get a solid plan broken up in logical steps, then tell it to execute that plan and commit in sensible steps. I run that last part in "YOLO mode" with commit and test commands whitelisted (roughly the settings sketched below).
This makes it move with much less scattered interactions from me, which allows focus time on other tasks. And the committing parts make it easier for me to review what it did just like I would review a feature branch created by a junior colleague.
If it's done and tests pass I'll create a pull request (assigned to myself) from the feature branch. Then I thoroughly review it; this really requires discipline. And then I let Claude fetch the pull request comments from the GitHub API and fix them. Again as a longer run that allows me to do other things.
YOLO-mode is helpful for me, because it allows Claude to run for 30 minutes with no oversight which allows me to have a meeting or work on something else. If it requires input or approval every 2 minutes you're not async but essentially spending all your time watching it run.
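For reference, the whitelisting lives in the project's `.claude/settings.json`; from memory it looks roughly like this (double-check the current Claude Code docs for the exact permission syntax, and swap in your own test command):

```json
{
  "permissions": {
    "allow": [
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Bash(npm test:*)"
    ]
  }
}
```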
It's more about having the LLM give you a plan of what it wants to do and how it wants to do it, rather than code. Then you can mold the plan to fit what you really want. Then you ask it to actually start writing code.
Even Claude Code lets you approve each change, but it's already writing code according to a plan that you reviewed and approved.
I've also had success writing documentation ahead of time (keeping it in a separate repo as docs), and then referencing it for various stages. The doc will have quasi-code examples of various features, and then I can have models stubbed in one pass, failing tests in the next, etc. (roughly as sketched below).
But there’s a guiding light that both the LLM and I can reference.
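To make that concrete, a doc entry and the passes it drives might look something like this - names and behavior entirely invented for illustration:

```python
# Illustration only: a "quasi-code" doc snippet (kept as comments here) plus
# the stub and failing test that separate passes produce from it.

# --- from docs/orders.md (quasi-code, not runnable as written) --------------
#   order = Order(items=[...])
#   order.apply_discount("WELCOME10")   # 10% off, applied only once per order
#   order.total()                        # -> price after discount

# --- pass 1: stub the model --------------------------------------------------
from dataclasses import dataclass, field

@dataclass
class Order:
    items: list = field(default_factory=list)

    def apply_discount(self, code: str) -> None:
        raise NotImplementedError  # stubbed; behavior defined in docs/orders.md

    def total(self) -> float:
        raise NotImplementedError

# --- pass 2: failing test, written against the doc, not the implementation ---
def test_discount_applied_once():
    order = Order(items=[("widget", 100.0)])
    order.apply_discount("WELCOME10")
    order.apply_discount("WELCOME10")  # second application should be ignored
    assert order.total() == 90.0
```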
Sometimes I wonder if pseudocode could be better for prompting than expressive human language, because it can follow a structure and be expressive but constrained -- have you seen research on this and whether this is an effective technique?
With tools you know ahead of time that they will do the job you expect them to do with very high probability, or fail (with low probability) in some obvious way. With LLMs, there are few tasks you can trust them to do, and you also don't know their failure mode. They can fail yet report success. They work like neither humans nor tools.
An LLM behaves like a highly buggy compiler that too frequently reports success while emitting incorrect code. Not knowing where the bugs are, the only thing you can try to do is write the program in some equivalent way but with different syntax, hoping you won't trigger a bug. That is not a tool programmers often use. Learning to work with such a compiler is a skill, but it's unclear how transferable or lasting that skill is.
If LLMs advance as significantly and as quickly as some believe they will, it may be better to just wait for the buggy compiler to be fixed (or largely fixed). Presumably, much less skill will be required to achieve the same result that requires more skill today.