> it’s not hard to extrapolate and imagine that process becoming fully automated in the next few years. And when that happens, it will totally change the economics of formal verification.

There is a problem with this argument, similar to one made about imagining the future possibilities of vibe coding [1]: once we imagine AI doing this task, i.e. automatically proving software correct, we can just as easily imagine it not needing to do it (for us) in the first place. If AI can do the hardest things, those it is currently not very good at, there's no reason to assume it won't be able to do the easier things, the ones it already does better. In particular, we won't need it to verify our software for us, because there's no reason to believe it won't be able to come up with what software we need better than we can in the first place. It will come up with the idea, implement it, and then decide to what extent to verify it. Formal verification, or programming for that matter, will not become mainstream (as a human activity) but go extinct.

Indeed, it is far easier for humans to design and implement a proof assistant than it is to use one to verify a substantial computer program. A machine that can effectively use a proof checker will surely be able to come up with a novel proof checker of its own.

I agree it's not hard to extrapolate technological capabilities, but such extrapolation has a name: science fiction. Without a clear understanding of what makes things easier or harder for AI (in the near future), any prediction is based on arbitrary guesses that AI will be able to do X yet not Y. We can imagine any conceivable capability or limitation we like. In science fiction we see technology that's both capable and limited in some rather arbitrary ways.

It's like trying to imagine what problems computers can and cannot efficiently solve before discovering the notion of computational complexity classes.

[1]: https://news.ycombinator.com/item?id=46207505


I disagree. Right now, feedback on correctness is a major practical limitation on the usefulness of AI coding agents. They can fix compile errors on their own, they can _sometimes_ fix test errors on their own, but fixing functionality / architecture errors takes human intervention. Formal verification basically turns (a subset of) functionality errors into compile errors, making the feedback loop much stronger. "Come up with what software we need better than us in the first place" is much higher on the ladder than that.
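
To make the "functionality errors become compile errors" point concrete, here's a toy Lean 4 sketch of my own (not from the article). The theorems are the contract; if the body were the buggy `if a ≤ b then a else b`, both proofs would fail to elaborate, i.e. the functionality error surfaces as a compile error:

    def max2 (a b : Nat) : Nat :=
      if a ≤ b then b else a

    -- Checked at compile time; a min/max mix-up in `max2`
    -- would make both of these fail to elaborate.
    theorem max2_ge_left (a b : Nat) : a ≤ max2 a b := by
      unfold max2; split <;> omega

    theorem max2_ge_right (a b : Nat) : b ≤ max2 a b := by
      unfold max2; split <;> omega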

TL;DR: We don't need to be radically agnostic about the capabilities of AI-- we have enough experience already with the software value chain (with and without AI) for formal verification to be an appealing next step, for the reasons this author lays out.


I completely agree it's appealing, but I just don't see a reason to assume that an agent will be able to succeed at it and at the same time fail at other things that could make the whole exercise redundant. In other words, I also want agents to be able to consistently prove software correct, but if they're actually able to accomplish that, then they could just as well be able to do everything else in the production of that software (including gathering requirements and writing the spec) without me in the loop.

>I just don't see a reason to assume that an agent will be able to succeed at it and at the same time fail at other things that could make the whole exercise redundant.

But that is much simpler to understand: finding a proof by guided search (machines searching, over multiple inference attempts) takes far more effort than verifying a proof once found. Formal verification does not disappear, because communicating a succinct proof is much cheaper than having to search for it anew. Proofs will inevitably become the lingua franca among computers (as they already are among capable humans). Basic economics will drive the adoption of formal verification.
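
A toy way to see that asymmetry (my example, with integer factoring standing in for proof search): producing the certificate takes a long search, while checking a received certificate is a single operation.

    import math

    def find_factor(n: int) -> int:
        """The 'search': trial division, cost on the order of sqrt(n)."""
        for d in range(2, math.isqrt(n) + 1):
            if n % d == 0:
                return d
        raise ValueError("no nontrivial factor: n is prime")

    def check_factor(n: int, d: int) -> bool:
        """The 'verification': one modulo operation, essentially free."""
        return 1 < d < n and n % d == 0

    n = 999_983 * 1_000_003     # product of two primes
    d = find_factor(n)          # expensive: done once, by the searcher
    assert check_factor(n, d)   # cheap: done by everyone who receives d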

Whenever humans find an original proof, their notes contain a lot of deductions that were ultimately not used; they were searching for a proof, using intuition gained from reading and finding proofs of other theorems. LLMs are similarly gaining intuition, and at some point they will become better than humans at finding proofs. They are already much better than the average human at it. The question is how long it takes until they get better than any human being at finding proofs.

The future you describe, where the whole proving exercise (whether by humans or by LLMs) becomes redundant because the model immediately emits the right code, is nonsensical: the frontier of what LLMs are capable of will move gradually, so for each generation of LLMs there will always be problems for which it cannot instantly generate provably correct software (while skipping the proof you consider unnecessary). That doesn't mean they can't find the proofs, just that they would have to search by reasoning, with no guarantee of ever finding one.

That search heuristic is Las Vegas, not Monte Carlo: when it terminates, the checker guarantees the output is correct; the randomness is in the runtime, not in the answer.
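
Schematically (toy code, nothing real behind the names):

    import random

    SECRET = 424242                      # stand-in for "a valid proof"

    def check(x: int) -> bool:           # the proof checker: cheap and sound
        return x == SECRET

    def las_vegas_search() -> int:
        """Never wrong, but no bound on runtime: returns only
        once the checker has accepted."""
        while True:
            guess = random.randrange(1_000_000)
            if check(guess):
                return guess

    def monte_carlo_search(budget: int = 10_000) -> int | None:
        """Always terminates within `budget` tries, but may come
        back empty-handed (in general: may be wrong)."""
        for _ in range(budget):
            guess = random.randrange(1_000_000)
            if check(guess):
                return guess
        return None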

Companies will compare the levelized operating costs of different LLMs to decide which ones to use on hard proving tasks in the future.

Satellite data centers will consume ever more resources in a combined space/logic race for cryptographic breakthroughs.


> I also want agents to be able to consistently prove software correct...

I know this is just an imprecision-of-language thing, but they aren't 'proving' the software is correct so much as writing the proofs instead of the C++ (or whatever).

I had a bit of a discussion with one of them about this a while ago, to gauge the viability of having one generate the proofs and then use those to generate the actual code, i.e. just another abstraction over the compiler. The main takeaway I got from that (which may or may not be the right way to do it) is to use the 'result' for differential testing or to generate the test suite, but that was (maybe, I don't remember) in the context of proving existing software correct.
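
For what it's worth, the differential-testing idea looks roughly like this (a sketch, all names made up): treat the trusted artifact as an oracle and fuzz the generated code against it.

    import random

    def reference_sort(xs: list[int]) -> list[int]:
        """The trusted oracle, e.g. code extracted from a verified spec."""
        return sorted(xs)

    def generated_sort(xs: list[int]) -> list[int]:
        """The code under test, e.g. emitted by an LLM from the proofs."""
        out = list(xs)
        for i in range(len(out)):
            for j in range(i + 1, len(out)):
                if out[j] < out[i]:
                    out[i], out[j] = out[j], out[i]
        return out

    for _ in range(1_000):
        xs = [random.randrange(-100, 100) for _ in range(random.randrange(20))]
        assert generated_sort(xs) == reference_sort(xs), f"diverged on {xs}"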

I mean, if they get to the point where they can prove an entire codebase is correct just in their robot brains I think we'll probably have a lot bigger things to worry about...


It's getting better every day, though, at "closing the loop."

When I recently booted up Google Antigravity and had it make a change to a backend routine for a web site, I was quite surprised when it opened Chrome, navigated to the page, and started trying out the changes to see if they had worked. It was janky as hell, but a year from now it won't be.


To make this more constructive: I think that today AI tools are useful when they do things you already know how to do, so you can assess the quality of the output. If you know how to read and write a formal specification, for example, LLMs can already help translate natural-language descriptions into a formal spec.

It's also possible that LLMs can, by themselves, prove the correctness of some small subroutines, and produce a formal proof that you can check in a proof checker, provided you can at least read and understand the statement of the proposition.
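
Concretely, such an artifact might look like this toy Lean 4 example (mine, not from the thread). The statement is a formalization of "reversing a list preserves its length"; that's the only part a human has to read, since the kernel re-checks the proof script even if it's opaque machine output:

    def rev (xs : List Nat) : List Nat :=
      match xs with
      | []     => []
      | x :: t => rev t ++ [x]

    -- The human-readable contract; the script after `by` could be
    -- opaque, machine-generated text.
    theorem rev_length (xs : List Nat) : (rev xs).length = xs.length := by
      induction xs with
      | nil => rfl
      | cons x t ih => simp [rev, List.length_append, ih]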

This can certainly make formal verification easier, but not necessarily more mainstream.

But once we extrapolate from these existing abilities to something that can reliably verify real large or medium-sized programs for a human who cannot read and understand the propositions (and the necessary simplifying assumptions), it's hard to see how a machine could do that and at the same time be unable to do everything else.


The first human-robot war will be us telling the AI/robots 'no', and them insisting that [insert technology here] is good for us and is the direction we should take. It's probably already been done, but yeah, this seems like the tipping point into something entirely different for humanity.

... if it's achievable at all in the near future! But we don't know that. It's just that if we assume AI can do X, why do we assume it cannot, at the same level of capability, also do Y? Maybe the tipping point where it can do both X and Y is near, but maybe in the near future it will be able to do neither.


