
Ah the classic "show me ironclad evidence of this impossible-to-prove-but-quite-clear thing or else you must be wrong!"

Although we did recently get pretty good evidence of those claims for humans and it would be very surprising if the situation were completely reversed for LLMs (i.e. humans write Rust more reliably but LLMs write C more reliably).

https://security.googleblog.com/2025/11/rust-in-android-move...

I'm not aware of any studies pointing in the opposite direction.





Actually it's the classic "inductive reasoning has to meet a set of strict criteria to be sound." Criteria which this does not meet. Extrapolation from a sample size of one? In a context without any LLM involvement? That's not sound, the conclusion does not follow. The point being, why bother making a statistical generalization? Rust's safety is formally known, deduction over concrete postulates was appropriate.

> it would be very surprising if the situation were completely reversed for LLMs

Lifetimes must be well-defined in safe Rust, which requires a deep degree of formal reasoning. That is exactly the kind of complex problem analysis where LLMs are known to produce worse results than humans. Specifically in the context of security vulnerabilities, LLMs produce marginally fewer but significantly more severe issues in memory safe languages[1]. Still though, we might say LLMs will produce safer code with safe Rust, on the basis that 100,000 vibe coded lines will probably never compile.

[1] - https://arxiv.org/html/2501.16857v1
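
To make the lifetime point concrete, here's a minimal sketch (a stock illustrative function, not from the paper above): safe Rust forces you to state, via lifetime annotations, how long a returned reference may live, and the borrow checker rejects any use that outlives the data it borrows from.

```rust
// `longest` must declare that the returned reference lives no longer
// than both inputs; omitting the `'a` annotations is a compile error.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let s1 = String::from("borrow");
    let result;
    {
        let s2 = String::from("checker");
        result = longest(&s1, &s2);
        // `result` may borrow from `s2`, so it can only be used
        // inside this block where `s2` is still alive.
        println!("{result}");
    }
    // Using `result` here would be rejected by the borrow checker,
    // because `s2` has been dropped.
}
```

Getting an LLM to satisfy this kind of constraint is the "deep formal reasoning" being referred to: the compiler demands a globally consistent story about who owns what and for how long.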


I never claimed to be doing a formal proof. If someone said "traffic was bad this morning" would you say "have you done a scientific study on the average journey times across the year and for different locations to know that it was actually bad"?

> LLMs produce worse results than humans

We aren't talking about whether LLMs are better than humans.

Also we're obviously talking about Rust code that compiles. Code that doesn't compile is 100% secure!


I didn't claim that you were doing formal proofs. You can still make bad rhetoric, formal or not. You can say "The sky is blue, therefore C is a memory safe language" and that's trivially faulty reasoning. For many people bad deduction is easier to spot than bad induction, but they're both rhetorically catastrophic. You are making a similarly unsound conclusion to the ridiculous example above; it's not a valid statistical generalization. Formal or not, the rhetoric is faulty.

> would you say "have you done a scientific study on the average journey times across the year and for different locations to know that it was actually bad"?

In response to a similarly suspiciously faulty inductive claim? Yeah, absolutely.

> We aren't talking about whether LLMs are better than humans.

The point I'm making here is specifically in response to the idea that it would "be surprising" if LLMs produced substantially worse code in Rust than they did in C. The paper I posted is merely a touch point to demonstrate substantial deviation in results in an adjacent context. Rust has lower surface area to make certain classes of vulns under certain conditions, but that's not isomorphic with the kind of behavior LLMs exhibit. We don't have:

- Guarantees LLMs will restrict themselves to operating in safe Rust

- Guarantees these specific vulnerabilities are statistically significant in comparative LLM output

- Guarantees that vulnerability severity will be lower in Rust

Where I think you might be misunderstanding me is that this isn't a statement of empirical epistemological negativism. I'm underlining that this context is way too complex to be attempting prediction. I think it should be studied, and I hope that it's the case LLMs can write good, high quality safe Rust reliably. But specifically advocating for it on gut assumptions? No. We are advocating for safety here.

Because of how chaotic this context is, we can't reasonably assume anything here without explicit data to back it up. It's no better than trying to predict the weather on gut feeling. Hence why I asked for specific data to back the claim up. Even safe Rust isn't safe from security vulnerabilities stemming from architectural inadequacies and panics. It may well be the case that in reasonably comparable contexts, LLMs produce security vulnerabilities in real Rust codebases at the same rate they create similar vulnerabilities in C. It might also be the case that they produce low severity issues in C at a similar statistical rate as high severity issues in Rust. For instance: buffer overflows manifesting in 30% of sample C codebases resulting in unexploitable segfaults, vs architectural deficiencies in a safe Rust codebase, manifesting in 30% of cases, that allow exfiltration of everything in your databases without RCE. Under those conditions, I don't think it's reasonable to say Rust is a better choice.
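
On the panic point, a small generic sketch (not from any cited codebase): safe Rust converts what would be a C buffer overflow into either an explicit `Option` or a runtime panic. Memory safe, yes, but the panic path is still an availability concern.

```rust
fn main() {
    let buf = vec![0u8; 4];

    // Explicit path: `get` returns an Option instead of reading out of
    // bounds, so the caller is forced to handle the failure case.
    match buf.get(10) {
        Some(b) => println!("byte: {b}"),
        None => println!("index 10 is out of bounds; handled gracefully"),
    }

    // Indexing with `buf[10]` would be memory safe but would panic at
    // runtime. In a server, an unhandled panic is a denial-of-service
    // bug even though no memory corruption occurs.
}
```

So "memory safe" narrows the class of bugs; it doesn't zero out the security surface, which is exactly why the comparison needs data rather than intuition.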

Again, it's not a critique in some epistemological negativist sense. It's a critique that you are underestimating how chaotic this context actually is, and the knock-on effects of that. Nothing should surprise you.



