You can vibe-code a throwaway UI for investigating some complex data in less than 30 minutes. The code quality doesn't matter, and it will make your life much easier.
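To make that concrete, here's roughly the shape of thing I have in mind; a minimal sketch, assuming a made-up readings.csv with a timestamp column and a few numeric columns, and using Streamlit just because it's quick (any similar tool works):

    # explore.py -- run with: streamlit run explore.py
    import pandas as pd
    import streamlit as st

    df = pd.read_csv("readings.csv", parse_dates=["timestamp"])  # hypothetical file

    # Crude controls are plenty for a one-off investigation.
    col = st.selectbox("Column", [c for c in df.columns if c != "timestamp"])
    lo, hi = st.slider(
        "Value range",
        float(df[col].min()), float(df[col].max()),
        (float(df[col].min()), float(df[col].max())),
    )
    subset = df[df[col].between(lo, hi)]

    st.line_chart(subset.set_index("timestamp")[col])  # quick trend view
    st.dataframe(subset)                               # and the raw rows

None of it is production-worthy, and it doesn't need to be.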
Rinse and repeat for many "one-off" tasks.
It's not going away; you need to learn how to use it. *shrugs shoulders*
The issue is people trying to use these AI tools to investigate complex data, not the throwaway UI part.
I work as the non-software kind of engineer at an industrial plant, and a trend is starting to emerge of people who blindly trust the output of AI chat sessions without understanding what the chatbot is echoing back at them, which wastes their time and, in some cases, mine.
This is not new; in the past I have seen engineers use (abuse) statistics/regression tools etc. without understanding what the output was telling them, but it is getting worse now.
It is not uncommon to hear something like: "Oh I investigated that problem and this particular issue we experienced was because of reasons x, y and z."
Then, when you push back because what they've said sounds highly unlikely, it boils down to: "I don't know, that is what the AI told me."
Then, if they are sufficiently optimistic, they'll go back and prompt it with "please supply evidence for your conclusion" or something similar, and it will supply paragraphs of plausible-sounding text, but when you dig into what it is saying there are inconsistencies or made-up citations. I've seen it say things that were straight up incorrect and went against the laws of thermodynamics, for example.
It has become the new "I threw the kitchen sink into a multivariate regression and X emerged as significant, therefore we should address X."
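If you want to see how hollow that reasoning can be, a toy example makes it obvious (purely synthetic numbers, nothing from our plant): regress noise on enough noise predictors and something will come out "significant" by chance alone.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 40))   # 40 candidate variables, all pure noise
    y = rng.normal(size=100)         # the outcome, also pure noise

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    spurious = int((fit.pvalues[1:] < 0.05).sum())   # skip the intercept
    print(f"{spurious} of 40 unrelated predictors look 'significant' at p < 0.05")

Typically a couple of predictors clear the p < 0.05 bar despite there being no real relationship at all.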
I'm not a complete skeptic; I think AI has some value. For example, if you use it as a more powerful search engine by asking it something like "What are some suggested techniques for investigating X?" or "What are the limitations of Method Y?", it can point you to the right place and assist you with research; it might find papers from other fields, or similar. But it is not something you should be relying on to do all of the research for you.
But how do you know you're getting the correct picture from that throwaway UI? A little while back there was a blog post where the author praised AI for his vibe-coded earth-viewer app that supposedly used Vulkan to render inside a GUI window. Unfortunately, that wasn't the case: the AI had just copied code from somewhere and inserted a rudimentary software renderer. The AI couldn't do what was asked because it had seldom been done. Nobody on the internet had ever discussed that particular objective, so it wasn't in the training set.
The lesson to learn is that these are "large language models." That means they can regurgitate what someone else has done before textually, but not actually create something novel. So it's fine if someone on the internet has posted or talked about a quick UI in whatever particular toolkit you're using to analyze data. But they'll throw out BS if you ask for something brand new. I suspect a lot of AI users are web developers who write a lot of repetitive rote boilerplate, and that's the kind of thing these LLMs really thrive on.
> But how do you know you're getting the correct picture from that throwaway UI?
You get the AI to generate code that lets you spot-check individual data points :-)
Most of my work these days is in fact that kind of code. I'm working on something research-y that requires a lot of visualization, and at this point I've actually produced more throwaway code than code in the project.
Here's an example: I had ChatGPT generate some relatively straightforward but cumbersome geometric code. Saved me 30 - 60 minutes right there, but to be sure, I had it generate tests, which all passed. Another 30 minutes saved.
I reviewed the code and the tests and felt it needed more edge cases, which I added manually. However, these started failing and it was really cumbersome to make sense of a bunch of coordinates in arrays.
So I had it generate code to visualize my test cases! That instantly showed me that some assertions in my manually added edge cases were incorrect, which became a quick fix.
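The visualization code was roughly this shape (the names and cases here are illustrative stand-ins, not the real project's fixtures): draw each test case's points so a wrong assertion is obvious at a glance.

    import matplotlib.pyplot as plt

    test_cases = {
        "simple rectangle": [(0, 0), (4, 0), (4, 3), (0, 3)],
        "collinear points": [(0, 0), (2, 0), (4, 0)],   # the kind of edge case I botched
    }

    fig, axes = plt.subplots(1, len(test_cases), figsize=(4 * len(test_cases), 4))
    for ax, (name, pts) in zip(axes, test_cases.items()):
        xs, ys = zip(*(pts + [pts[0]]))   # close the loop so the shape reads clearly
        ax.plot(xs, ys, marker="o")
        ax.set_title(name)
        ax.set_aspect("equal")
    plt.show()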
The answer to "how do you trust AI" is human in the loop... AND MOAR AI!!! ;-)
They're good questions! The problem is that I've tried to talk to the people who are getting real value from it, and often the answer ends up being that the value is not as real as they think. One guy gave an excited presentation about how AI let him write 7k LOC per day, expounded for an entire session about how the rest of us should follow in his footsteps, and then clarified only in Q&A that reviewers couldn't keep up so he exempted himself from code review.
I’m starting to believe there are situations where human code review is genuinely not necessary. Here’s a concrete example of something that’s been blowing my mind. I have 25 years of professional coding experience, but it’s almost all web, with a few years of iOS in the Objective-C era. I’m also an amateur electronic musician. A couple of weeks ago I was thinking about this plugin that I used to love until the company that made it went under. I’ve long considered trying to make a replacement, but I don’t know the first thing about DSP or C++.
You know where this is going. I asked Claude if audio plugins were well represented in its training data, it said yes, off I went. I can’t review the code because I lack the expertise. It’s all C++ with a lot of math and the only math I’ve needed since college is addition and calculating percentages. However, I can have intelligent discussions about design and architecture and music UX. That’s been enough to get me a functional plugin that already does more in some respects than the original. I am (we are?) making it steadily more performant. It has only crashed twice and each time I just pasted the dump into Claude and it fixed the root cause.
Long story short: if you can verify the outcome, do you need to review the code? It helps that no one dies or gets underpaid if my audio plugin crashes. But still, you can’t tell me this isn’t remarkable. I think it’s clear there will be a massive proliferation of niche software.
I don’t think I’ve ever seen someone seriously argue that personal throwaway projects need thorough code reviews of their vibe code. The problem comes in when I’m maintaining a 20 year old code base used by anywhere from 1M to 1B users.
In other words you can’t vibe code in an environment where evaluating “does this code work” is an existential question. This is the case where 7k LOC/day becomes terrifying.
Until we get much better at automatically proving correctness of programs we will need review.
My point about my experience with this plugin isn’t that it’s a throwaway or meaningless project. My point is that it might be enough in some cases to verify output without verifying code. Another example: I had to import tens of thousands of records of relational data. I got AI to write the code for the import. All I verified was that the data was imported correctly. I didn’t even look at the code.
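What I did verify was the outcome, with checks along these lines (a schematic sketch; the table, columns, and file names are placeholders, not the real system):

    import sqlite3
    import pandas as pd

    src = pd.read_csv("export.csv")        # the source records
    con = sqlite3.connect("imported.db")   # where the AI-written importer put them

    # 1. Row count matches the source.
    (n,) = con.execute("SELECT COUNT(*) FROM records").fetchone()
    assert n == len(src), f"expected {len(src)} rows, found {n}"

    # 2. Spot-check a random sample of rows field by field.
    for _, row in src.sample(25, random_state=1).iterrows():
        got = con.execute(
            "SELECT name, amount FROM records WHERE id = ?", (int(row["id"]),)
        ).fetchone()
        assert got is not None, f"id {row['id']} missing"
        assert got[0] == row["name"] and float(got[1]) == float(row["amount"]), \
            f"mismatch for id {row['id']}"

If the counts and the sampled rows all line up, I’m satisfied, without ever reading the import code.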
In this context I meant throwaway as "low stakes", not "meaningless". Again, evaluating the output of a database import like that could be existential for your company, given the context. Not to mention there are many cases where evaluating the output isn't feasible for a human.
Human code review does not prove correctness. Almost every software service out there contains bugs. Humans have struggled for decades to reliably produce correct software at scale and speed. Overall, humans have a pretty terrible track record of producing bug-free correct code no matter how much they double-check and review their code along the way.
So the solution is to stop doing code reviews and just YOLO-merge everything? After all, everything is fucked already, how much worse could it get?
For the record, there are examples where human code review and design guidelines can lead to very low-bug code. NASA published their internal guidelines for producing safety-critical code[1]. The problem is that the development cost of software when using such processes is too high for most companies, and most companies don't actually produce safety-critical software.
My experience with the vast majority of LLM code submitted to projects I maintain is that it has subtle bugs that I managed to find through fairly cursory human review. The Copilot code review feature on GitHub also tends to miss actual bugs and report nonexistent bugs, making it worse than useless. So in my view, the death of the benefits of human code review has been wildly exaggerated.
No, that's not what I wrote, and it's not the correct conclusion. What I wrote (and what you, in fact, also wrote) is that in reality we generally do not actually need provably correct software except in rare cases (e.g., safety-critical applications). Suggesting that human review cannot be reduced or phased out at all until we can automatically prove correctness is wrong, because fully 100% correct and bug-free software is not needed for the vast majority of code being produced. That does not mean we immediately throw out all human review, but the bar for making changes to how we review code is certainly much lower than the above poster suggested.
I don't really buy your premise. What you're suggesting is that all code has bugs, and those bugs have equal severity and distribution regardless of any forethought or rigor put into the code.
You're right, human review and thorough design are a poor approximation of proving assumptions about your code. Yes bugs still exist. No you won't be able to prove the correctness of your code.
However, I can pretty confidently assume that malloc will work when I call it. I can pretty confidently assume that my thoroughly tested linked list will work when I call it. I can pretty confidently assume that following RAII will avoid most memory leaks.
Not all software needs meticulous careful human review. But I believe that the compounding cost of abstractions being lost and invariants being given up can be massive. I don't see any other way to attempt to maintain those other than human review or proven correctness.
I did suggest all code has bugs (up to some limit -- while I wasn't careful to specify this, as discussed above, there does exist an extraordinary level of caution and review that if used can approximate perfect bug-free code, as in your malloc example and in the example of NASA, but that standard is not currently applied to 99.9% of human-generated and human-reviewed code, and it doesn't need to be). I did not suggest anything else you said I suggested, so I'm not sure why you made those parts up.
"Not all software needs meticulous careful human review" is exactly the point. The question of exactly what software needs that kind of review is one whose answer I expect to change over the next 5-10 years. We are already at the point where it's so easy to produce small but highly non-trivial one-off applications that one needn't examine the code at all -- I completely agree with the above poster that we're rapidly discovering new examples of software development where output-verification is all you need, just like right now you don't hand-inspect the machine code generated by your compiler. The question is how far that will be able to go, and I don't think anybody really knows right now, except that we are not yet at the threshold. You keep bringing up examples where the stakes are "existential", but you're underestimating how much software development does not have anything close to existential stakes.
I agree that's remarkable, and I do expect a proliferation of LLM-assisted development in similar niches where verification is easy and correctness isn't critical. But I don't think most software developers today are in such niches.
Most enterprise software I use has serious defects. Professional CAD software for infrastructure is awful. Much of it is just incremental improvements piled upon software from the 1990s. Bugs last for decades because nobody can understand how the program works, so they just work on one more little VBA plugin at a time. Meanwhile, the capabilities of these programs have fallen completely behind game studios with no budget and no business plan.

Where are the results of this human excellence and code quality process? There are tens of thousands of new CVEs every year from code hand-crafted by artisans on their very own MacBooks. How? Perhaps there is the tiny possibility that code quality is mostly an aesthetic judgment that nobody can really define, and that this effort is mostly spent on vague concepts like maintainability, or on preferential decisions, instead of the basics: does it meet the specification? Is the performance getting better or worse?
This is the game changer for me: I don’t have to evaluate tens or hundreds of market options that fit my problem. I tell the machine to solve it, and if it works, then I’m happy. If it doesn’t, I throw it away. All in a few minutes and for a few cents. Code is going the way of the disposable diaper, and, if you have ever washed a cloth diaper, you will know that’s a good thing.
> I tell the machine to solve it, and if it works, then I’m happy. If it doesn’t I throw it away.
What happens when it seems to work, and you walk away happy, but discover three months later that your circular components don't line up because the LLM-written CAD software used an over-rounded PI = 3.14? I don't work in industrial design, but I faced a somewhat similar issue where an LLM-written component looked fine to everyone until final integration forced us to rewrite it almost entirely.
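The scary part is how small the error looks on paper. As a back-of-the-envelope check (the radius is just an illustrative number, not from a real part):

    import math

    r = 500.0                      # mm, an illustrative radius
    print(2 * 3.14 * r)            # 3140.0 mm
    print(2 * math.pi * r)         # ~3141.59 mm, i.e. about 1.6 mm of error

About 1.6 mm is nothing on a screen and very noticeable when two circular parts are supposed to mate.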
This is basically me at my job right now. My boss used Claude Code in his spare time to write a "proof of concept" Electron app. It mostly worked but had some weird edge case behaviors. Now it's handed off to me, and fixing those edge cases is requiring me to refactor basically every single thing Claude touched. Vast majority I'm just tossing and redoing from scratch.
The original code "looks" fine, and it even works pretty well, but an LLM cannot avoid critical oversights along the way, and it is fundamentally designed to make its mistakes look as plausibly correct as possible. This makes correcting the problems down the line much more annoying (unless you can afford to live with the bugs and keep slapping on more band-aids, I guess).
Most people don't have a problem with using GenAI for stuff like throwaway UIs. That's not even remotely relevant to the criticisms. People reject having it forced down their throats by companies that are desperate to make us totally reliant on it to justify their insane investments. And people reject the evangelists who claim that it's going to replace developers because it can spit out mostly working boilerplate.
I’m an AI skeptic. I like seeing what UIs it spits out, though, which nicely defeats the fear of the blank page staring into my soul. I don’t even use the code, just take inspiration from the layouts.
Yeah, it helps a lot to take the first steps, to overcome writer's block, to make you put into words what you'd like to have built.
At some point you might take over and ask it for specific refactors you'd do yourself but are too lazy to. Or even toss it away entirely and start fresh with better understanding, either yourself or again with an agent.
It's like watching somebody argue that code linting is going to change the face of the world and the rebuttals to the skeptics are arguing that akshually code linting is quite useful....
I have found value for one off tasks. I forget the exact situation, but I wanted to do some data transformation, something that would normally take me a half hour of awk/sed/bash or python scripting. AI spit it out right away.
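It was something of this general shape (a made-up example, not the actual task): flatten a "wide" spreadsheet export into long id/field/value rows.

    import csv

    with open("wide.csv", newline="") as f, open("long.csv", "w", newline="") as out:
        reader = csv.DictReader(f)
        writer = csv.writer(out)
        writer.writerow(["id", "field", "value"])
        for row in reader:
            for field, value in row.items():
                if field != "id":
                    writer.writerow([row["id"], field, value])

Trivial to write, but it's exactly the kind of fiddly one-off the AI spits out instantly.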
> You can vibe-code a throwaway UI for investigating some complex data in less than 30 minutes. The code quality doesn't matter, and it will make your life much easier.
I think the throwaway part is important here and people are missing it, particularly for non-programmers.
There are a lot of roles in the business world that would make great use of ephemeral little apps like this: do a specific task, then throw it away. Usually just running locally on someone's machine, or at most shared with a couple other folks in your department.
Code doesn't have to be good, hell it doesn't even have to be secure, and certainly doesn't need to look pretty. It just needs to work.
There's not enough engineering staff or time to turn every manager's pet excel sheet project into a temporary app, so LLMs make perfect sense here.
I'd go as far as to say more effort should be put into ephemeral apps as a use case for LLMs, rather than focusing on trying to use them in areas where a more permanent, high-quality solution is needed.
And then people create non-throwaway things with it and your job, performance report, bonus, and healthcare are tied to being compared to those people who just do what management says without arguing about the correct application of the tool.
If you keep your job, it's now tied to maintaining the garbage those coworkers checked in.
Perhaps. But does it matter? There are a million tools to investigate complex data already. Are you suggesting it is more useful to develop a new tool from scratch, using LLM-type tools, than it is to use a mature tool for data analysis?
If you don't know how to analyze data, and flat out refuse to invest in learning the skill, then I guess that could be really useful. Those users are likely the ones most enthusiastic about AI. But are those users close to as productive as someone who learns a mature tool? Not even close.
Lots of people appreciate an LLM to generate boilerplate code and establish frameworks for their data structures. But that's code that probably shouldn't be there in the first place. Vibe coding a game can be done impressively quickly, but have you tried using a game construction kit? That's much faster still.
Except when your AI psychosis PM / manager sees your throwaway vibe-coded garbage and demands it gets shipped to customers.
It's infinitely worse when your PM / manager vibe-codes some disgusting garbage, sees that it kind of looks like a real thing that solves about half of the requirements (badly) and demands engineers ship that and "fix the few remaining bugs later".
One thing people often don't realize or ignore: these LLMs are trained on the internet, the entire internet.
There's a shit-ton of bad and inefficient code on the internet. Lots of it. And it was used to train these LLMs as much as the good code.
In other words, the LLMs are great if you're OK with mediocrity at best. Mediocrity is occasionally good enough, but it can spell death for a company when key parts of it are mediocre.
I'm afraid a lot of the executives who fantasize about replacing humans with AI are going to have to learn this the hard way.