People were probably making the same arguments about ASM and C at some point. How many people write ASM these days, though? I'm not arguing that it's a relevant point right now; obviously Rust / C are way faster.
I doubt it. C is well within 2x of what you can achieve with hand written assembly in almost every case.
Furthermore writing large programs in pure assembly is not really feasible, but writing large programs in C++, Go, Rust, Java, C#, Typescript, etc. is totally feasible.
This is probably the right solution. It seems that in reality nobody does this, since it is expensive (more teachers, real attention to students, etc.). Also, if there is an explicit split, there will be groups of people who "game" it (spend a disproportionate amount of time to "train" their kids versus relying on actual natural talent; not sure if this is good or bad).
So, it feels to me that ideally, within the same classroom, there should be a natural way to work at your own pace at your own level. Is it possible? I have no idea; it seems not, again primarily because it requires a completely different skillset and attention from teachers.
> should be a natural way to work at your own pace at your own level
Analogous to the old one-room-school model where one teacher taught all grade levels and students generally worked from textbooks. There were issues with it stemming from specialization (e.g., teaching 1st grade is different than teaching 12th). They were also largely in rural areas and generally had poor facilities.
The main barrier in the US to track separation is manpower. Public School teachers are underpaid and treated like shit, and schools don't get enough funding which further reduces the number of teachers.
Teachers just don't have the time in the US to do multiple tracks in the classroom.
You can have a multi-track high-school system, like in much of Europe. Some tracks are geared towards the academically inclined who expect to go to university; others hold that option open but also focus on learning a trade or specialty (this can be stuff like welding, CNC, or the hospitality industry / restaurants, etc.); others focus more heavily on the trade side, with apprenticeships at companies intertwined with the education throughout high school. On that last track, switching to a university afterwards is not possible by default, but it is not ruled out if you put in some extra time.
Or you can also have stronger or weaker schools where the admission test scores required are different, so stronger students go to different schools. Not sure if that's a thing in the US.
This was the way all schools worked in my county in Florida, at least from middle school on. The Normal/Honors/AP split is what pretty much every high school did at the time. You could even take classes at a local community college instead of HS classes.
> Also, if there is an explicit split, there will be groups of people who "game" it (spend a disproportionate amount of time to "train" their kids versus relying on actual natural talent; not sure if this is good or bad).
The idea of tracking out kids who excel due to high personal motivation when they have less natural aptitude is flat out dystopian. I'm drawing mental images of Gattaca. Training isn't "gaming". It's a natural part of how you improve performance, and it's a desirable ethical attribute.
To be fair: I live in Mission Bay (SF), which has the Caltrain railway nearby (and you have to cross it if you take particular routes in or out). I ride Waymo (and like it a lot!). Waymo avoids crossing the tracks (it takes a longer way over a bridge). So they probably recognized the risk and to this day are still not willing to take it.
It compiles a human prompt into some intermediate code (in this case Python). The initial version of CPython was probably not perfect at all either, and engineers were also terrified. If we are lucky, this new "compiler" will keep getting better and more efficient. Never perfect, but people will pay the same price they are already paying for not dealing directly with ASM.
Something that you neglected to mention is that, with every abstraction layer up to Python, everything is predictable and repeatable. With LLMs, we can give the exact same instructions and not be guaranteed the same code.
I’m not sure why that matters here. Users want code that solves their business need. In general most don’t care about repeatability if someone else tries to solve their problem.
The question that matters is: can businesses solve their problems cheaper for the same quality, or at lower quality while beating the previous Pareto-optimal cost/quality frontier?
Sure. You seem to think that LLMs will be unable to identify abstraction opportunities if the code is not identical; that’s not obvious to me. Indeed there are some good (but not certain) reasons to think LLMs will be better at broad-not-deep stuff like “load codebase into context window and spot conceptual repetition”. Though I think the creative insight of figuring out what kind of abstraction is needed may be the spark that remains human for a while.
Also, maybe recognizing the repetition remains the human's job, but refactoring is exponentially easier and so again we get better code as a result.
Seems to me to be pretty early to be making confident predictions about how this is all going to pan out.
> The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
But why doesn't that happen today? Cheap code can be had by hiring in cheap locations (outsourcing, for example).
The reality is that customers are the ultimate arbiters, and if it satisfies them, the business will not collapse. And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
> And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
Code quality translates into the speed of introducing changes, the speed of fixing defects, and the number of user-facing defects.
While customers may not express any care about code quality directly, they can and will express (dis)satisfaction with the performance and defects of the product.
It happens today. However, companies fail from multiple problems that come together. Bad software quality (from whatever source) is typically not a very visible one among them, because when business people take over, they only see (at most) that software development/maintenance costs more money than it could yield.
It is happening. There is a lot of bad software out there. Terrible to use, but still functional enough that it keeps selling. The question is how much crap you can pile on top of that already bad code before it falls apart.
> Cheap code can be had by hiring in cheap locations (outsourcing, for example).
If you outsource and like what you get, you would assume the place you outsourced to can help provide continued support. What assurance do you have with LLMs? A working solution doesn't mean it can be easily maintained and/or evolved.
> And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
That is true, but they will complain if bugs cannot be fixed and features cannot be added. It is true that customers don't care, and they shouldn't, until it does matter, of course.
The challenge with software development isn't necessarily with the first iteration, but rather it is with continued support. Where I think LLMs can really shine is in providing domain experts (those who understand the problem) with a better way to demonstrate their needs.
... which is the whole idea behind training, isn't it?
The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
The problem is really the opposite -- most programmers are employed to create very minor variations on work done either by other programmers elsewhere, by other programmers in the same organization, or by their own younger selves. The resulting inefficiency is massive in human terms, not just in managerial metrics. Smart people are wasting their lives on pointlessly repetitive work.
When it comes to the art of computer programming, there are more painters than there are paintings to create. That's why a genuinely-new paradigm is so important, and so overdue... and it's why I get so frustrated when supposed "hackers" stand in the way.
>> Recognizable repetition can be abstracted
> ... which is the whole idea behind training, isn't it?
The comment I was answering specifically dismissed LLMs' inability to answer the same question with the same... answer as unimportant. My point is that this ability is crucial to software engineering: answers to similar problems should be as similar as possible.
Also, I bet that LLMs are not trained to abstract. In my experience, lately they are trained to engage users in pointless dialogue for as long as possible.
Unfortunately, this is only deterministic on the same hardware, though there is no reason why one couldn't write reasonably efficient deterministic LLM kernels. It just has not been a priority.
Nevertheless, I still agree with the main point that it is difficult to get LLMs to produce the same output reliably. A small change in the context might trigger all kinds of changes in the generated code.
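For what it's worth, here is a minimal sketch of what "deterministic on the same hardware" looks like in practice: a local model with greedy decoding (no sampling). This assumes Hugging Face transformers; the model name and prompt are just placeholders.

```python
# Greedy decoding with a local model: the same input produces the same tokens,
# but only as long as the hardware/kernels compute bit-identical logits.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any local causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

torch.manual_seed(0)  # not needed for greedy decoding, but harmless
inputs = tokenizer("Write a function that reverses a string:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)  # do_sample=False => greedy
print(tokenizer.decode(out[0], skip_special_tokens=True))
```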
There is no reason to assume that, say, a C compiler generates the same machine code for the same source code. AFAIK, a C compiler that chooses randomly between multiple C-semantically-equivalent sequences of instructions is still a valid C compiler.
> With LLMs, we can give the exact same instructions and not be guaranteed the same code.
That's something we'll have to give up and get over.
See also: understanding how the underlying code actually works. You don't need to know assembly to use a high-level programming language (although it certainly doesn't hurt), and you won't need to know a high-level programming language to write the functional specs in English that the code generator model uses.
I say bring it on. 50+ years was long enough to keep doing things the same way.
Even compiling code isn't deterministic, given that different compilers and different items installed on a machine can influence the final resulting code, right? Ideally they shouldn't have any noticeable impact, but in edge cases they might, which is why you compile your code once during a build step and then deploy the same compiled artifact to different environments instead of compiling it per environment.
No, it is much more involved, and not all providers allow the necessary tweaks. This means you will need to use local models (with hardware caveats), which will require us to ask:
- Are local models good enough?
- What are we giving up for deterministic behaviour?
For example, will it be much more difficult to write prompts? Will the output be nonsensical? And so on.
> assuming you have full control over which compiler you're using for each step ;)
With existing tools, we know if we need to do something, we can. The issue with LLMs, is they are very much black boxes.
> What's to say LLMs will not have a "compiler" interface in the future that will rein in their variance
Honestly, having a compiler interface for LLMs isn't a bad idea...for some use cases. What I don't see us being able to do is use natural language to build complex apps in a deterministic manner. Solving this problem would require turning LLMs into deterministic machines, which I don't believe will be an easy task, given how LLMs work today.
I'm a strong believer that LLMs will change how we develop and create software development tools. In the past, you would need Google or Microsoft levels of funding to integrate natural language into a tool, but with LLMs, we can easily have an LLM parse input and map it to deterministic functions in days.
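As a toy sketch of that pattern: the LLM only classifies intent and extracts arguments, while ordinary deterministic functions do the real work. `call_llm` here is a stand-in for whatever model/provider you actually use, and the handler names are made up for illustration.

```python
import json

# Deterministic business logic lives in plain Python functions.
def create_invoice(customer: str, amount: float) -> str:
    return f"Invoice for {customer}: ${amount:.2f}"

def cancel_order(order_id: str) -> str:
    return f"Order {order_id} cancelled"

HANDLERS = {"create_invoice": create_invoice, "cancel_order": cancel_order}

def call_llm(prompt: str) -> str:
    # Stand-in: in reality this would call your model and ask it to return
    # JSON like {"intent": "...", "args": {...}}.
    return json.dumps({"intent": "create_invoice",
                       "args": {"customer": "ACME", "amount": 99.0}})

def handle(user_text: str) -> str:
    parsed = json.loads(call_llm(user_text))
    fn = HANDLERS.get(parsed["intent"])
    if fn is None:
        return "Sorry, I didn't understand that."
    return fn(**parsed["args"])  # deterministic code does the actual work

print(handle("bill ACME ninety-nine dollars"))
```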
It may be a “level of abstraction”, but not a good one, because it is imprecise.
When you want to make changes to the code (which is what we spend most of our time on), you’ll have to either (1) modify the prompt and accept the risk of using the new code or (2) modify the original code, which you can’t do unless you know the lower level of abstraction.
I think there is a big difference between an abstraction layer that can improve -- one where you maybe write "code" in prompts and then have a compiler build through real code, allowing that compiler to get better over time -- and an interactive tool that locks bad decisions autocompleted today into both your codebase and your brain, involving you still working at the lower layer but getting low quality "help" in your editor. I am totally pro- compilers and high-level languages, but I think the idea of writing assembly with the help of a partial compiler where you kind of write stuff and then copy/paste the result into your assembly file with some munging to fix issues is dumb.
By all means, though: if someone gets us to the point where the "code" I am checking in is a bunch of English -- for which I will likely need a law degree in addition to an engineering background to not get evil genie with a cursed paw results from it trying to figure out what I must have meant from what I said :/ -- I will think that's pretty cool and will actually be a new layer of abstraction in the same class as compiler... and like, if at that point I don't use it, it will only be because I think it is somehow dangerous to humanity itself (and even then I will admit that it is probably more effective)... but we aren't there yet and "we're on the way there" doesn't count anywhere near as much as people often want it to ;P.
Miles Cole here: I’d love to see Daft on Ray become more widely used. Same DataFrame API, and you can run it in either single-machine or multi-machine mode. The only thing I don’t love about it today is that their marketing is a bit misleading: Daft is distributed VIA Ray; Daft itself is not distributed.
Thanks for the feedback on the marketing! Daft is indeed distributed using Ray, but doing so requires Daft to be architected very carefully for distributed computing (e.g. using map/reduce paradigms).
Ray fills an almost Kubernetes-like role for us in terms of orchestration/scheduling (admittedly it does quite a bit more as well, especially in the area of data movement). But yes, the technologies are very complementary!
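For anyone curious, the "same DataFrame API, single- or multi-machine" idea looks roughly like this. Illustrative only: the paths are placeholders and exact call names may differ between Daft versions.

```python
import daft

# Local (single-machine) execution: use the DataFrame API directly.
df = daft.read_parquet("data/events.parquet")  # placeholder path
df.limit(5).show()

# Distributed execution: point Daft at a Ray cluster first, then build the
# same queries; Ray handles orchestration/scheduling of the work.
# daft.context.set_runner_ray(address="ray://head-node:10001")
# df = daft.read_parquet("s3://bucket/events/*.parquet")
# df.limit(5).show()
```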
The idea is that it doesn't store binary files locally, just pointers in the DB plus metadata (SQLite if you run locally; open source). So it does versioning, structuring of datasets, etc. by "references", if you wish.
(That's different from, let's say, DVC, which always copies files into a local cache.)
So in the case from the README, where you're trying to curate a sample of your data, the only thing that you're reading is the metadata, UNTIL you run `export_files` and that actually copies the binary data to your local machine?
Exactly! DataChain does lazy compute. It will read metadata/json while applying filtering and only download a sample of data files (jpg) based on the filter.
This way, you might end up downloading just 1% of your data, as defined by the metadata filter.
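To make the flow concrete, here is a rough sketch of what that looks like, paraphrasing the README: the bucket URI and field names are placeholders, and exact method names/signatures may differ between DataChain versions.

```python
from datachain import Column, DataChain

# Building the chain only touches metadata; no binary files are downloaded.
chain = (
    DataChain.from_storage("gs://my-bucket/images/")  # placeholder URI
    .filter(Column("file.size") < 100_000)            # filter on metadata only
)

# Only the matching files (maybe 1% of the bucket) are copied locally here.
chain.export_files("./curated-sample")
```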