People were probably making the same arguments about ASM and C at some point. How many people write ASM these days, though? I'm not arguing that it's a relevant point right now; obviously Rust / C are way faster.
I doubt it. C is well within 2x of what you can achieve with hand written assembly in almost every case.
Furthermore writing large programs in pure assembly is not really feasible, but writing large programs in C++, Go, Rust, Java, C#, Typescript, etc. is totally feasible.
This is probably the right solution. It seems that in reality nobody does this, since it is expensive (more teachers, real attention to students, etc.). Also, if there is an explicit split, there will be groups of people who "game" it (spend a disproportionate amount of time to "train" their kids versus relying on actual natural talent; not sure if this is good or bad).
So, it feels to me that ideally, within the same classroom, there should be a natural way to work at your own pace at your own level. Is it possible? I have no idea; it seems not, again primarily because it requires a completely different skillset and attention from teachers.
> should be a natural way to work at your own pace at your own level
Analogous to the old one-room-school model where one teacher taught all grade levels and students generally worked from textbooks. There were issues with it stemming from specialization (e.g., teaching 1st grade is different than teaching 12th). They were also largely in rural areas and generally had poor facilities.
The main barrier in the US to track separation is manpower. Public School teachers are underpaid and treated like shit, and schools don't get enough funding which further reduces the number of teachers.
Teachers just don't have the time in the US to do multiple tracks in the classroom.
You can have a multi-track high-school system, like in much of Europe. Some tracks are geared towards the academically inclined who expect to go to university; others hold that option open but also focus on learning a trade or specialty (this can be stuff like welding, CNC, or the hospitality industry / restaurants, etc.); others focus more heavily on the trade side, with apprenticeships at companies intertwined with the education throughout high school. On that last track, switching to a university afterwards is not possible by default, but it is not ruled out if you put in some extra time.
Or you can also have stronger or weaker schools where the admission test scores required are different, so stronger students go to different schools. Not sure if that's a thing in the US.
This was the way all schools worked in my county in Florida, at least from middle school on. The Normal/Honors/AP split is what pretty much every high school did at the time. You could even take classes at a local community college instead of HS classes.
> Also, if there is an explicit split, there will be groups of people who "game" it (spend a disproportionate amount of time to "train" their kids versus relying on actual natural talent; not sure if this is good or bad).
The idea of tracking out kids who excel due to high personal motivation when they have less natural aptitude is flat out dystopian. I'm drawing mental images of Gattaca. Training isn't "gaming". It's a natural part of how you improve performance, and it's a desirable ethical attribute.
To be fair: I live in Mission Bay (SF), which has the Caltrain railway nearby (and you have to cross it if you take particular routes in or out). I ride Waymo (and like it a lot!). Waymo avoids crossing the tracks (it takes a longer way over a bridge). So they probably recognized the risk and to this day are still not willing to take it.
It compiles a human prompt into some intermediate code (in this case Python). The initial version of CPython was probably not perfect at all either, and engineers were also terrified. If we are lucky, this new "compiler" will keep getting better and more efficient. Never perfect, but people will pay the same price they are already paying for not dealing directly with ASM.
Something that you neglected to mention is that, with every abstraction layer up to Python, everything is predictable and repeatable. With LLMs, we can give the exact same instructions and not be guaranteed the same code.
I’m not sure why that matters here. Users want code that solves their business need. In general most don’t care about repeatability if someone else tries to solve their problem.
The question that matters is: can businesses solve their problems cheaper for the same quality, or at lower quality while beating the previous Pareto-optimal cost/quality frontier?
Sure. You seem to think that LLMs will be unable to identify abstraction opportunities if the code is not identical; that’s not obvious to me. Indeed there are some good (but not certain) reasons to think LLMs will be better at broad-not-deep stuff like “load codebase into context window and spot conceptual repetition”. Though I think the creative insight of figuring out what kind of abstraction is needed may be the spark that remains human for a while.
Also, maybe recognizing the repetition remains the human's job, but refactoring is exponentially easier and so again we get better code as a result.
Seems to me to be pretty early to be making confident predictions about how this is all going to pan out.
> The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
But why doesn't that happen today? Cheap code can be had by hiring in cheap locations (outsourcing, for example).
The reality is that customers are the ultimate arbiters, and if it satisfies them, the business will not collapse. And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
> And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
Code quality translates into the speed of introducing changes, the speed of fixing defects, and the number of user-facing defects.
While customers may not express any care about code quality directly, they can and will express (dis)satisfaction with the performance and defects of the product.
It happens today. However, companies fail from multiple problems that come together. Bad software quality (from whatever source) is typically not a very visible one among them, because when business people take over, they only see (at most) that software development/maintenance costs more money than it could yield.
It is happening. There is a lot of bad software out there. Terrible to use, but still functional enough that it keeps selling. The question is how much crap you can pile on top of that already bad code before it falls apart.
> Cheap code can be had by hiring in cheap locations (outsourcing, for example).
If you outsource and like what you get, you would assume the place you outsourced to can help provide continued support. What assurance do you have with LLMs? A working solution doesn't mean it can be easily maintained and/or evolved.
> And I have not seen a single customer demonstrate that they care about the quality of the code base behind the product they enjoy paying for.
That is true, but they will complain if bugs cannot be fixed and features cannot be added. It is true that customers don't care, and they shouldn't, until it does matter, of course.
The challenge with software development isn't necessarily with the first iteration, but rather it is with continued support. Where I think LLMs can really shine is in providing domain experts (those who understand the problem) with a better way to demonstrate their needs.
... which is the whole idea behind training, isn't it?
The question that matters is: will businesses crumble due to overproduction of same (or lower) quality code sooner or later.
The problem is really the opposite -- most programmers are employed to create very minor variations on work done either by other programmers elsewhere, by other programmers in the same organization, or by their own younger selves. The resulting inefficiency is massive in human terms, not just in managerial metrics. Smart people are wasting their lives on pointlessly repetitive work.
When it comes to the art of computer programming, there are more painters than there are paintings to create. That's why a genuinely-new paradigm is so important, and so overdue... and it's why I get so frustrated when supposed "hackers" stand in the way.
>> Recognizable repetition can be abstracted
> ... which is the whole idea behind training, isn't it?
The comment I was answering specifically dismissed LLMs' inability to answer the same question with the same... answer as unimportant. My point is that this ability is crucial to software engineering: answers to similar problems should be as similar as possible.
Also, I bet that LLMs are not trained to abstract. In my experience, lately they are trained to engage users in pointless dialogue for as long as possible.
Unfortunately, this is only deterministic on the same hardware, though there is no reason why one couldn't write reasonably efficient deterministic LLM kernels. It just has not been a priority.
Nevertheless, I still agree with the main point that it is difficult to get LLMs to produce the same output reliably. A small change in the context might trigger all kinds of changes in the generated code.
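For what it's worth, here is a minimal sketch of what "deterministic on the same hardware" looks like in practice: a local model with greedy decoding (no sampling). This assumes Hugging Face transformers; the model name and prompt are just placeholders.

```python
# Greedy decoding with a local model: the same input produces the same tokens,
# but only as long as the hardware/kernels compute bit-identical logits.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any local causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

torch.manual_seed(0)  # not needed for greedy decoding, but harmless
inputs = tokenizer("Write a function that reverses a string:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)  # do_sample=False => greedy
print(tokenizer.decode(out[0], skip_special_tokens=True))
```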
There is no reason to assume that, say, a C compiler generates the same machine code for the same source code. AFAIK, a C compiler that chooses randomly between multiple C-semantically-equivalent sequences of instructions is still a valid C compiler.
> With LLMs, we can give the exact same instructions and not be guaranteed the same code.
That's something we'll have to give up and get over.
See also: understanding how the underlying code actually works. You don't need to know assembly to use a high-level programming language (although it certainly doesn't hurt), and you won't need to know a high-level programming language to write the functional specs in English that the code generator model uses.
I say bring it on. 50+ years was long enough to keep doing things the same way.
Even compiling code isn't deterministic, given that different compilers and different items installed on a machine can influence the final resulting code, right? Ideally they shouldn't have any noticeable impact, but in edge cases they might, which is why you compile your code once during a build step and then deploy the same compiled artifact to different environments instead of compiling it per environment.
No, it is much more involved, and not all providers allow the necessary tweaks. This means you will need to use local models (with hardware caveats), which will require us to ask:
- Are local models good enough?
- What are we giving up for deterministic behaviour?
For example, will it be much more difficult to write prompts? Will the output be nonsensical? And so on.
> assuming you have full control over which compiler you're using for each step ;)
With existing tools, we know if we need to do something, we can. The issue with LLMs, is they are very much black boxes.
> What's to say LLMs will not have a "compiler" interface in the future that will rein in their variance
Honestly, having a compiler interface for LLMs isn't a bad idea...for some use cases. What I don't see us being able to do is use natural language to build complex apps in a deterministic manner. Solving this problem would require turning LLMs into deterministic machines, which I don't believe will be an easy task, given how LLMs work today.
I'm a strong believer that LLMs will change how we develop and create software development tools. In the past, you would need Google or Microsoft levels of funding to integrate natural language into a tool, but with LLMs, we can easily have an LLM parse input and map it to deterministic functions in days.
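As a toy sketch of that pattern: the LLM only classifies intent and extracts arguments, while ordinary deterministic functions do the real work. `call_llm` here is a stand-in for whatever model/provider you actually use, and the handler names are made up for illustration.

```python
import json

# Deterministic business logic lives in plain Python functions.
def create_invoice(customer: str, amount: float) -> str:
    return f"Invoice for {customer}: ${amount:.2f}"

def cancel_order(order_id: str) -> str:
    return f"Order {order_id} cancelled"

HANDLERS = {"create_invoice": create_invoice, "cancel_order": cancel_order}

def call_llm(prompt: str) -> str:
    # Stand-in: in reality this would call your model and ask it to return
    # JSON like {"intent": "...", "args": {...}}.
    return json.dumps({"intent": "create_invoice",
                       "args": {"customer": "ACME", "amount": 99.0}})

def handle(user_text: str) -> str:
    parsed = json.loads(call_llm(user_text))
    fn = HANDLERS.get(parsed["intent"])
    if fn is None:
        return "Sorry, I didn't understand that."
    return fn(**parsed["args"])  # deterministic code does the actual work

print(handle("bill ACME ninety-nine dollars"))
```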
It may be a “level of abstraction”, but not a good one, because it is imprecise.
When you want to make changes to the code (which is what we spend most of our time on), you’ll have to either (1) modify the prompt and accept the risk of using the new code or (2) modify the original code, which you can’t do unless you know the lower level of abstraction.
I think there is a big difference between an abstraction layer that can improve -- one where you maybe write "code" in prompts and then have a compiler build through real code, allowing that compiler to get better over time -- and an interactive tool that locks bad decisions autocompleted today into both your codebase and your brain, involving you still working at the lower layer but getting low quality "help" in your editor. I am totally pro- compilers and high-level languages, but I think the idea of writing assembly with the help of a partial compiler where you kind of write stuff and then copy/paste the result into your assembly file with some munging to fix issues is dumb.
By all means, though: if someone gets us to the point where the "code" I am checking in is a bunch of English -- for which I will likely need a law degree in addition to an engineering background to not get evil genie with a cursed paw results from it trying to figure out what I must have meant from what I said :/ -- I will think that's pretty cool and will actually be a new layer of abstraction in the same class as compiler... and like, if at that point I don't use it, it will only be because I think it is somehow dangerous to humanity itself (and even then I will admit that it is probably more effective)... but we aren't there yet and "we're on the way there" doesn't count anywhere near as much as people often want it to ;P.
Miles Cole here: I’d love to see Daft on Ray become more widely used. Same DataFrame API, and you can run it in either single-machine or multi-machine mode. The only thing I don’t love about it today is that their marketing is a bit misleading: Daft is distributed VIA Ray; Daft itself is not distributed.
Thanks for the feedback on the marketing! Daft is indeed distributed using Ray, but doing so requires Daft to be architected very carefully for distributed computing (e.g. using map/reduce paradigms).
Ray fills an almost Kubernetes-like role for us in terms of orchestration/scheduling (admittedly it does quite a bit more as well, especially in the area of data movement). But yes, the technologies are very complementary!
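For anyone curious, the "same DataFrame API, single- or multi-machine" idea looks roughly like this. Illustrative only: the paths are placeholders and exact call names may differ between Daft versions.

```python
import daft

# Local (single-machine) execution: use the DataFrame API directly.
df = daft.read_parquet("data/events.parquet")  # placeholder path
df.limit(5).show()

# Distributed execution: point Daft at a Ray cluster first, then build the
# same queries; Ray handles orchestration/scheduling of the work.
# daft.context.set_runner_ray(address="ray://head-node:10001")
# df = daft.read_parquet("s3://bucket/events/*.parquet")
# df.limit(5).show()
```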
The idea is that it doesn't store binary files locally, just pointers in the DB plus metadata (SQLite if you run locally; open source). So it does versioning, structuring of datasets, etc. by "references", if you wish.
(That's different from, let's say, DVC, which always copies files into a local cache.)
So in the case from the README, where you're trying to curate a sample of your data, the only thing that you're reading is the metadata, UNTIL you run `export_files` and that actually copies the binary data to your local machine?
Exactly! DataChain does lazy compute. It will read metadata/json while applying filtering and only download a sample of data files (jpg) based on the filter.
This way, you might end up downloading just 1% of your data, as defined by the metadata filter.
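To make the flow concrete, here is a rough sketch of what that looks like, paraphrasing the README: the bucket URI and field names are placeholders, and exact method names/signatures may differ between DataChain versions.

```python
from datachain import Column, DataChain

# Building the chain only touches metadata; no binary files are downloaded.
chain = (
    DataChain.from_storage("gs://my-bucket/images/")  # placeholder URI
    .filter(Column("file.size") < 100_000)            # filter on metadata only
)

# Only the matching files (maybe 1% of the bucket) are copied locally here.
chain.export_files("./curated-sample")
```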