
> Why would you think it's more complex?

Binary code takes more space, and both training and inference are tightly capped by memory and context size.
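A back-of-envelope sketch of that context pressure (all sizes below are assumptions for illustration, not measurements):

```python
# Back-of-envelope sketch of context pressure. All sizes below are
# assumptions for illustration, not measurements.
source_chars = 20_000           # assumed: a small program's source text
binary_bytes = 100_000          # assumed: its stripped, compiled binary

chars_per_text_token = 4        # rough BPE-style compression for text/code
bytes_per_binary_token = 1      # naive byte-level tokenization of binary

text_tokens = source_chars // chars_per_text_token       # 5,000 tokens
binary_tokens = binary_bytes // bytes_per_binary_token   # 100,000 tokens

context_window = 8_192          # assumed model context size
print(text_tokens <= context_window)    # True: the source fits
print(binary_tokens <= context_window)  # False: the binary doesn't
```

Under these assumed numbers the source fits in the window while the equivalent binary overflows it by an order of magnitude.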

Models tokenize input into a limited vocabulary of tokens and then learn relations between them. I can't say for sure, but I suspect it would be more challenging to find tokenization schemes for binary code and to learn their relationships.
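As a toy illustration of what such a scheme might look like, here is a single BPE-style merge step over raw x86-64 bytes (a deliberately minimal sketch, not a real tokenizer):

```python
from collections import Counter

# Toy sketch: one BPE-style merge step over raw x86-64 bytes.
# 48 89 e5 is the common "mov rbp, rsp" prologue; c3 is "ret".
data = bytes([0x48, 0x89, 0xE5, 0x48, 0x89, 0xE5, 0xC3])

tokens = list(data)                       # start from a 256-entry byte vocab
pairs = Counter(zip(tokens, tokens[1:]))  # count adjacent byte pairs
best = max(pairs, key=pairs.get)          # most frequent pair -> new token

merged, i, new_id = [], 0, 256
while i < len(tokens):
    if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
        merged.append(new_id)             # replace the pair with one token
        i += 2
    else:
        merged.append(tokens[i])
        i += 1

print(best, len(tokens), len(merged))     # (72, 137) 7 5
```

The mechanics carry over to bytes just fine; the open question is whether the learned merges line up with meaningful instruction boundaries the way subword merges line up with morphemes in text.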

The model first needs to learn human language really well, because it has to understand the prompt and map it accurately to binary code. That means the corpus will need to include both a lot of human language and a lot of binary code, and I wonder whether the two differ so much that learning one would conflict with learning the other.

I think building a corpus that maps human language to binary code will be really challenging, unless we can include the original code's comments at the appropriate places around the binary code, and so on.
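For illustration, one aligned training record might look something like this (the field names and structure are invented, not taken from any real dataset):

```python
# Hypothetical shape of one aligned training record; the field names
# and structure are invented for illustration, not from a real dataset.
record = {
    "prompt": "return the sum of two integers",
    "source_comment": "// adds a and b",       # kept from the original code
    "arch": "x86-64",
    # lea eax, [rdi+rsi]; ret
    "machine_code": bytes([0x8D, 0x04, 0x37, 0xC3]).hex(),
}
print(record["machine_code"])  # 8d0437c3
```

Producing millions of such pairs would likely mean compiling existing open source with enough debug metadata to keep comments and intent aligned with the emitted bytes.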

Binary code is machine dependent, so the result would be programs that aren't portable across architectures, operating systems, and so on. The model would need to learn more than one instruction set and be able to accurately generate the same program for different target platforms and OSes.
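For example, the same high-level operation encodes to entirely different bytes on different ISAs (these are the standard encodings for the two instructions shown in the comments):

```python
# The same high-level operation, "add two registers", encodes to
# entirely different bytes per architecture.
encodings = {
    "x86-64": bytes([0x48, 0x01, 0xD8]),        # add rax, rbx
    "arm64":  bytes([0x00, 0x00, 0x01, 0x8B]),  # add x0, x0, x1 (little-endian)
}

for arch, code in encodings.items():
    print(arch, code.hex(), f"({len(code)} bytes)")
```

Even instruction lengths differ (x86-64 is variable-width, ARM64 is fixed 4-byte), so a model can't just transliterate from one target to another.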

> Who said that creating bits efficiently from English to be computed by CPUs or GPUs must be done with transformer architecture?

We've never had any other method come close; transformers outperform the alternatives by an order of magnitude. We may invent a whole new approach in the future, but as of now, it's the best method we've ever figured out.

> The AI model architecture is not the focus of the discussion. It is the possibilities of how it can look like if we ask for some computation, and that computation appears without all the middle-men layers we have right now, English->Model->Computation, not English->Model->DSL->Compiler->Linker->Computation.

Each layer simplifies the task of the layer above. These aren't like business middlemen that take a cut of the value at each level; software layers remove complexity from the layers above them.
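Python's own pipeline is a handy small example of such layers (parser, compiler, VM), all reachable from the standard library:

```python
import ast
import dis

# Python's own stack of layers: parser -> compiler -> VM.
# Each layer takes complexity off the one above it.
src = "x = 1 + 2"

tree = ast.parse(src)                    # parser layer: grammar and syntax
code = compile(tree, "<demo>", "exec")   # compiler layer: lowering to bytecode
dis.dis(code)                            # peek at the bytecode layer

namespace = {}
exec(code, namespace)                    # VM layer: actual execution
print(namespace["x"])                    # 3
```

The prompt author never thinks about bytecode, the parser never thinks about execution; collapsing the stack means one model shouldering all of it at once.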

I don't know why we wouldn't be talking about AI models. Isn't the topic whether it might be more optimal to train an AI model on binary code directly and have it generate binary code directly? At least that's what I was talking about.

So, sticking to AI models: with LLMs and image/video diffusion models, we've already observed that inference in smaller steps and chains works much better. Based on that, I suspect going from human language to binary code in a single hop would likewise work worse.
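A sketch of the two pipeline shapes, with `ask_model` as a purely hypothetical stand-in for a model call (not a real API):

```python
# Hypothetical sketch of single-hop vs multi-hop generation.
# `ask_model` is a placeholder for a model call, not a real API.
def ask_model(prompt: str) -> str:
    return f"<model output for: {prompt}>"   # stand-in for inference

def single_hop(task: str) -> str:
    # one shot: English straight to machine code
    return ask_model(f"Emit machine code for: {task}")

def multi_hop(task: str) -> str:
    # chained: each hop works from the previous hop's output
    plan = ask_model(f"Outline an approach for: {task}")
    source = ask_model(f"Write source code following: {plan}")
    return ask_model(f"Compile this source to machine code: {source}")

print(single_hop("add two numbers"))
print(multi_hop("add two numbers"))
```

The multi-hop shape mirrors what the compiler stack already gives us for free: each intermediate artifact is smaller, checkable, and easier to get right than the final bytes.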
