
From what I can tell, chibicc, unlike tcc, is not a complete C compiler in and of itself. Looking at its source code, it relies on external tools for both x86_64 code generation and linking: https://github.com/rui314/chibicc/blob/90d1f7f199cc55b13c7fd...

It does rely on binutils, but by this standard GCC is not a complete C compiler either.

Relying on binutils and an assembler is fine. I don't think it really affects the internal complexity of the compiler much, but having textual assembly to look at can be handy for debugging the compiler, so it might reduce the human effort to get it working.

That's a terrible benchmark, but the correct response is not to eliminate the code but to issue an error or warning that it is a CPU-cycle-burning no-op.

The more high-level the code, the more purposeless code there is to eliminate. For example, an RAII object with an empty default destructor.

That's hard to implement, because constructs like this are typically the result of earlier passes (macro expansion, inlining, dead code elimination, ...) rather than something the user wrote directly.

That would make for a bad experience in the presence of macros or other compile-time configuration.

It's pretty common to have code that only exists in one configuration but not others. In the other configurations, you end up with some obviously pointless code that the compiler should silently discard. It would be no fun if you couldn't compile your release build because the compiler yelled at you that the removed `assert()`s turned some of the surrounding code into dead code.


Where do you draw the line on this logic? Should we never inline or unroll and only warn? Should we never re-order and only warn? The CPU is doing its own re-ordering and that can't be manually controlled; should we pick instructions to force ordering and warn?

I can understand if we'd want to emit perfectly matching control flow logic with certain debug flags to verify correctness in specific scenarios. But I'd want it to be opt-in.


Using LLVM is an indirect approach that will limit the quality of your compiler.

When one looks at languages that use LLVM as a backend, there is one consistent property: slow compilation. Because of how widespread LLVM is, we often seem to accept this as a fact of life, as if we were forced to choose between fast runtime code and a fast compiler. This is a false choice.

Look at two somewhat recent languages that use LLVM as a backend: zig and rust. The former has acknowledged that LLVM is an albatross and is in the process of writing its own backends to escape its limitations. The latter is burdened with ridiculous compilation times that will never get meaningfully better so long as they avoid writing their own backend.

Personally, I find LLVM a quite disempowering technology. It creates the impression that its complexity is necessary for quality and performance and makes people dependent on it instead of developing their own skills. This is not entirely dissimilar to another hot technology with almost the same initials.


I don't want an optimizer that eliminates an unnecessary operation. I want a compiler that tells me that it is unnecessary so I can remove it.

This is operating on IR, not on lines of code. Figuring out where the operation came from is extremely difficult because you have to propagate all of that info back and forth across the passes, and it may end up being split "across" syntactic elements. If your language has any form of metaprogramming or code reuse (i.e. all of them), that operation may also be necessary at some use sites and not at others, among other issues.

This kind of compiler/runtime feedback about source code is really interesting and (imo) under-studied. Especially when you take into account something like PGO data.

This is not always possible. Consider the monomorphized output of a generic function. An operation may be dead in one instance but not generally.

That kind of feedback is also possible within this framework in theory. It depends on at what level the abstract interpreter is operating. If it’s the source level then it’s easy, but propagating that from an IR to source code is, shall we say, an open question.

For me, compiler optimization is a mixed bag. On the one hand, it can produce higher-performance runtime artifacts; on the other, this comes at significant cost, which I believe often exceeds the value it provides. Optimizations push programs in the direction of complexity and inscrutability. They make it harder to know what a function _actually_ does, and some even have the ability to break your code.

In the OP's examples, instead of optimization, what I would prefer is a separate analysis tool that reports what optimizations are possible and a compiler that makes it easy to write both high level and machine code as necessary. Now instead of the compiler opaquely rewriting your code for you, it helps guide you into writing optimal code at the source level. This, for me, leads to a better equilibrium where you are able to express your intent at a high level and then, as needed, perform lower level optimizations in a transparent and deterministic way.

For me, the big value of existing optimizing compilers is that I can use them to figure out what instructions might be optimal for my use case and then I can directly write those instructions where the highest performance is needed. But I do not need to subject myself to the slow compilation times (which compounds as the compiler repeatedly reoptimizes the same function thousands of times during development -- a cost that is repeated with every single compilation of the file) nor the possibility that the optimizer breaks my code in an opaque way that I won't notice until something bad and inscrutable happens at runtime.


Master recursion and there will be nothing left to master. Avoid recursion and you will remain forever stuck in a loop.


The prominence of LLVM is a symptom of the dying of compiler writing as an art, not evidence of its vitality.


> compiler writing as an art

cooking is an art. software is engineering. no one would say "building skyscrapers as an art is dying".


Parsing is strange in that many people tend to believe it is a solved problem and yet every project handles it slightly differently (and almost none do it truly well).

I have been studying compiler design for several years and I have found that writing a simple parser by hand is the best way to go most of the time. There is a process to it: you start with a "Hello, world!" program and parse it character by character with no separate lexer. You ensure that your parser makes an unambiguous decision at each character and never backtracks. The decision may be that you need to enter a disambiguation function that also only moves forward. If the grammar gets in the way of conserving this property, change the grammar, not the parser design.

If you follow that high level algorithm, you will end up with a parser whose performance is linear in the number of characters, which is asymptotically the best you can hope for. It is both easy and simple to implement (provided you have solid programming fundamentals), and no caching is needed for efficiency.

Deliberate backtracking in a compiler is an evil that should be avoided at all costs. It potentially injects exponentially poor performance into a developer's primary feedback loop, which is a theft of their time for which they have little to no recourse.


I agree that if you want to write a production-grade parser, this is probably the best way to go. I also agree that parsing is not a solved problem for all cases, but that is true of many other problems. However, for many cases it is a solved problem, and often it is not the first thing you should focus on optimizing.

If you teach a course about compiler construction, I think it might be better to teach your students how to write a grammar for some language and use an interactive parser that can parse input according to the grammar (and visualize the AST). See for example [1] and [2]. (Even if you feed it the C grammar, it succeeds in parsing thousands of lines of (preprocessed) C code at every keystroke. This interpreting parser is written in JavaScript and uses a simple caching strategy for performance.)

For the scripting language [3] in some of the BiZZdesign modeling tools, a similar interactive parser was used (implemented in C++). This scripting language is also used internally for implementing the various meta-models. These scripts are parsed once, cached, and interpreted often.

I think the same is true for many domain-specific languages (DSLs).

[1] https://info.itemis.com/demo/agl/editor

[2] https://fransfaase.github.io/MCH2022ParserWorkshop/IParseStu...

[3] https://help.bizzdesign.com/articles/#!horizzon-help/the-scr...


Scala did async I/O in a very similar way over a decade ago, except it was far more ergonomic, in my opinion, because the IO object was implicit. I am not convinced by either scala or zig that this is the best approach.


From my perspective, Zig is trying to do far too many things to ever reach a baseline of goodness that I consider acceptable. They are, in my view, quite disrespectful to their users, whom they force to endure churn at the whims of their dictator. Now that enough people have bought in and accepted that a broken tool is ok so long as it hasn't been blessed with a 1.0, all of its clear flaws can be overlooked in the hope of the coming utopia (spoiler alert: that day will never arrive).

Personally, I think it is wrong to inflict your experiments on other people and then, when you pull the rug out from underneath them, say: well, we told you it was unstable, you shouldn't have depended on us in the first place.

I don't even understand what zig is supposed to be. Matklad seems to think it is a machine level language: https://lobste.rs/s/ntruuu/lobsters_interview_with_matklad. This contrasts with the official language landing page: Zig is a general-purpose programming language and toolchain for maintaining robust, optimal and reusable software. These two definitions are mutually incompatible. Moreover, zig is clearly not a general purpose language because there are plenty of programming problems where manual memory management is neither needed nor desirable.

All of this confusion is manifest in zig's instability and bloated standard library. Indeed, a huge standard library is incompatible with the claims of simplicity and generality they frequently make. Async is not a feature that can be implemented universally without adding overhead and indirection, because of the fundamental differences in capabilities exposed by the various platforms. Again, they are promising a silver bullet even though their prior attempt, in which they publicly proclaimed function coloring to be solved, has been abandoned. Why would we trust them to get it right a second time?

There are a very small number of assembly primitives that every platform provides that are necessary to implement a compiler: load/store/mov/inc/jeq/jump and perhaps a few others. Luajit implements its parser in pure assembly, and I am not aware of an important platform that zig supports that luajit does not run on. I do the vast majority of my programming in lua and _never_ run into bugs in the interpreter. I truly cannot think of a single problem that I think zig would solve better than luajit. Even if one did exist, I could embed the zig code in my lua file, use lua to drive the zig compiler, and then call into the specialized code using the lua ffi. But the vast majority of code does not need to be optimized to the level of machine code where it is worth putting up with all of the other headaches that adopting zig will create.

The hype around zig is truly reaching llm levels of disconnection from reality. Again, to believe in zig, one has to believe it will magically develop capacities that it does not presently have and for which there is no plan of execution beyond a vague promise to just wait.

