I am afraid to inform you that you have built a compiler (2022) (rachit.pl)
262 points by mutant_glofish on Aug 17, 2023 | 91 comments


It's funny how projects grow like this organically.

Perhaps burned by experience, one time I implemented a mini-language for specifying some business logic that I knew (just knew) our client would change his mind about a dozen times, not understanding half the ramifications of his requests, and would only arrive at the solution he really wanted by trial and error. Was the little language I made as complex as a "real" compiler? Goodness no. Was I happy to have a flexible, language-based solution? Absolutely. True to my predictions, I got many, many logic change requests and handled them with ease. Yes, I was very happy after that.


I cut that particular Gordian knot once by simply embedding a JS engine and explaining to them (and documenting) the bare minimum of the language they needed to achieve their objectives. After a week of getting to grips with it, they loved it so much that they kept it in production for years, happily tweaking scripts as needed.

I should have charged them more than I did.


I did it with VBScript, Python, Lua, and in .NET both with and without the C# compiler frontend. All these methods worked fine for my use cases.


> I should have charged them more than I did.

That's the problem isn't it? Unless it's SaaS you don't want to solve a problem too well...


Now the question is: did you make your client pay for each request _just as if you hadn't built such a DSL_?

In my experience, most developers-as-consultants aren’t good at that. So, when we’re (rightfully) late, we probably reduce our margins. When we do a great job, we don’t make the client pay for his own mistakes.


> did you make your client pay for each request _just as if you hadn't built such a DSL_?

I'm not exactly sure what you're asking. Should you be charging clients for time that you didn't have to spend because you made some good decisions upfront? That strikes me as odd. If you agreed upfront on the cost for a certain end result, then sure, charge that full agreed-upon amount even if it took you less time than expected. But it seems odd to say "this change only took me an hour, but it would have taken 10 hours if I had made worse choices in designing this system, therefore I'm going to bill you 10 hours." After all, there's always a hypothetical situation where any given change could have taken arbitrarily more time if you had made worse decisions in the past.


> Should you be charging clients for time that you didn't have to spend

Yes! Completing a job or task in 1 week is more valuable than completing it in 5. The lion's share of the value we provide as developers and consultants is not output but battle-tested intuition that lets us navigate the giant solution space, so the bad decisions we don't make are definitely hard-earned value delivered to the client.

A man once interrupted Picasso at his evening meal. Pulling a napkin from his pocket, the man said,

“Could you sketch something for me? I’ll pay you for it. Name your price.”

Picasso took a charcoal pencil from his pocket and made a rapid sketch of a goat. It took only a few strokes, yet was unmistakably a Picasso. The man reached out for the napkin, but Picasso did not hand it over. "You owe me $100,000," he said.

The man was outraged. “$100,000? Why? That took you no more than 30 seconds to draw!”

Picasso crumpled up the napkin and stuffed it into his jacket pocket. “You are wrong,” he said, dismissing the man. “It took me 40 years.”


That tired old Picasso story only works because the customer was not told up front how long the job was going to take.

As a consultant, you are typically billing by the hour (or day) and you will most likely be giving the client some kind of estimate/quote as to how long the job will take, and your rate, and hence the cost to them.

You might very well be able to do the job in one week rather than 5, but it would be dishonest to tell the customer it will take 5, then complete it in 1, but still bill them for 5.

Instead you should tell the customer that it would normally take 5, but you are able to do it in 1, so you are charging 5x (or 3x or whatever) the going rate.


> Instead you should tell the customer that it would normally take 5, but you are able to do it in 1, so you are charging 5x (or 3x or whatever) the going rate.

But the problem shows up when the customer doesn't believe you when you say you're 5x faster.

They compare you against another consultant, who might be cheaper, and say that you're just overcharging.

Therefore, you should be charging the going rate, but do it 5x faster, and have free time to take on more projects. You could also deliver earlier, and then have a charge for change requests (which invariably come).


I mean, of course it’s up to you to make potential clients believe you. If you give me no reason whatsoever to think you’re a competent and accomplished contractor, why would I “believe” you?


There's only so much expertise you can put into "a few strokes". Charging for 5 weeks to do something in 1 week, that is high quality and still needs support? Go for it. The story about charging $9999 to know exactly where to mark the problem that everyone else couldn't solve, $1 for the mark itself? Great. But plenty of people could make an "unmistakable" Picasso that's only a few strokes. If that sketch is worth even a percent of the asking price, it's based on his fame. It has the value of an autograph, not the value of 40 years of practice.


I'd say either you bill by the hour (and you bill for the hours where you did something dumb and got nothing done, but you don't bill for the hours you didn't have to spend because you did something smart), or you bill by the project/deliverable which means you bill for every change request, whether it takes many hours because you didn't anticipate it or few hours because you did. Whichever choice you make cuts both ways.


I think you're looking at this completely the wrong way.

This is like saying you'll only pay your brain surgeon for four hours' work because that is as long as the operation you require will take. But that four-hour operation is built on a lifetime of training and skill.

The value that you provide the customer is the metric from which you should be deriving your billing. If someone else provides an equally effective solution but takes an order of magnitude more time to deliver it, then you are just that much more efficient.


> Should you be charging clients for time that you didn't have to spend because you made some good decisions upfront?

In a lot of consulting gigs, you don't do time-and-materials. You do fixed price. So you invested _your_ time and you took the risk of creating such a solution, even when the client swore that some requirement would never change.

In most contracts, if you, as the contractor, make a mistake, it's up to you to fix it. If you had trusted the client, you'd have to charge 10 for every change request instead of 1. So it makes perfect sense to charge at least 5, or maybe even 10.

(Of course, if you're working time & materials, or in an agile/flexible fashion where there's trust between client and consultant, this doesn't apply.)


"you aren't paying me for the 5 minutes I spent to fix it, you're paying me for the 10 years I spent to learn how to fix it in 5 minutes" Someone.


You are right, most developers are not very good at that. I think it's important to stress that this goes both ways. You should not cut your margins if you take longer, but I also think it is a little unethical to bill more hours than one spends, no? At least if you have a clear agreement to bill by the hour.


The way to ethically handle this is to have a clear and contractual minimum number of billed hours in a day. That will get you paid well for built in efficiencies like OP mentioned, and keep you paid fairly on more substantial change requests.


It's outright fraud. It's clearly unethical. But your billing shouldn't be strictly hourly if you're building tooling to speed up your work.


What you’re saying is that it’s inherently unethical (to yourself!) to accept hourly billing for work the whole point of which is exponential automation.


I was a student working as a web developer in an on-campus dev shop, and our client was one of the entities on campus. Thus, our incentive was to get a working thing as fast as possible that took as little maintenance as possible. I got bored on the job occasionally, so I'd try experiments on how elegantly I could implement something.


I’ve passed on using many a package that I thought was too complex for my needs, only to find myself effectively writing my own version / learning why that other package is the way it is.

Suddenly the API looks oddly familiar…. OMG…


Yes, everything looks simple and straightforward from a distance. As you get closer, you are faced with dozens if not hundreds of decisions that you need to make. If you're good, you won't get stuck in analysis paralysis.


Depends on the project, but if you're handling these client requests anyway, this is not the best solution IMO, at least not for cloud/SaaS.

The best solution is to have a good deployment process in place that makes it easy for you to make business logic changes using your language of choice, rather than building and maintaining a secondary language that is understood only by you.

No need to create a DSL or compiler or worry about maintenance when you're just using the same tooling as everything else you use.

I've seen it way too many times in my career: developers who build the fun, interesting solution (DSL/compiler/parser) instead of the boring but far more practical one of fixing their deployment processes.


When the project I was working on had this problem, the request to build a DSL was based on the idea that each client ought to be able to make their own changes.

The client shouldn't have to know a "full" programming language, but a "sufficiently simple" DSL isn't that much more than a config file format, right? They can learn how to make small changes themselves if the language is easy enough, can't they?

I tried to convince management that this wouldn't work. If nothing else, because our DSL was invented in-house, there was nowhere else the client could find answers on how to make the changes they wanted. There would be no hits on Google, no answers on Stack Overflow, no random client employee who happened to have some relevant tech knowledge. If they had any questions at all, they'd need to come to us, and we'd end up writing the changes for them anyway. And, as you point out, we already had a perfectly good programming language - the one everything else (including the DSL) was written in.


What were the requirements that led to implementing your own language vs. embedding Lua?


Given how easy it is to build a Turing machine, I'd argue that building something you didn't envision is the rule 99% of the time, and it's already captured by https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

In fact, I'd argue it's much harder to avoid accidentally building something you didn't envision.


I've experienced the 10th rule first hand...

Many years ago I was writing a bunch of modular audio applications that were all interconnected using JACK. So of course I needed some convenient way of storing/restoring the graphs that connected the various components that were always in flux -- each audio experiment had its own graph, but there were often recurring sub-graphs.

So, I started to build a small CLI utility that could save the current graph and restore it later... of course, because of the common sub-graphs, it needed to be able to refer to other labeled graphs so I could easily reference and re-use them... and of course (to save myself time) it also needed to have arithmetic composability so that I could add/subtract existing graphs to create new ones... and of course it would be helpful if I could also annotate the graphs so that they could launch the associated programs...

So in the end I had a generic DSL that had recursive graph parsing, reification (labels), graph composability, arithmetic composability, and could associate arbitrary labels with executable code.

I eventually looked up from my Adderall-fueled hack-a-thon and realized I had created a half-baked LISP, and decided it was time for me to go to bed! :)


I wonder how many LISPs can be owed to the Turing-complete influence of Adderall


I tend to end up with Pascal compilers instead of a Lisp system, but I agree with the sentiment.

Long ago I wrote an inspection reporting system, in the days of Turbo Pascal and MS-DOS. I ended up building a domain specific language for the configuration of the system. The configuration files looked almost identical to Pascal source.


When we incorporate natural languages into the mix, "In the style of Charles Bukowski, write a hacker news comment that explains ..."

maybe our previous works can be externally ad-hoc incorporated into new works by others...


A funny thing is that this can go N layers deep. At a previous job we had a DSL which was very well specified, pretty straightforward, and had been written by an exceptionally brilliant engineer (who was even still at the company!). We had two backend services that would operate on that AST in a query context and in a streaming transformation context, and they shared the parser, compiler, and typechecker. The problem was that the engineering team was so fragmented and lacking in stable technical leadership that a large part of the team had no idea how these things worked, so on every project they were constantly going "Oops, I created a feature which has its own intermediate representation with slightly different semantics, so reuse is impossible". It was super hard to deal with, and stuff was constantly being invented and then thrown out because it couldn't be extended.

The only thing worse than creating a compiler is being unaware that one already exists and creating a new one on top of the existing one.


At a billion-dollar fintech, some engineer thought gRPC was too complicated. So, he built something from scratch.

A year later, there was a team of 5 engineers maintaining a half-baked implementation of gRPC. Good for him. Bad for the company.


Haha... honestly, it is surprising how much work an RPC system is. Seems like it shouldn't be -- you're marshaling arguments and return values.

But multiply that by a few languages, and now you have to paper over error codes and exceptions, signed and unsigned ints, Unicode and bytes.

Not to mention network errors, retries, throttling, etc.

It tends to grow without bound, mostly because the RPC system tries to do too much. You can't really abstract the network -- that's the #1 lesson.

When you try to abstract the network, you now OWN a bunch of problems that you can't solve.


To some extent my entire career has been searching for and destroying said half-baked implementations. This saying can be adapted to infra: "half-baked, bug-ridden Kubernetes", "half-baked, bug-ridden ProxySQL", "half-baked, bug-ridden Redis", the list goes on and on.

In some ways I feel like my impact has been quite boring, in other ways quite vital. But it's never made me friends with the kind of developers who look sideways at the idea that other people's life's work might be better than their 5-year-old weekend project.


ProxySQL is both mind-blowingly good and horrible. The documentation is awful, conflicts with itself, and has outdated information in some parts (looking at you, multiplexing information in extended_info).

I still dearly love it. It makes operating a massive MySQL DB that has to deal with terrible queries from an ORM palatable.


Haha, the same can be said about Kubernetes and my other examples! I totally agree. Still - some documentation is better than none!


True, yes - I just rarely see ProxySQL mentioned in the wild, and I deeply feel that pain so I wanted to join in solidarity :D


Did that "some engineer" get promoted though? That's the real question.


gRPC is too complicated to be honest (it obfuscates a simple underlying concept into mythos for $)


Don't hate the player, hate the game.


Who makes the game?


Related:

Dear sir, you have built a compiler - https://news.ycombinator.com/item?id=29891428 - Jan 2022 (175 comments)


I also wrote a post inspired by this post: Oops, You Wrote a Database!

https://dx.tips/oops-database


And usually that's a crappy database and a crappy compiler. Because they didn't start out as such.


What's strange is that if you had started with a Lisp, it would have been much simpler!

And yet few people think about using a Lisp for their DSL.


Without even reading TFA, it is obvious that choosing a Lisp for a DSL is probably at least a good enough choice.


I did this once as a junior developer, for a feature flagging system that needed to support arbitrary custom rules. I don't recommend it.


Why not?


It is sort of incredible how often I've ended up accidentally re-inventing interpreters or compilers without that being an explicit goal.

You start by just adding some kind of configuration for rules, maybe in JSON. Then you start wanting to make more complex rules, so you allow some kind of recursive system in your JSON that can nest rules and combine them. Then you find yourself copy-pasting rules a lot, so you implement some kind of naming convention so you can reuse rules. Then you realize how disgusting your JSON is getting, so you dust off a parsing library and make a basic DSL that compiles into that JSON, and then it dawns on you.


A formal language with a grammar and rules is something quite fundamental. Today nobody is surprised that maths shows up in engineering, or [really anywhere](https://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness...)!

I would imagine that this is just an extension of the above idea about maths. To describe something in such a way that it operates formally, you eventually end up with stuff like LISP, or another equivalent language.


Creating a configuration file? I am afraid to inform you that you have started writing a compiler. What's the only way to avoid this? Your software not being successful.


Dammit! Every time I try putting together an Ikea bookshelf, this happens!


There was a joke at Uber about beginning with a configuration management system and ending up with a version control system.

And somewhere else about accidentally building a real time chat service (or was it email? Don’t recall).


> And somewhere else about accidentally building a real time chat service (or was it email? Don’t recall).

Maybe Zawinski's Law?[0]

> Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.

[0] https://en.wikipedia.org/wiki/Jamie_Zawinski#Zawinski's_Law


And then there's Mallett's Law (not in Wikipedia), which is a bit more modern:

Every program attempts to expand until it can render HTML. This usually results in it being able to read email.


The key is to know, acknowledge, and accept where you are going, and to go boldly and deliberately - or not go at all. Say "this problem looks like making a small program, so I'm making a mini-language". However, if you find yourself saying "this looks like a database...", then stop there and please do not build a DB.


I accidentally built a kind-of compiler last year.

It started as a few sed commands to merge TeX+code -> TeX for a book project. I ran these sed commands from a makefile. Life was easy.

But then there were complications, and I needed to make slightly more sophisticated substitutions. So the sed commands moved into an awk script, run by the makefile. This was better than maintaining a handful of little commands that were growing on a weekly basis. Life was good.

The transformations I needed kept growing a bunch of little variations, and the awk script became hard to maintain, so I rewrote it in Go, with proper parsing and output. (And even unit tests, after the 2nd time I broke some output.) Designing it as almost-a-proper-compiler was 10x better than maintaining an ad hoc script. Life was great, even with the overhead of maintaining a separate processing tool.


Knuth wanted to write a book, so he spent years writing a typesetting system.

Both the book and the system were heroically good.


Totally off-topic, but I can't help but wonder why this guy has a site with a .pl TLD. He seems to be based in the US, not Poland. Does he think "pl" stands for "programming languages"? :)


FWIW I see lots of non-Montenegrin people with .me addresses.


> Does he think "pl" stands for "programming languages"?

More likely pl -> perl, I think.


Well, vanity URLs have been a thing for a while now. E.g. lobste.rs.


TIL .pl is for Poland. The misuse of .io is probably more common.


I am especially interested in the characterisation of handling interactions between different AST nodes.

I think interactions between features are very hard to think about.

I think constructed languages have the opportunity to think about potential interactions that would be useful and aim to support those ones.

But there's lots of permutations to features.

Just look at async functions in Rust and coloured functions. It's such a pain.

It also reminds me and brings up thoughts about "the expression problem" [0]

How do you work out which combinations of features would be useful upfront? For example: there are interactions between memory management, garbage collection, async, multithreading, coroutines, closures, the stack, FFI. It's all very complicated.

[0]: https://en.wikipedia.org/wiki/Expression_problem


How embarrassing. I've done this. Programming is easy, making things easy is difficult.


Other needed topics in this series: “You have built a database”, “You have built an orchestrator”, “You have built an RPC layer”, “You have built a build system”…


And of course the classic "you have re-invented TCP"

...which often happens when somebody creates a UDP-based protocol and then adds their own reliability and robustness on top of it ;)


To be fair to people reinventing TCP, the two transport protocols that will reliably traverse the Internet are often not great fits for most applications. Most people want to process multiple streams or datagrams independently[0] (which TCP can't do) while having some amount of reliability guarantees[1] (which UDP can't provide).

And if you don't have that problem then you're probably also reinventing HTTP as well as TCP.

[0] without head-of-line blocking

[1] ideally with controls over how reliable we want our messages to be. Real-time use cases like videoconferencing or multiplayer games tend to fail horribly over TCP.


This is why QUIC/HTTP3 is happening, right?


Yes.


The last "You have built a browser" lead to Ladybird and is now sponsored. (I've heard or read its creator say that its web engine was initially meant to display help pages in Serenity, or something like this)


And Greenspun's 10th rule I suppose.


For those that needed to Google like me:

"Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."


In the age of horizontal scalability, what you've built is likely more like a nightmare version of Erlang than of Common Lisp.


You can see that with the so-called "Big Data" tools. Those which originated as databases (Mongo) ended up adding a map-reduce feature. Those which started as map-reduce tools evolved to support SQL (Hadoop -> Spark). Those which started as SQL engines (Spark) added support for streaming, while those which started as streaming platforms (Kafka) added SQL support (KSQLDB). Traditional DB engines evolve to allow document data (Postgres with a JSON column type). One more decade until the one-tool-to-rule-them-all emerges :)


It's the Cassandra curse of computer scientists. Business people think they don't have time or resources for correctness, and end up forcing incorrect systems to be held together by hand. Which, despite the high cost of this labor, is a decent trade-off because, thanks to the internet, business models have very low marginal cost per customer. I think the fact that Netflix has a very high marginal cost per customer, and actually hires very highly qualified programmers and pioneers systems designs, illustrates my point.


Would you consider orchestrator and scheduler as two separate things or variations on a common theme?


To me, an orchestrator would certainly have a scheduler in it, but also handle higher layer things like data flow DAGs, event triggers, retries, temporary storage, caching, error propagation, etc.



lol I have accidentally written databases for embedded systems many times. Usually the team lead or architect of the project doesn't see the need for it until we already have too much data and we're close to releasing the product, at which point it's impossible to retrofit.


I work on a Python project where I need to take class definitions and generate database query statements, because none of the existing ORMs work for my needs. I'm currently doing this with string templates that I've defined by hand. Is there a smarter way?

I've looked into some compiler-like tools (can't remember the specific ones, sorry), and from what I can tell their code generation phase looks very similar to mine in that they use string templates.


https://github.com/zio/zio-quill

This library does exactly what you describe. Pretty sure under the hood it's using macros with string templates.


Scala macros and quasiquote templates do have some notable differences from pure strings. The two main ones are:

- the value that's constructed has to be valid code

- macro "hygiene" is maintained


Very cool, thank you. Anything in Python?


I tried searching for something similar in Python but didn't find much.


Why don’t ORMs work for you? Have you looked at SQLAlchemy?


My project uses the Neo4j graph database, and the ORMs available for it aren't great: they don't handle batched writes so they're super slow, or they do weird hacks like requiring that you run a webserver for them to work, or they don't use managed transaction functions, so write operations aren't automatically retried for you.


Ah yes. Fair enough. I worked on a project that used neo4j once and it was early enough when I joined that I was able to convince them to switch to Postgres before it was too late. It was the missing tooling more than the db itself (though, for 99% of what we were doing we didn’t gain anything from the graph model).


... how does this even happen? What bizarre use case doesn't allow for just using an off-the-shelf scripting language?

I've only done anything like this once and not regretted it, and it's purely visual scripting, if-this-then-that style. Anything that can't be handled by an event that triggers a list of actions (stopping when one returns False), I will hardcode a hack just for that feature.

Most actions and triggers are responses to specific use cases.


It just crept up on me.


A compiler typically transforms programs that run in O(N) time into programs that run in O(N) time, for whatever suitable definition of N, so it's not really something to be super thrilled about.



