
Apart from Cloudflare’s config system being too good at propagating failure modes:

the code quality on a very mission-critical path powering “half the internets” could’ve been better.

I’m not sure if Lua LSP / linting tools would’ve caught the issue (I’ve also never used Lua myself), but tools and methods exist to test mission-critical, dynamically typed code.

A company with such a genuinely impressive concentration of talent was expected to think about fuzzing this legacy crap somehow.

As for the `.unwrap()`-related incident: normally code like this should never pass review.

You just (almost) never unwrap in production code.
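
To make that concrete, here is a minimal sketch of the alternative. Everything in it is made up for illustration (the `max_features` key, `ConfigError`, the "keep the last known good value" fallback); it is not the code from the incident, just the general shape of returning an error and letting the caller decide instead of panicking.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing(&'static str),
    Invalid { key: &'static str, value: String },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing(key) => write!(f, "missing key `{}`", key),
            ConfigError::Invalid { key, value } => {
                write!(f, "invalid value {:?} for key `{}`", value, key)
            }
        }
    }
}

/// Reads a (hypothetical) `max_features` limit from an already parsed config.
/// Bad input becomes an `Err`, not a panic.
fn max_features(cfg: &HashMap<String, String>) -> Result<usize, ConfigError> {
    let raw = cfg
        .get("max_features")
        .ok_or(ConfigError::Missing("max_features"))?;
    raw.trim().parse::<usize>().map_err(|_| ConfigError::Invalid {
        key: "max_features",
        value: raw.clone(),
    })
}

/// The caller decides what a bad config means. Here (an assumption of this
/// sketch) it logs the problem and keeps the last known good limit.
fn effective_limit(cfg: &HashMap<String, String>, last_good: usize) -> usize {
    match max_features(cfg) {
        Ok(n) => n,
        Err(e) => {
            eprintln!("config rejected, keeping previous limit {}: {}", last_good, e);
            last_good
        }
    }
}
```

With `max_features(cfg).unwrap()` the same bad input would take the whole process down. Clippy’s `unwrap_used` lint can flag such calls in CI, so a human reviewer is not the only line of defence.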

I’d start with code quality tooling and, more importantly, the related processes before even thinking about architecture changes.

Changing the architecture globally, when it has in general served for years with 99.99(9)% uptime, is not obviously a smart thing to do.

The architecture is doing great; it’s just the impact that has been devastating because of the scale.

Everyone makes errors and that’s fine, but there are ways not to roll shit into prod (a reference to the famous meme pic where bugs do that).




