I don't think anyone has ever regretted writing a test, but runtime assertions are so much better because they deal with issues when they happen rather than trying to predict potential failures. This was probably forgotten as interpreted languages became more popular, but hopefully we're going to see a return to less "developer focused" ways of dealing with errors, so that your pacemaker doesn't stop because someone forgot to catch the exception they didn't think could happen with their 100% coverage.
Well yes, considering NASA wanted it in theirs. There are a lot of advantages to failing immediately when your software does something you don't want it to do, rather than trying to continue in a corrupted or invalid state. As for how you deal with it, it depends. Erlang has a lot of "self-healing" supervisor/actor models which are widely used, but I don't have a lot of experience with those.
We use a form of recovery-oriented computing in our solar plant components, though to be fair, we only use a very tiny part of it, since we basically reboot to the last known valid state. That isn't exactly 100% safe, but it's not like anyone dies if an inverter or datalogger goes offline for an hour.
I'm curious to know how you would deal with errors in these sorts of things without runtime assertions or something similar. Maybe my knowledge is outdated?
I haven't done firmware in more than a decade, but back when I did, asserting was a no-no. You can't crash. Let's say I have a machine and I'm in the process of moving some actuators: crashing might mean destroying the machine. That can't be better than letting whatever can keep going keep going, even if there's some unexpected condition. Physical safety is typically there by design and doesn't rely on software (there are some exceptions/processes for safety-critical software).
So what to do:
- Extensive use of things like state machines, where you can convince yourself that you are handling all possible inputs in every possible state.
- Clean/clear/simple code.
- Code reviews.
- Test the heck out of your software before you ship it. Automated testing and manual testing.
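A minimal sketch of the state-machine idea in Python, assuming a hypothetical actuator with three states and four events (`State`, `Event`, `TRANSITIONS` and `check_complete` are illustrative names, not from any real firmware). Every (state, event) pair is written out explicitly, including the ones that are deliberately ignored, and `check_complete` lists any pair that is missing, so "am I handling every input in every state?" becomes something you can check mechanically rather than by eye.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    MOVING = auto()
    FAULT = auto()


class Event(Enum):
    START = auto()
    ARRIVED = auto()
    POSITION_ERROR = auto()
    RESET = auto()


# Every (state, event) pair is listed explicitly, including the ones that are
# ignored, so nothing falls through to an implicit default.
# The specific choices below are hypothetical, for illustration only.
TRANSITIONS = {
    (State.IDLE, Event.START): State.MOVING,
    (State.IDLE, Event.ARRIVED): State.IDLE,            # spurious: ignore
    (State.IDLE, Event.POSITION_ERROR): State.FAULT,
    (State.IDLE, Event.RESET): State.IDLE,
    (State.MOVING, Event.START): State.MOVING,          # already moving: ignore
    (State.MOVING, Event.ARRIVED): State.IDLE,
    (State.MOVING, Event.POSITION_ERROR): State.FAULT,
    (State.MOVING, Event.RESET): State.MOVING,          # no reset mid-move: ignore
    (State.FAULT, Event.START): State.FAULT,            # refuse to move while faulted
    (State.FAULT, Event.ARRIVED): State.FAULT,
    (State.FAULT, Event.POSITION_ERROR): State.FAULT,
    (State.FAULT, Event.RESET): State.IDLE,
}


def check_complete(table):
    """Return every (state, event) pair that has no entry in the table."""
    return [(s, e) for s in State for e in Event if (s, e) not in table]


def step(state, event):
    """Look up the next state; complete coverage means this never misses."""
    return TRANSITIONS[(state, event)]
```

Running `check_complete` in a unit test (or once at startup) turns a forgotten transition into a test failure instead of a surprise in the field.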
Never fails isn't possible. Mechanical systems also fail. Electronics also fail. Software can be pretty reliable, and either way you can't really know what to do if there's an unexpected condition. Expected errors should be handled (let's say you try to move an actuator and it doesn't move, or you have a position error on your servo). Firmware can be designed to not run out of memory (so that failure mode can be eliminated by design).
What guarantees you won't just be rebooting in a loop? That said, I guess if you have analyzed things well enough and that's your preferred recovery, then fair enough. But how much state does software for an inverter have that can't just be handled by code?
That's the exact point of runtime assertions. You can't crash, so you fail exactly at the moment something is corrupted. One of the reasons Go decided not to include runtime assertions (and one of the only choices in Go I really dislike) is that they aren't exactly safe, because you still have to deal with that failure, and I suppose it's very easy to fuck it up.
What you're describing in your post is essentially what I think of when I say runtime assertion. You use them to revert to the previous valid state and retry, you can micro-reboot, or you can go into a "factory setting" mode where your software can continue until an engineer can actually work on it. Things like that. The primary difference is that with runtime assertions, you stop exactly when the corrupted state occurs instead of trying to continue with that corrupted state.
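A rough sketch of that recovery flow in Python, assuming a hypothetical inverter config (`Config`, `FACTORY_SETTINGS`, `check_invariants`, `Controller` and `MAX_CONSECUTIVE_FAILURES` are illustrative, not from any real product). Each update is checked immediately after it is applied, so the failure happens at the point the state becomes invalid; the controller then reverts to the last valid state, and after a bounded number of consecutive failures it drops to factory settings and stays there rather than retrying forever, which is one way to address the reboot-loop worry.

```python
import copy
from dataclasses import dataclass


class InvariantViolation(Exception):
    """Raised the moment a state invariant fails (our 'runtime assertion')."""


@dataclass
class Config:
    # Hypothetical inverter settings, for illustration only.
    power_limit_kw: float
    setpoint_kw: float


FACTORY_SETTINGS = Config(power_limit_kw=10.0, setpoint_kw=0.0)
MAX_CONSECUTIVE_FAILURES = 3  # assumption: bounded retries, so no endless loop


def check_invariants(cfg):
    # Explicit check rather than a bare `assert`, which `python -O` would strip.
    if not 0.0 <= cfg.setpoint_kw <= cfg.power_limit_kw:
        raise InvariantViolation(
            f"setpoint {cfg.setpoint_kw} outside [0, {cfg.power_limit_kw}]"
        )


class Controller:
    def __init__(self):
        self.state = copy.deepcopy(FACTORY_SETTINGS)
        self.last_valid = copy.deepcopy(FACTORY_SETTINGS)
        self.failures = 0
        self.safe_mode = False

    def apply(self, update):
        """Apply `update` in place, check invariants at once, revert on violation."""
        if self.safe_mode:
            return False  # locked to factory settings until an engineer intervenes
        update(self.state)  # this may leave the state corrupted
        try:
            check_invariants(self.state)
        except InvariantViolation:
            self.failures += 1
            if self.failures >= MAX_CONSECUTIVE_FAILURES:
                # Stop retrying: fall back to factory settings and stay there.
                self.state = copy.deepcopy(FACTORY_SETTINGS)
                self.safe_mode = True
            else:
                # Revert to the last known valid state and keep running.
                self.state = copy.deepcopy(self.last_valid)
            return False
        # Invariants hold: this becomes the new last known valid state.
        self.last_valid = copy.deepcopy(self.state)
        self.failures = 0
        return True
```

For example, `Controller().apply(lambda s: setattr(s, "setpoint_kw", 50.0))` returns `False` and leaves the setpoint at its last valid value; a third consecutive failure sets `safe_mode`. Whether to count consecutive or total failures, and what "factory settings" should mean, are application decisions; this only shows the shape.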
It's not like this should replace testing or any of the other things you bring up. I still strongly recommend testing. Tests, however, are for preventing errors, while runtime assertions are for dealing with errors when they happen at runtime. Exception handling is the other approach, but with exceptions you continue with the corrupted state and try to deal with it down the line. I don't personally like that.
> Never fails isn't possible.
In the past decade we've had zero software failures causing shutdown of equipment in any of our many solar parks. This is not to say we haven't had failures. Last I checked the data, we've had around 700 incidents that required human intervention, but in every case, the software was capable of running at "factory settings" until the component could safely be repaired or replaced. By contrast, we've had quite a lot of hardware failures. Now... it's not exactly life-threatening if parts of a solar plant fail, at least not on most plants. In almost every case it'll only cost money, and that is exactly why we're so strict about not failing. The contracts around downtime responsibility are extremely rigid, and my organisation cares quite a lot about placing that responsibility outside of "us". So, somewhat ironically, we're doing software "right" in this one place because of money, and not because it's the right thing to do. But hey, the work is fun.
> Clean/clear/simple code.
I would replace "Clean" in this part, but it depends on what you mean. YAGNI > Uncle Bob!
> This was probably forgotten as interpreted languages became more popular
I think the notion that all interpreted languages instantly became test-driven in response to lacking good type systems is overblown. In practice both tests and runtime assertions are performed in e.g. Python. Usually more of the latter and less of the former in my experience.
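One Python-specific caveat if you rely on runtime assertions there: the `assert` statement is removed entirely when the interpreter runs with `-O`, while an explicit `if ...: raise` check always runs. The sketch below runs a hypothetical `set_speed` guard both ways in a fresh interpreter to show the difference.

```python
import subprocess
import sys

# Hypothetical function guarded by a bare `assert`.
WITH_ASSERT = """
def set_speed(rpm):
    assert 0 <= rpm <= 3000, "rpm out of range"
    return rpm
print(set_speed(9999))
"""

# The same guard written as an explicit check that always runs.
WITH_CHECK = """
def set_speed(rpm):
    if not 0 <= rpm <= 3000:
        raise ValueError("rpm out of range")
    return rpm
print(set_speed(9999))
"""


def run(source, *flags):
    """Run `source` in a fresh interpreter and return (exit code, stdout)."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", source],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    for name, src in [("assert", WITH_ASSERT), ("explicit check", WITH_CHECK)]:
        print(name, "normal:", run(src), "optimized:", run(src, "-O"))
```

Without `-O`, both versions exit with status 1; with `-O`, the `assert` version silently returns the out-of-range value and exits 0.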
I'm not sure they became instantly test-driven, but I do think that the exception handling model as the default error handling in Java and C# was a big drive toward test-driven development, even though both have the ability to do runtime assertions and even work with contracts to some degree. It's also why I think Go's panic is far preferable, but that's a different story, and Go doesn't have runtime assertions, so there is that.
But it was probably more than just type systems and error handling. What it has meant, at least in my part of the world, is that a lot of programmers (and I mean the vast majority) are very bad at dealing with errors exactly when they happen. This is anecdotal, but I think I've met only one or two developers outside the telecom and energy sectors who knew you could do runtime assertions.