
I tell you that the reporting of correctable errors is not always honest. On some systems you see uncorrectable errors before you see any correctable ones, which is statistically implausible. That it should be honest is beside the point.


There is a question of agency here.

You are absolutely correct that reporting of errors may not be "honest". I prefer to use the word "correct" rather than "honest", because honesty implies an intention to lie, and the computer logic is generally designed to be correct.

The market reality is that reporting correctable errors can generate unnecessary service tickets from people who are not technically sophisticated. Those users may file a support ticket wondering why their system is telling them it saw an error but then corrected it. Dealing with support tickets costs money, money that is taken away from margin, and so the reporting is sometimes "optimized out": some manager makes their numbers better by ordering the software team to "hide" correctable errors that were corrected.

It is absolutely fair to describe that management choice as "dishonest."

It can sometimes be difficult to find out where the dishonesty was applied, however. It has been my experience that between the BIOS and the kernel, if the chipset supports ECC and the BIOS recognizes that you have ECC memory installed, there is a means of extracting all ECC events reliably. It isn't always well documented and sometimes requires several levels of escalation in support to get the information you need, but when ECC is important to you it can be worth it. It can also inform your choice of vendors in the future. :-)
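For the kernel side of that, here is a rough sketch of one way to look at the counts, assuming a Linux system with an EDAC driver loaded for your memory controller. The sysfs location and the ce_count / ue_count files are the usual EDAC interface; the function name read_edac_counts and its root parameter are just names I made up for illustration. Note that this only shows what the kernel was told, so events hidden by firmware may not appear here at all.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    Assumes the Linux EDAC sysfs layout; returns an empty dict if the
    directory is missing (no EDAC driver loaded, or not Linux).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    # Each memory controller appears as mc0, mc1, ...
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                # Unreadable or malformed counter: record as unknown.
                entry[name] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

If this reports a nonzero ue_count alongside a ce_count of zero, that is the suspicious pattern described upthread, and it is a reasonable thing to take to vendor support.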



