So there is access to "degraded functionality" from start (the "3-15" of "degrad...

swiftcoder · 2025-11-04T14:50:43 1762267843

> why not share THAT then?

When you've guaranteed 4 or 5 nines' worth of uptime to the customer, every acknowledged outage results in refunds (and potentially being sued for breach of contract).
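For a sense of scale, here is a rough sketch of the yearly downtime budget those figures imply. It assumes a 365-day year and treats "N nines" as an availability of 1 - 10^-N. The function name `downtime_budget_minutes` is made up for illustration. Real SLAs usually measure per month or per billing period and carve out exclusions, so treat this as back-of-the-envelope arithmetic only.

```python
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year, in minutes, for an availability of N nines.

    Hypothetical helper: 4 nines means 99.99% availability, i.e. an
    unavailability fraction of 10**-4.
    """
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (4, 5):
        print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes/year")
```

That works out to roughly 53 minutes a year at four nines and a bit over 5 minutes a year at five, so a single acknowledged incident can use up the whole budget.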

xigoi · 2025-11-04T17:10:33 1762276233

On the other hand, if they’re down but don’t report it, couldn’t they be sued for fraud?

chrismorgan · 2025-11-05T05:30:39 1762320639

Meh, I’ve never seen an uptime (SLA) guarantee that was worth anything anyway. The publicly offered ones, at least, are consistently toothless (I can’t comment on privately negotiated ones). I’ve written about it a few times, with a couple of specific examples: https://hn.algolia.com/?type=comment&query=sla+chrismorgan.

But not acknowledging actual outages, yeah, that would open you up to accusations of fraud, which is probably in theory much more serious.

jakevoytko · 2025-11-04T15:02:43 1762268563

Because the systems are so complex and capable of emergent behavior that you need a human in the loop to truly interpret behavior and impact. Just because an alert is going off doesn't mean that the alert was written properly, or is measuring the correct thing, or the customer is interpreting its meaning correctly, etc.

mirekrusin · 2025-11-05T16:32:27 1762360347

Health probes are at the easy end of the software complexity spectrum. This has nothing to do with complexity and everything to do with managing reputational damage in a shady way.