I have 0 results for PDB SRE on multiple search engines. What does it mean?

mardifoufs · on March 28, 2023

I think it's Pod Disruption Budgets, a kubernetes redundancy/resiliency related concept.

nailer · on March 28, 2023

> k8s permits me to evict workloads while obeying PDB — in previous orgs, "PDBs" (hell, we didn't even have a word to describe the concept)

Odd, the parent makes it seem like resource budgets weren’t a thing before k8s.

deathanatos · on March 28, 2023

I've never heard the term "resource budget" used to describe this concept before. Got a link?

That'd be an odd set of words to describe it. To be clear, I'm not talking about budgeting RAM or CPU, or trying to determine whether I have enough of those things. A PodDisruptionBudget describes the manner in which one is permitted to disrupt a workload: i.e., how can I take things offline?

Your bog simple HTTP REST API service, for example, might have 3 replicas, behind like a load balancer. As long as any one of those replicas is up, it will continue to serve. That's a "PodDisruptionBudget", here, "at least 1 must be available". (minAvailable: 1, in k8s's terms.)

A database that, e.g., might be using Raft, would require a majority to be alive in order to serve. That would be a minAvailable of "51%", roughly.
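To make the arithmetic in those two examples concrete, here is a rough Python sketch. It is not Kubernetes code: the function names `required_available` and `allowed_disruptions` are made up for illustration, and it assumes a percentage `minAvailable` is rounded up to a whole number of pods, as the Kubernetes docs describe.

```python
def required_available(min_available, replicas):
    """Resolve a minAvailable value (int or percentage string) to a pod count."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        pct = int(min_available[:-1])
        # Assumption: percentages round UP to the next whole pod.
        # Integer arithmetic avoids floating-point surprises.
        return (replicas * pct + 99) // 100
    return int(min_available)


def allowed_disruptions(min_available, replicas, healthy):
    """How many healthy pods may be taken offline right now."""
    return max(0, healthy - required_available(min_available, replicas))


# The web service: 3 replicas, at least 1 must stay up.
web = allowed_disruptions(1, replicas=3, healthy=3)

# The Raft-style database: 3 replicas, a majority ("51%") must stay up.
db = allowed_disruptions("51%", replicas=3, healthy=3)
```

Under those assumptions the web service tolerates two simultaneous disruptions and the database only one, which is the difference the two budgets are meant to encode.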

So, some things I can do with the webservice, I cannot do with the DB. PDBs encode that information, and since it is in actual data form, that then lets other things programmatically obey that. (E.g., I can reboot nodes while ensuring I'm not taking anything offline.)
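As a sketch of what "programmatically obey" could look like, the following Python models the node-reboot check. Everything here is hypothetical (the `Workload` class, `can_drain`, the node names), and it simplifies heavily: real Kubernetes evicts pod by pod through its eviction API rather than deciding per node up front.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    min_available: int   # budget, already resolved to a pod count
    pods_by_node: dict   # node name -> number of healthy pods on it


def can_drain(node, workloads):
    """Return (ok, blockers).

    ok is True when taking `node` offline leaves every workload
    at or above its minimum; blockers names the ones it would not.
    """
    blockers = []
    for w in workloads:
        healthy = sum(w.pods_by_node.values())
        remaining = healthy - w.pods_by_node.get(node, 0)
        if remaining < w.min_available:
            blockers.append(w.name)
    return (not blockers, blockers)


web = Workload("web", min_available=1, pods_by_node={"a": 1, "b": 1, "c": 1})
db = Workload("db", min_available=2, pods_by_node={"a": 1, "b": 1, "c": 1})

ok_all_healthy, _ = can_drain("a", [web, db])

# Same cluster, but the db pod on node "c" is already down.
db_degraded = Workload("db", min_available=2, pods_by_node={"a": 1, "b": 1})
ok_degraded, blockers = can_drain("a", [web, db_degraded])
```

With all pods healthy, node "a" can be rebooted; once the database has already lost a replica, the same reboot is refused and the database is reported as the blocker, with no per-service logic in the tool itself.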

( https://kubernetes.io/docs/tasks/run-application/configure-p... )

morelisp · on March 28, 2023

A PDB is a good example of Kubernetes's complexity escalation. It's a problem that arises when you have dynamic, controller-driven scheduling. If you don't need that you don't need PDBs. Most situations don't need that. And most interesting cases where you want it, default PDBs don't cover it.

deathanatos · on March 28, 2023

> A PDB is a good example of Kubernetes's complexity escalation. It's a problem that arises when you have dynamic, controller-driven scheduling. If you don't need that you don't need PDBs. Most situations don't need that.

No, and that's my point: PDBs exist always. Whether your org has a term for it, or whether you're aware of them is an entirely different matter.

When I did work comprised of services running on VMs, there is still a (now, spiritual) PDB associated with that service. I cannot just take out nodes willy-nilly, or I will be the cause of the next production outage.

In practice, I was just intimately familiar with the entire architecture, out of necessity, and so I knew what actions I could and could not take. But it was not unheard of for a less-cautious or less-skilled individual to act before thinking. And it inhibits automation: automation needed to be aware of the PDB, and honestly we'd probably just hard-code the needs on a per-service basis. PDBs, as k8s structures them, solve the problem far more generically.

morelisp · on March 28, 2023

> we'd probably just hard-code the needs on a per-service basis.

For 99% of situations this is a better decision. For, idk, at least 20% of the remaining 1%, PDBs won't handle it anyway.

nailer · on March 29, 2023

Sounds like a PDB isn’t a resource budget then. We were using that concept in ESX farms 20 years ago but it seems PDBs are more what most SREs would describe as minimum availability.

nailer · on March 29, 2023

Maybe 'minimum instance availability' to be specific that we're referring to instances of a service rather than an SLA.