Hacker News | wozzio's comments

Ideally we would measure Request minus Max_Usage_Over_7_Days.

Since this is a snapshot tool (wrapping kubectl top) it can't see historical peaks, so it definitely leans towards being a current-state audit. Love the idea of pairing this with a load-test benchmark that would actually give you a calculated safe request size rather than just a guess.
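If you already run Prometheus, a rough sketch of that "request minus 7-day peak" calculation could look like the following (the metric names, endpoint URL, and 20% headroom are assumptions to adapt, not anything the tool does today):

    # Rough sketch: "request minus 7-day peak" per container via the Prometheus HTTP API.
    # Metric names, URL and the 1.2x headroom factor are illustrative assumptions.
    import requests

    PROM_URL = "http://prometheus:9090/api/v1/query"
    HEADROOM = 1.2  # keep 20% above the observed peak

    PEAK_Q = 'max_over_time(container_memory_working_set_bytes{container!=""}[7d])'
    REQ_Q = 'kube_pod_container_resource_requests{resource="memory"}'

    def query(expr):
        resp = requests.get(PROM_URL, params={"query": expr}, timeout=30)
        resp.raise_for_status()
        out = {}
        for series in resp.json()["data"]["result"]:
            m = series["metric"]
            out[(m.get("namespace"), m.get("pod"), m.get("container"))] = float(series["value"][1])
        return out

    peaks, reqs = query(PEAK_Q), query(REQ_Q)
    for key, requested_bytes in reqs.items():
        peak = peaks.get(key)
        if peak is None:
            continue
        slack_mib = (requested_bytes - peak * HEADROOM) / 2**20
        if slack_mib > 0:
            print(f"{'/'.join(k for k in key if k)}: could trim ~{slack_mib:.0f} MiB")

A load test on top of that would then tell you whether the 7-day window actually captured the worst case.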


I'll update the README instructions to include curl -L immediately. Thanks for flagging it.


In the clusters I've audited so far, typically 30% to 50% of the total requested capacity is allocatable waste: resources that are reserved (and billed) but never actually touched.


That aggressive release behavior is exactly what we need more of. Most runtimes (looking at you, legacy Java) just hoard the heap forever because they assume they're the only tenant on the server.


In C#'s dependency injector you basically have to choose from three lifetimes for a class/service: scoped, transient, or singleton.

Scoped and transient lifetimes, along with the new GC, will make the runtime much leaner.

Some applications are singleton-heavy or misuse the MemoryCache (or wrap it in a singleton... facepalm) - these will still mess up the GC situation.

If you build a web/API project it pays dividends to educate yourself on the three lifetimes, how the GC works, how async/await/cancellation tokens/disposables work, how MemoryCache works (and when to go out-of-process / to another machine, aka Redis), how ASP.NET's built-in mechanisms for caching HTML output work, etc. A lot of developers just wing it and then wonder why they have memory issues.

And for the dinosaurs: yes, we can use the dependency injector in Windows Forms, WPF, console apps and so on - those packages aren't limited to web projects alone.
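For anyone new to it, the three lifetimes are just three different registration calls. A minimal sketch (the IClock/IOrderRepo/IEmailBuilder types are made-up placeholders, not from any real project):

    // Minimal sketch of the three lifetimes in Microsoft.Extensions.DependencyInjection.
    // The IClock / IOrderRepo / IEmailBuilder types are illustrative placeholders.
    using Microsoft.Extensions.DependencyInjection;

    interface IClock { }         class SystemClock : IClock { }
    interface IOrderRepo { }     class OrderRepo : IOrderRepo { }
    interface IEmailBuilder { }  class EmailBuilder : IEmailBuilder { }

    class Demo
    {
        static void Main()
        {
            var services = new ServiceCollection();

            // Singleton: one shared instance for the whole application lifetime.
            services.AddSingleton<IClock, SystemClock>();

            // Scoped: one instance per scope (per HTTP request in ASP.NET Core).
            services.AddScoped<IOrderRepo, OrderRepo>();

            // Transient: a fresh instance every time the service is resolved.
            services.AddTransient<IEmailBuilder, EmailBuilder>();

            using var provider = services.BuildServiceProvider();
            using var scope = provider.CreateScope();
            var repo = scope.ServiceProvider.GetRequiredService<IOrderRepo>();
        }
    }

Scoped is usually the safe default for anything touching per-request state; the singleton registrations are where the accidental cache-hoarding tends to creep in.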


That new .NET behavior is the goal: smart runtimes that yield memory back to the OS so we don't have to play guess-the-request in YAML.

Unfortunately most legacy Java/Python workloads I see in the wild are doing the exact opposite: hoarding RAM just in case. Until the runtimes get smarter, we're stuck fixing the configs.


It feels unsolvable because we've been trying to solve it with static spreadsheets and guesses. It's actually very solvable if you treat it as a control-loop problem: continuous adjustment rather than a set-it-and-forget-it config.
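As a rough illustration of the loop (the numbers and data are placeholders; in a real cluster the inputs would come from your metrics store and the output would patch a Deployment or open a PR):

    # Rough sketch of the control-loop idea: compare observed peak memory usage to
    # the configured request and nudge the request toward peak + headroom.
    # The numbers and the in-memory "observed" data are illustrative placeholders.
    HEADROOM = 1.25   # keep a 25% buffer above the observed peak
    DEADBAND = 0.10   # ignore drift under 10% to avoid constant churn

    # workload -> (7-day peak MiB, currently requested MiB)
    observed = {
        "api": (900, 4096),
        "worker": (3100, 4096),
    }

    def reconcile(name, peak_mib, request_mib):
        target = peak_mib * HEADROOM
        drift = abs(request_mib - target) / request_mib
        if drift > DEADBAND:
            # A real loop would patch the workload spec here; this just recommends.
            print(f"{name}: request {request_mib} MiB -> suggest {target:.0f} MiB")

    for name, (peak, req) in observed.items():
        reconcile(name, peak, req)

Run something like that on a schedule (or as an operator) and the requests track reality instead of last quarter's guess.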


If you're a small startup, burning $5k/year in lazy tax is smarter than distracting your engineers; the math flips when you hit scale.

For some, that safety margin isn't small. At a certain volume the waste exceeds the cost of a full-time engineer, so optimizing becomes profit.


That doc is gold, thanks for linking it.

Yeah, defaults are the enemy here. Most of the waste I'm seeing in the data comes from generic Spring Boot apps running with out-of-the-box settings, where the JVM assumes it owns the entire node.


You are technically right that requests are scheduling hints, but in a cluster-autoscaler world, requests = bill.

If I request 8GB for a pod that uses 1GB, the autoscaler spins up nodes to accommodate that 8GB reservation. That 7GB gap is capacity the company is paying for but cannot use for other workloads.

Valid point on Goodhart's Law, though the goal shouldn't be "fill the RAM" but rather "lower the request to match the working set" so we can bin-pack tighter.


This script does nothing to solve that and actually exacerbates the problem of people over-speccing.

It makes assumptions about pricing[0], and if you do need a peak of 8GB it would force you into allocating and consuming that 8GB immediately, because it is just reading a current snapshot from /proc/$pid/status:VmSize [1] and says you are wasting memory based on "request - actual usage (MiB)" [2].

What if once a week you need to reconcile and need that 8GB? What if you only need 8GB once every 10 seconds? The script won't see that; so to be defensive you can't release that memory, or you will be 'wasting' resources despite that peak need.

What if your program only uses 1GB, but you are working on a lot of Parquet files, and with less RAM you start to hit EBS IOPS limits, or don't finish the nightly DW run because you have to hit disk instead of working from the buffer cache with headroom, etc.?

This is how bad metrics wreck corporate cultures; the one in this case encourages overspending. If I use all that RAM I will never hit the "top_offender" list[3], even if I cause 100 extra nodes to be launched.

Without context, and far more complicated analytics, "request - actual usage (MiB)" is meaningless and trivial to game.

What does this metric accomplish, other than incentivizing you to keep your pods' request ~= RES 24x7x365 ~= OOM_KILL limits/2, just to avoid being on the "top_offender" list?

Once your skip's skip's skip sees that some consultant labeled you a "top_offender" despite your transient memory needs, etc., how do you work that through? How do you "prove" that against a team gaming the metric?

Also, as a developer you don't have control over the cluster's placement decisions, nor typically over the choice of machine types. So blaming the platform user for the platform team's inappropriate choice of instance types, and shutting down chances of collaboration in the process, typically isn't a very productive path.

Minimizing cloud spend is a very challenging problem, which typically depends on collaboration more than anything else.

The point is that these scripts are not providing a valid metric, and they are absolutely presenting that metric in a hostile way. It could be changed to help a discovery process, but it absolutely will not in its current form.

[0] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[1] https://github.com/google/cadvisor/blob/master/cmd/internal/...
[2] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[3] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....


Really fair critique regarding the snapshot approach. You're right: optimizing limits based on a single point in time is dangerous for bursty workloads, the "need 8GB for 10 seconds" scenario.

The intent of this script isn't to replace long-term metric analysis like Prometheus/Datadog trends, but to act as a smoke test for gross over-provisioning: the dev who requested 16GB for a sidecar that has flatlined at 100MB for weeks.

You make a great point about the hostile framing of the word "waste". I definitely don't want to encourage OOM risks. I'll update the README to clarify that this delta represents potential capacity to investigate rather than guaranteed waste.

Appreciate the detailed breakdown on the safety buffer nuances.


The curl | bash is just for convenience; the README explicitly advises you to "Download and inspect wozz.sh first" if you aren't comfortable piping to a shell.

As for the newness: I just open-sourced this from my personal scripts collection this week, so yes, the org and account are new. It runs entirely locally using your active kubeconfig; it doesn't collect credentials or send secrets anywhere. You can cat the script to verify that it's just a wrapper around kubectl top and kubectl get.
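For the curious, the core check boils down to something like this simplified Python rendering of the idea (the real script is a bash wrapper around the same two commands; this sketch only handles Mi/Gi units):

    # Simplified rendering of the idea: compare each pod's configured memory request
    # (kubectl get) with its current usage (kubectl top). Requires metrics-server.
    import json
    import subprocess

    def kubectl(*args):
        return subprocess.run(
            ["kubectl", *args], capture_output=True, text=True, check=True
        ).stdout

    # Current usage per pod, e.g. "default  web-abc123  12m  150Mi"
    usage = {}
    for line in kubectl("top", "pods", "--all-namespaces", "--no-headers").splitlines():
        ns, pod, _cpu, mem = line.split()[:4]
        usage[(ns, pod)] = float(mem[:-2]) if mem.endswith("Mi") else 0.0

    # Configured memory requests per pod, summed over containers (Mi/Gi only)
    for p in json.loads(kubectl("get", "pods", "--all-namespaces", "-o", "json"))["items"]:
        key = (p["metadata"]["namespace"], p["metadata"]["name"])
        requested = 0.0
        for c in p["spec"]["containers"]:
            req = c.get("resources", {}).get("requests", {}).get("memory", "0Mi")
            if req.endswith("Gi"):
                requested += float(req[:-2]) * 1024
            elif req.endswith("Mi"):
                requested += float(req[:-2])
        used = usage.get(key, 0.0)
        if requested and requested > used:
            print(f"{key[0]}/{key[1]}: requested {requested:.0f} MiB, using {used:.0f} MiB")

Nothing leaves the machine; it's the same data kubectl already shows you, just diffed.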

