A random thing I ran into with the defaults (Ubuntu Linux):
- net.ipv4.tcp_rmem ~ 6MB
- net.core.rmem_max ~ 1MB
So the tcp_rmem value takes precedence by default, meaning the receive buffer for a vanilla TCP socket can auto-grow up to 6MB if needed (in reality the usable window is ~3MB because of the halving, but let's ignore that for now since it's a constant factor).
But if I "setsockopt SO_RCVBUF" in a user-space application, I'm actually capped at a maximum of 1MB, even though the socket could already grow to 6MB on its own. If I try to reduce it from 6MB to e.g. 4MB, I end up with 1MB instead. This seems very strange. (Perhaps I'm holding it wrong?)
(Same applies to SO_SNDBUF/wmem...)
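Here's a minimal sketch of what I mean (plain C, error handling omitted; the exact numbers will depend on your sysctls):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        /* Ask for a 4MB receive buffer -- well under tcp_rmem's 6MB max. */
        int requested = 4 * 1024 * 1024;
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested));

        /* Read back what the kernel actually granted. */
        int granted = 0;
        socklen_t len = sizeof(granted);
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len);

        /* With the defaults above this comes back clamped around rmem_max
           (give or take the kernel's bookkeeping doubling), not the 4MB asked for. */
        printf("requested %d, granted %d\n", requested, granted);
        return 0;
    }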
To me, it seems like Linux is confused about the precedence order of these options. Why not make core.rmem_max the larger, authoritative limit? Is there some historical reason for this?
If you want to limit the amount of excess buffered data, you can lower TCP_NOTSENT_LOWAT instead, which caps how much is buffered beyond what's needed for the BDP.
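Something like this (a sketch; the 128KB threshold is just an illustrative value, and there's also the system-wide net.ipv4.tcp_notsent_lowat sysctl):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>   /* TCP_NOTSENT_LOWAT; on older libcs the constant is in <linux/tcp.h> */

    /* Cap how many not-yet-sent bytes may sit in the socket's send queue. */
    static int cap_unsent(int fd, int bytes) {
        return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &bytes, sizeof(bytes));
    }

    /* usage: cap_unsent(fd, 128 * 1024); */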
1. While your context about auto-tuning is accurate and valuable, it doesn't really address the fundamental strangeness the parent post is commenting on: it's still strange that the kernel can auto-tune to a higher value than you can manually tune it to.
2. It's always valuable to provide further references, but I'd guess that down-voters found the "It's pretty clearly documented" phrasing a little condescending? Perhaps "See the docs at [] for more information."?
3. "Please don't comment about the voting on comments. It never does any good, and it makes boring reading."
Their criticism was accurate and well-intentioned. Getting downvoted not for the content but perhaps for poor phrasing is perfectly normal. Complaining at all about the votes your internet comment gets is asinine.
It's not asinine to complain that for no good reason a perfectly good technical reference written for the benefit of all readers was being grayed out (at the time) via downvotes. It took a non-zero amount of work to dig up where the setting is documented, and I didn't do it for my own benefit.
This isn't taking it personally like I value HN karma. This is complaining purely because downvotes can make content invisible.
Your original comment amounted to "It's working as documented, see here and here". But arguably the question was "Why does it work in this baffling way?"
Certainly that's how I interpreted it -- and while I didn't downvote your answer explaining that this weird footgun is actually documented behaviour, I got no value from either that information or your tone, which read to me as a little dismissive ("It's pretty clearly documented [, you lazy/incompetent person who didn't bother to look this up yourself]").
> Getting downvoted not for the content but perhaps poor phrasing is perfectly normal
IMHO, good and relevant content beats poor phrasing (which, again IMHO, I didn't witness in the original post), especially since English is not the first language for many people on this board. Downvoting only disincentivizes posting, and unfortunately the HN voting system doesn't indicate why, leaving one just to guess.
> once you do SO_RCVBUF the auto-tuning is out of the picture for that socket
Oh I didn’t realize this. That explains the switch in limits. However:
I would have liked to keep auto-tuning and only change the max buffer size. It’s still weird to me that these are different modes with different limits and whatnot. In my case I was parallelizing TCP, and capping the max buffer size while varying the number of connections would have been the better approach.
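As far as I can tell, the only knob for that is the system-wide sysctl, i.e. the third field of net.ipv4.tcp_rmem; I couldn't find a per-socket way to raise the auto-tuning ceiling without turning auto-tuning off. A sketch of bumping it (needs root; the 16MB max is just an example, min/default left at typical values):

    #include <stdio.h>

    /* Equivalent of `sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"`:
       min / default / max for auto-tuned TCP receive buffers. */
    int main(void) {
        FILE *f = fopen("/proc/sys/net/ipv4/tcp_rmem", "w");
        if (!f) { perror("tcp_rmem"); return 1; }
        fprintf(f, "4096 131072 16777216\n");
        return fclose(f) == 0 ? 0 : 1;
    }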
I gave up on it. Especially since I need a cross-platform, user-space-only solution, I don’t want to fiddle with these APIs that are all different and unpredictable. I guess it’s for the best anyway, to avoid as many per-platform hacks as possible.