jeffreygoesto's comments

For profiling I like the dual representation of treemap and call tree in KCachegrind a lot: https://kcachegrind.github.io/html/Home.html It addresses the points the post criticizes about treemaps (showing percentages and estimating the areas of sub-trees) better than the examples chosen there.

Using this in embedded as well. The whole point is to commit and lock the pages after allocation, so you never experience what you correctly describe. You want a single checkpoint after which you can simply stop worrying about OOM.
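
A minimal sketch of that pattern on a Linux-flavored target (pool size is a made-up example, error handling trimmed); the single malloc + memset + mlockall is the one checkpoint:

    /* Commit-and-lock sketch (POSIX/Linux assumed; sizes are made up). */
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define POOL_SIZE (8u * 1024u * 1024u)  /* assumed worst-case budget */

    static unsigned char *pool;

    int init_memory(void)
    {
        pool = malloc(POOL_SIZE);
        if (!pool)
            return -1;                  /* the single OOM checkpoint */
        memset(pool, 0, POOL_SIZE);     /* touch every page: commit now */
        /* Pin current and future pages: no lazy faults, no swap-out. */
        return mlockall(MCL_CURRENT | MCL_FUTURE);
    }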

The iPhone uses Bosch sensors, you might want to check https://www.bosch-sensortec.com/products/motion-sensors/imus...

The most important thing IMHO is to "rotate the structure 90 degrees" from package to function. You start with "all files of one package" and end up with "all files of all packages that serve the same purpose", for example all includes, all man pages, all binaries. This simplifies and speeds up a system, because you only need one or a few paths per function.
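
Illustrative sketch (hypothetical paths, not any particular distro):

    per package ("vertical"):        rotated per function:
      /pkg/foo/bin/foo                 /usr/bin/foo, /usr/bin/bar
      /pkg/foo/include/foo.h           /usr/include/foo.h, bar.h
      /pkg/bar/bin/bar                 /usr/share/man/man1/foo.1, bar.1
      /pkg/bar/include/bar.h

With the rotated layout, PATH needs a single entry instead of one per package.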

That is one problem of many solved, isn't that good?

Checking that the spec solves the problem is called validation in my domain and is treated explicitly, with different methods.

We use formal validation to check invariants, but also properties like "it must return a value xor an error, but never just hang".
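
As a hedged illustration of how such a property can be written down (temporal-logic style, my own notation, not tied to any specific tool):

    \Diamond\,\mathit{done} \;\land\; \Box\big(\mathit{done} \Rightarrow (\mathit{value} \oplus \mathit{error})\big)

I.e. the call eventually terminates (no hang), and on termination yields exactly one of value or error.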


Car mechanics face the same problem today with rare issues. They know the standard mechanical procedures, and when they can't track a problem down, all they can do is re-flash an ECU or try swapping it. They also don't admit they are wrong, at least most of the time...


> all they can do is re-flash an ECU or try swapping it.

To be fair, they have wrenches thrown in their way there, as many ECUs and other computer-driven components are fairly locked down and undocumented. Especially as the programming software itself is often not freely distributed (only to approved shops/dealers).


An NE555 could be sufficient and even more reductionist. =;-) Circuit example: https://theorycircuit.com/ic-555-ic-741/adjustable-duty-cycl...
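
For reference, the standard NE555 astable datasheet formulas, plugged into a tiny C calculator (the component values here are made up, not the ones from the linked circuit):

    /* NE555 astable timing (standard datasheet formulas). */
    #include <stdio.h>

    int main(void)
    {
        double r1 = 10e3, r2 = 47e3;   /* ohms (hypothetical values) */
        double c  = 1e-6;              /* farads (hypothetical value) */

        double t_high = 0.693 * (r1 + r2) * c;       /* output high time */
        double t_low  = 0.693 * r2 * c;              /* output low time  */
        double freq   = 1.44 / ((r1 + 2.0 * r2) * c);
        double duty   = (r1 + r2) / (r1 + 2.0 * r2);

        printf("t_high=%.1f ms t_low=%.1f ms f=%.1f Hz duty=%.0f%%\n",
               t_high * 1e3, t_low * 1e3, freq, duty * 100.0);
        return 0;
    }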


A CD40106 could do the same and would even be more reductionist (and uses less current).


How much current would the capacitor/resistor attached to it use?

Modern microcontrollers are insanely power efficient. An ESP32 in "light sleep" (which would be sufficient to serve timer routines) is said to consume <1 mA (at 3.3V), down to ~10 uA (microamps!) in "deep sleep".

In other words, 1 year in deep sleep is 10 uA x ~31.5 million seconds = ~315 ampere-seconds, or 315 / 3600 = ~88 mAh, i.e. less than 100 mAh.

Obviously it's irrelevant in this use case (where the goal is running a motor every wake-up cycle), but nowadays, as absurd as it looks, being power-constrained isn't necessarily a reason not to slap something on it that also happens to be able to do cryptography, connect to WiFi and make HTTPS requests.
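
The deep-sleep pattern is also pleasantly small. A rough ESP-IDF sketch from memory (run_motor() is a hypothetical stand-in; check the IDF docs for the exact behavior):

    #include "esp_sleep.h"

    void run_motor(void);   /* hypothetical: whatever the wake-up does */

    void app_main(void)
    {
        run_motor();
        /* Sleep ~1 hour (argument is in microseconds), then reboot
         * back into app_main when the wakeup timer fires. */
        esp_sleep_enable_timer_wakeup(3600ULL * 1000000ULL);
        esp_deep_sleep_start();   /* ~10 uA until the timer fires */
    }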


Touché ;)


This does the typical "two short buzzes with a break in between". I think that would be hard with a single NE555, and of course much more annoying/complicated to fine-tune.

Also, the random delay between the notifications.
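
On an MCU the whole pattern is a handful of lines. A host-side simulation sketch (printf/nanosleep standing in for GPIO writes and RTOS delays; all timings are guesses):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void delay_ms(long ms)
    {
        struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        for (;;) {
            for (int i = 0; i < 2; i++) {         /* two short buzzes */
                printf("buzz on\n");  delay_ms(150);
                printf("buzz off\n"); delay_ms(200);  /* the break */
            }
            /* random 30-90 s gap between notifications */
            delay_ms(30000 + rand() % 60000);
        }
    }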


Ah, right. With two buzzes it gets complicated again.


There is no shortcut through the hills of complexity. From [0]: "I wouldn’t give a fig for simplicity on this side of complexity; I would give my right arm for the simplicity on the far side of complexity"...

If you move on to the next complex thing, you miss out on the most valuable learning, namely what the essence of the thing you semi-accidentally built really is and what really is worth carrying on into the future...

[0] https://pmhut.com/project-management-on-the-far-side-of-comp...


For me, "Large Steps in Cloth Simulation" [0] made implicit methods accessible... Seminal paper.

[0] https://dl.acm.org/doi/10.1145/280814.280821
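
The key step, as I remember it: instead of stepping explicitly with the current forces, each time step solves one linear system for the velocity update,

    \left( \mathbf{M} - h \,\frac{\partial \mathbf{f}}{\partial \mathbf{v}}
         - h^2 \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \right) \Delta\mathbf{v}
      = h \left( \mathbf{f}_0 + h \,\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\, \mathbf{v}_0 \right)

and then Δx = h (v_0 + Δv). The force Jacobians are what buy the stability at large time steps h.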


For inextensible cloth there's also "Efficient simulation of inextensible cloth" [0], which is particularly clever and efficient.

[0] https://dl.acm.org/doi/10.1145/1276377.1276438


27us roundtrip is not really state of the art for zero-copy IPC; about 1us would be. What is causing this overhead?


Asking for those who, like me, haven't yet taken the time to find technical information on that webpage:

What exactly does that roundtrip latency number measure (especially your 1us)? Does zero copy imply mapping pages between processes? Is there an async kernel component involved (like I would infer from "io_uring") or just two user space processes mapping pages?


27us and 1us are both an eternity and definitely not SOTA for IPC. The fastest possible way to do IPC is with a shared memory resident SPSC queue.

The actual (one-way cross-core) latency on modern CPUs varies by quite a lot [0], but a good rule of thumb is 100ns + 0.1ns per byte.

This measures the time for core A to write one or more cache lines to a shared memory region, and core B to read them. The latency is determined by the time it takes for the cache coherence protocol to transfer the cache lines between cores, which shows up as a number of L3 cache misses.

Interestingly, at the hardware level, in-process vs inter-process is irrelevant. What matters is the physical location of the cores which are communicating. This repo has some great visualizations and latency numbers for many different CPUs, as well as a benchmark you can run yourself:

[0] https://github.com/nviennot/core-to-core-latency
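
For the curious, a minimal C11 sketch of such a queue (fixed 64-byte messages, power-of-two slot count; you'd place the struct in a mapping shared via shm_open/mmap, which is omitted here):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <string.h>

    #define SLOTS    1024            /* power of two */
    #define MSG_SIZE 64              /* one cache line per message */

    struct spsc {
        /* Pad head and tail to separate cache lines: no false sharing. */
        _Alignas(64) _Atomic uint64_t head;   /* written by producer */
        _Alignas(64) _Atomic uint64_t tail;   /* written by consumer */
        _Alignas(64) unsigned char buf[SLOTS][MSG_SIZE];
    };

    int produce(struct spsc *q, const void *msg)
    {
        uint64_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
        uint64_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h - t == SLOTS)
            return -1;                        /* full */
        memcpy(q->buf[h % SLOTS], msg, MSG_SIZE);
        /* Release: payload becomes visible before the new head does. */
        atomic_store_explicit(&q->head, h + 1, memory_order_release);
        return 0;
    }

    int consume(struct spsc *q, void *msg)
    {
        uint64_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
        uint64_t h = atomic_load_explicit(&q->head, memory_order_acquire);
        if (t == h)
            return -1;                        /* empty */
        memcpy(msg, q->buf[t % SLOTS], MSG_SIZE);
        atomic_store_explicit(&q->tail, t + 1, memory_order_release);
        return 0;
    }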


I was really asking what "IPC" means in this context. If you can just share a mapping, yes it's going to be quite fast. If you need to wait for approval to come back, it's going to take more time. If you can't share a memory segment, even more time.


No idea what this vibe code is doing, but two processes on the same machine can always share a mapping, though maybe your PL of choice is incapable. There aren’t many libraries that make it easy either. If it’s not two processes on the same machine I wouldn’t really call it IPC.

Of course a round trip will take more time, but it’s not meaningfully different from two one-way transfers. You can just multiply the numbers I said by two. Generally it’s better to organize a system as a pipeline if you can though, rather than ping ponging cache lines back and forth doing a bunch of RPC.


It may or may not be good, depending on a number of factors.

I did read the original Linux zero-copy papers from Google, for example, and at the time (when using TCP) the juice was worth the squeeze when the payload was larger than 10 kilobytes (or 20? Don't remember right now and I'm on mobile).

Also, a common technique is batching, so you amortise the round-trip cost (this used to be the job of sendmmsg/recvmmsg) over, say, 10 payloads.

So yeah that number alone can mean a lot or it can mean very little.

In my experience, people doing low-latency stuff have already built their own thing around msg_zerocopy, io_uring and such :)
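
To illustrate the batching point above: if one call costs X, ten payloads per call cost roughly X/10 each. A rough Linux sketch of batched sends (socket setup omitted; fd is assumed to be a connected UDP socket; buffer sizes are arbitrary):

    #define _GNU_SOURCE            /* sendmmsg is a GNU/Linux extension */
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <string.h>

    #define BATCH 10

    int send_batch(int fd, char payloads[BATCH][2048], size_t len)
    {
        struct mmsghdr msgs[BATCH];
        struct iovec iov[BATCH];
        memset(msgs, 0, sizeof(msgs));

        for (int i = 0; i < BATCH; i++) {
            iov[i].iov_base = payloads[i];
            iov[i].iov_len  = len;
            msgs[i].msg_hdr.msg_iov    = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        /* One syscall for all ten payloads: per-message overhead ~1/10. */
        return sendmmsg(fd, msgs, BATCH, 0);
    }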


io_uring is a tool for maximizing throughput, not minimizing latency. So the correct measure is transactions per millisecond, not milliseconds per transaction.

Little’s Law applies when the task monopolizes the time of the worker. When it is alternating between IO and compute, it can be off by a factor of two or more. And when it’s only considering IO, things get more muddled still.
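
For reference, Little's Law, with L transactions in flight, throughput λ and mean latency W:

    L = \lambda W \quad\Longrightarrow\quad \lambda = \frac{L}{W}

io_uring raises λ mostly by letting L grow (deep submission queues), not by shrinking W, which is why transactions per millisecond is the honest yardstick, modulo the caveats above.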


> io_uring is a tool for maximizing throughput not minimizing latency.

Some features are explicitly designed to minimize latency. I'm thinking of the IORING_SETUP_IOPOLL and IORING_SETUP_SQPOLL flags for io_uring_setup.

I'm not making that up, the manpage says that: https://manpages.debian.org/unstable/liburing-dev/io_uring_s...
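
A minimal liburing sketch of the SQPOLL variant (from memory; kernel version and privilege requirements vary):

    #include <liburing.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        struct io_uring ring;
        struct io_uring_params p;
        memset(&p, 0, sizeof(p));

        p.flags = IORING_SETUP_SQPOLL;  /* kernel thread polls the SQ,   */
        p.sq_thread_idle = 2000;        /* so no syscall per submission  */
                                        /* (idle timeout in ms)          */
        int ret = io_uring_queue_init_params(256, &ring, &p);
        if (ret < 0) {
            fprintf(stderr, "init: %s\n", strerror(-ret));
            return 1;
        }
        io_uring_queue_exit(&ring);
        return 0;
    }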


It's not exactly local IPC. The round-trip benchmark stat is for a TCP server/client ping/pong call with a 2 KB payload, though the TCP connection is over local loopback (127.0.0.1).

Source: https://github.com/mvp-express/myra-transport/blob/main/benc...
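
A stripped-down sketch of what such a ping/pong measurement typically looks like on the client side (connect() omitted; this is not the actual benchmark code from the repo):

    #include <stdio.h>
    #include <sys/types.h>
    #include <time.h>
    #include <unistd.h>

    #define PAYLOAD 2048
    #define ITERS   100000

    static double now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e9 + ts.tv_nsec;
    }

    /* fd: connected TCP socket to a peer echoing PAYLOAD bytes back */
    double mean_rtt_ns(int fd)
    {
        char buf[PAYLOAD] = { 0 };
        double start = now_ns();
        for (int i = 0; i < ITERS; i++) {
            if (write(fd, buf, PAYLOAD) != PAYLOAD)
                break;
            ssize_t got = 0, n;
            while (got < PAYLOAD &&
                   (n = read(fd, buf + got, PAYLOAD - got)) > 0)
                got += n;
        }
        return (now_ns() - start) / ITERS;   /* mean round trip in ns */
    }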


indeed, you can get a packet from one box to another in 1-2us


with io_uring? How? I tried everything in the book

