jeffreygoesto's comments

For profiling I like the dual representation of treemap and call tree in KCachegrind a lot: https://kcachegrind.github.io/html/Home.html It addresses the points the post criticizes about treemaps (showing percentages and estimating the areas of sub-trees) better than the examples chosen there.

Using this in embedded as well. The whole point is to commit and lock the pages after allocation, so you never experience what you correctly describe. You want a single checkpoint after which you can simply stop worrying about OOM.
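
A minimal sketch of that pattern on a Linux-flavored target (pool size is a made-up example, error handling trimmed); the single malloc + memset + mlockall is the one checkpoint:

    /* Commit-and-lock sketch (POSIX/Linux assumed; sizes are made up). */
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define POOL_SIZE (8u * 1024u * 1024u)  /* assumed worst-case budget */

    static unsigned char *pool;

    int init_memory(void)
    {
        pool = malloc(POOL_SIZE);
        if (!pool)
            return -1;                  /* the single OOM checkpoint */
        memset(pool, 0, POOL_SIZE);     /* touch every page: commit now */
        /* Pin current and future pages: no lazy faults, no swap-out. */
        return mlockall(MCL_CURRENT | MCL_FUTURE);
    }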

The iPhone uses Bosch sensors, you might want to check https://www.bosch-sensortec.com/products/motion-sensors/imus...

The most important thing IMHO is to "rotate the structure 90 degrees" from package to function. You start with "all files of one package" and end up with "all files of all packages that serve the same purpose", for example all includes, all man pages, all binaries. This simplifies and speeds up a system, because you only need one or a few paths per function.
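
Illustrative sketch (hypothetical paths, not any particular distro):

    per package ("vertical"):        rotated per function:
      /pkg/foo/bin/foo                 /usr/bin/foo, /usr/bin/bar
      /pkg/foo/include/foo.h           /usr/include/foo.h, bar.h
      /pkg/bar/bin/bar                 /usr/share/man/man1/foo.1, bar.1
      /pkg/bar/include/bar.h

With the rotated layout, PATH needs a single entry instead of one per package.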

That is one problem of many solved, isn't that good?

Checking that the spec solves the problem is called validation in my domain and is treated explicitly, with different methods.

We use formal validation to check invariants, but also properties like "it must return a value xor an error, but never just hang".
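
As a hedged illustration of how such a property can be written down (temporal-logic style, my own notation, not tied to any specific tool):

    \Diamond\,\mathit{done} \;\land\; \Box\big(\mathit{done} \Rightarrow (\mathit{value} \oplus \mathit{error})\big)

I.e. the call eventually terminates (no hang), and on termination yields exactly one of value or error.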


Car mechanics face the same problem today with rare issues. They know the standard mechanical procedures, and when they can't track a problem down, all they can do is re-flash an ECU or try swapping it. They also don't admit they are wrong, at least most of the time...


> all they can do is re-flash an ECU or try swapping it.

To be fair, they have wrenches thrown in their way there, as many ECUs and other computer-driven components are fairly locked down and undocumented. Especially as the programming software itself is often not freely distributed (only to approved shops/dealers).


An NE555 could be sufficient and even more reductionist. =;-) Circuit example: https://theorycircuit.com/ic-555-ic-741/adjustable-duty-cycl...
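
For reference, the standard NE555 astable datasheet formulas, plugged into a tiny C calculator (the component values here are made up, not the ones from the linked circuit):

    /* NE555 astable timing (standard datasheet formulas). */
    #include <stdio.h>

    int main(void)
    {
        double r1 = 10e3, r2 = 47e3;   /* ohms (hypothetical values) */
        double c  = 1e-6;              /* farads (hypothetical value) */

        double t_high = 0.693 * (r1 + r2) * c;       /* output high time */
        double t_low  = 0.693 * r2 * c;              /* output low time  */
        double freq   = 1.44 / ((r1 + 2.0 * r2) * c);
        double duty   = (r1 + r2) / (r1 + 2.0 * r2);

        printf("t_high=%.1f ms t_low=%.1f ms f=%.1f Hz duty=%.0f%%\n",
               t_high * 1e3, t_low * 1e3, freq, duty * 100.0);
        return 0;
    }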


A CD40106 could do the same and would even be more reductionist (and uses less current).


How much current would the capacitor/resistor attached to it use?

Modern microcontrollers are insanely power efficient. An ESP32 in "light sleep" (which would be sufficient to serve timer routines) is said to consume <1 mA (at 3.3V), down to ~10 uA (microamps!) in "deep sleep".

In other words, 1 year in deep sleep is 10 uA x ~31.5 million seconds = ~315 ampere-seconds, or 315 / 3600 = ~88 mAh, i.e. less than 100 mAh.

Obviously it's irrelevant in this use case (where the goal is running a motor every wake-up cycle), but nowadays, as absurd as it looks, being power-constrained isn't necessarily a reason not to slap something on it that also happens to be able to do cryptography, connect to WiFi and make HTTPS requests.
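
The deep-sleep pattern is also pleasantly small. A rough ESP-IDF sketch from memory (run_motor() is a hypothetical stand-in; check the IDF docs for the exact behavior):

    #include "esp_sleep.h"

    void run_motor(void);   /* hypothetical: whatever the wake-up does */

    void app_main(void)
    {
        run_motor();
        /* Sleep ~1 hour (argument is in microseconds), then reboot
         * back into app_main when the wakeup timer fires. */
        esp_sleep_enable_timer_wakeup(3600ULL * 1000000ULL);
        esp_deep_sleep_start();   /* ~10 uA until the timer fires */
    }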


Touché ;)


This does the typical "two short buzzes with a break in between". I think that would be hard with a single NE555, and of course much more annoying/complicated to fine-tune.

Also, the random delay between the notifications.
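
On an MCU the whole pattern is a handful of lines. A host-side simulation sketch (printf/nanosleep standing in for GPIO writes and RTOS delays; all timings are guesses):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void delay_ms(long ms)
    {
        struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        for (;;) {
            for (int i = 0; i < 2; i++) {         /* two short buzzes */
                printf("buzz on\n");  delay_ms(150);
                printf("buzz off\n"); delay_ms(200);  /* the break */
            }
            /* random 30-90 s gap between notifications */
            delay_ms(30000 + rand() % 60000);
        }
    }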


Ah, right. With two buzzes it gets complicated again.


There is no shortcut through the hills of complexity. From [0]: "I wouldn’t give a fig for simplicity on this side of complexity; I would give my right arm for the simplicity on the far side of complexity"...

If you move on to the next complex thing, you miss out on the most valuable learning, namely what the essence of the thing you semi-accidentally built really is and what really is worth carrying on into the future...

[0] https://pmhut.com/project-management-on-the-far-side-of-comp...


For me, "Large Steps in Cloth Simulation" [0] made implicit methods accessible... Seminal paper.

[0] https://dl.acm.org/doi/10.1145/280814.280821
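
The key step, as I remember it: instead of stepping explicitly with the current forces, each time step solves one linear system for the velocity update,

    \left( \mathbf{M} - h \,\frac{\partial \mathbf{f}}{\partial \mathbf{v}}
         - h^2 \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \right) \Delta\mathbf{v}
      = h \left( \mathbf{f}_0 + h \,\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\, \mathbf{v}_0 \right)

and then Δx = h (v_0 + Δv). The force Jacobians are what buy the stability at large time steps h.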


For inextensible cloth there's also "Efficient simulation of inextensible cloth" [0], which is particularly clever and efficient.

[0] https://dl.acm.org/doi/10.1145/1276377.1276438


27us roundtrip is not really state of the art for zero-copy IPC; about 1us would be. What is causing this overhead?


Asking for those who, like me, haven't yet taken the time to find technical information on that webpage:

What exactly does that roundtrip latency number measure (especially your 1us)? Does zero copy imply mapping pages between processes? Is there an async kernel component involved (like I would infer from "io_uring") or just two user space processes mapping pages?


27us and 1us are both an eternity and definitely not SOTA for IPC. The fastest possible way to do IPC is with a shared memory resident SPSC queue.

The actual (one-way cross-core) latency on modern CPUs varies by quite a lot [0], but a good rule of thumb is 100ns + 0.1ns per byte.

This measures the time for core A to write one or more cache lines to a shared memory region, and core B to read them. The latency is determined by the time it takes for the cache coherence protocol to transfer the cache lines between cores, which shows up as a number of L3 cache misses.

Interestingly, at the hardware level, in-process vs inter-process is irrelevant. What matters is the physical location of the cores which are communicating. This repo has some great visualizations and latency numbers for many different CPUs, as well as a benchmark you can run yourself:

[0] https://github.com/nviennot/core-to-core-latency
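
For the curious, a minimal C11 sketch of such a queue (fixed 64-byte messages, power-of-two slot count; you'd place the struct in a mapping shared via shm_open/mmap, which is omitted here):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <string.h>

    #define SLOTS    1024            /* power of two */
    #define MSG_SIZE 64              /* one cache line per message */

    struct spsc {
        /* Pad head and tail to separate cache lines: no false sharing. */
        _Alignas(64) _Atomic uint64_t head;   /* written by producer */
        _Alignas(64) _Atomic uint64_t tail;   /* written by consumer */
        _Alignas(64) unsigned char buf[SLOTS][MSG_SIZE];
    };

    int produce(struct spsc *q, const void *msg)
    {
        uint64_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
        uint64_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h - t == SLOTS)
            return -1;                        /* full */
        memcpy(q->buf[h % SLOTS], msg, MSG_SIZE);
        /* Release: payload becomes visible before the new head does. */
        atomic_store_explicit(&q->head, h + 1, memory_order_release);
        return 0;
    }

    int consume(struct spsc *q, void *msg)
    {
        uint64_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
        uint64_t h = atomic_load_explicit(&q->head, memory_order_acquire);
        if (t == h)
            return -1;                        /* empty */
        memcpy(msg, q->buf[t % SLOTS], MSG_SIZE);
        atomic_store_explicit(&q->tail, t + 1, memory_order_release);
        return 0;
    }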


I was really asking what "IPC" means in this context. If you can just share a mapping, yes it's going to be quite fast. If you need to wait for approval to come back, it's going to take more time. If you can't share a memory segment, even more time.


No idea what this vibe code is doing, but two processes on the same machine can always share a mapping, though maybe your PL of choice is incapable. There aren’t many libraries that make it easy either. If it’s not two processes on the same machine I wouldn’t really call it IPC.

Of course a round trip will take more time, but it’s not meaningfully different from two one-way transfers. You can just multiply the numbers I said by two. Generally it’s better to organize a system as a pipeline if you can though, rather than ping ponging cache lines back and forth doing a bunch of RPC.


It may or may not be good, depending on a number of factors.

I did read the original Linux zero-copy papers from Google, for example, and at the time (when using TCP) the juice was worth the squeeze when the payload was larger than 10 kilobytes (or 20? Don't remember right now and I'm on mobile).

Also, a common technique is batching, so you amortise the round-trip cost (this used to be the job of sendmmsg/recvmmsg) over, say, 10 payloads.

So yeah that number alone can mean a lot or it can mean very little.

In my experience, people doing low-latency stuff have already built their own thing around msg_zerocopy, io_uring and such :)
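
To illustrate the batching point above: if one call costs X, ten payloads per call cost roughly X/10 each. A rough Linux sketch of batched sends (socket setup omitted; fd is assumed to be a connected UDP socket; buffer sizes are arbitrary):

    #define _GNU_SOURCE            /* sendmmsg is a GNU/Linux extension */
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <string.h>

    #define BATCH 10

    int send_batch(int fd, char payloads[BATCH][2048], size_t len)
    {
        struct mmsghdr msgs[BATCH];
        struct iovec iov[BATCH];
        memset(msgs, 0, sizeof(msgs));

        for (int i = 0; i < BATCH; i++) {
            iov[i].iov_base = payloads[i];
            iov[i].iov_len  = len;
            msgs[i].msg_hdr.msg_iov    = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        /* One syscall for all ten payloads: per-message overhead ~1/10. */
        return sendmmsg(fd, msgs, BATCH, 0);
    }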


io_uring is a tool for maximizing throughput, not minimizing latency. So the correct measure is transactions per millisecond, not milliseconds per transaction.

Little’s Law applies when the task monopolizes the time of the worker. When it is alternating between IO and compute, it can be off by a factor of two or more. And when it’s only considering IO, things get more muddled still.
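
For reference, Little's Law, with L transactions in flight, throughput λ and mean latency W:

    L = \lambda W \quad\Longrightarrow\quad \lambda = \frac{L}{W}

io_uring raises λ mostly by letting L grow (deep submission queues), not by shrinking W, which is why transactions per millisecond is the honest yardstick, modulo the caveats above.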


> io_uring is a tool for maximizing throughput not minimizing latency.

Some features are explicitly designed to minimize latency. I'm thinking of the IORING_SETUP_IOPOLL and IORING_SETUP_SQPOLL flags for io_uring_setup.

I'm not making that up, the manpage says that: https://manpages.debian.org/unstable/liburing-dev/io_uring_s...
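
A minimal liburing sketch of the SQPOLL variant (from memory; kernel version and privilege requirements vary):

    #include <liburing.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        struct io_uring ring;
        struct io_uring_params p;
        memset(&p, 0, sizeof(p));

        p.flags = IORING_SETUP_SQPOLL;  /* kernel thread polls the SQ,   */
        p.sq_thread_idle = 2000;        /* so no syscall per submission  */
                                        /* (idle timeout in ms)          */
        int ret = io_uring_queue_init_params(256, &ring, &p);
        if (ret < 0) {
            fprintf(stderr, "init: %s\n", strerror(-ret));
            return 1;
        }
        io_uring_queue_exit(&ring);
        return 0;
    }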


It's not exactly local IPC. The round-trip benchmark stat is for a TCP server/client ping/pong call with a 2 KB payload, though the TCP connection is over local loopback (127.0.0.1).

Source: https://github.com/mvp-express/myra-transport/blob/main/benc...
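
A stripped-down sketch of what such a ping/pong measurement typically looks like on the client side (connect() omitted; this is not the actual benchmark code from the repo):

    #include <stdio.h>
    #include <sys/types.h>
    #include <time.h>
    #include <unistd.h>

    #define PAYLOAD 2048
    #define ITERS   100000

    static double now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e9 + ts.tv_nsec;
    }

    /* fd: connected TCP socket to a peer echoing PAYLOAD bytes back */
    double mean_rtt_ns(int fd)
    {
        char buf[PAYLOAD] = { 0 };
        double start = now_ns();
        for (int i = 0; i < ITERS; i++) {
            if (write(fd, buf, PAYLOAD) != PAYLOAD)
                break;
            ssize_t got = 0, n;
            while (got < PAYLOAD &&
                   (n = read(fd, buf + got, PAYLOAD - got)) > 0)
                got += n;
        }
        return (now_ns() - start) / ITERS;   /* mean round trip in ns */
    }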


indeed, you can get a packet from one box to another in 1-2us


with io_uring? How? I tried everything in the book

