andygocke's comments | Hacker News

Yeah, but we have codegen bugs in .NET as well. The biggest difference that stood out to me in this write-up is that we would have gone straight for “coredump” instead of other investigation tools. Our default mode of investigating memory corruption issues is dumps.
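
(For reference, a minimal sketch of that flow, assuming the dotnet-dump tool; which SOS commands you run afterwards depends on the corruption.)

    dotnet-dump collect --process-id <pid>   # capture a dump of the live process
    dotnet-dump analyze <dump-file>          # open it in the SOS REPL
    > verifyheap                             # e.g. look for GC heap corruption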


Sure, I have experienced them, e.g. once in 2006 using IBM's JVM implementation with Websphere.

However, it is probably not as problematic on those runtimes as it is in Go, given the way Go allows assembly to be used directly.

While the JVM and CLR don't allow direct access to assembly code, Go does, so I assume expecting safepoints everywhere is not an option, as any subroutine call can land in code that was manually written.


Go users can only insert assembly wrapped in a function call. That might be safety-related; I am not entirely sure.

(Well technically there is a way to inject assembly without the function call overhead. That's what https://pkg.go.dev/runtime/internal/atomic is doing. But you will need to modify the runtime and compiler toolchain for it.)


If you look at the docs, they expect the developer to add specific information and use the registers in a specific way; otherwise Go will face runtime issues.

Whereas when you go through cgo, you get a marshaling layer, similar to how JNI and P/Invoke work, that takes care of those issues.


Unfortunately, while there are alternatives to this behavior, they all have other downsides. The biggest constraint was that the schedule didn't support a new version of the .NET IL format (and revving the IL format is an expensive change for compat purposes, as well). There were two strong lowering contenders, each with its own problems.

The first is to use a `With` method and rely on "optional" parameters in some sense. When you write `with { x = 3 }` you're basically writing a `.With(x: 3)` call, and `With` presumably calls the constructor with the appropriate values. The problem here is that optional parameters are also kind of fake. The .NET IL format doesn't have a notion of optional parameters -- the C# compiler just fills in the parameters when lowering the call. So that means that adding a new field to a record would require adding a new parameter. But adding a new parameter means that you've broken binary backwards compatibility. One of the goals of records was to make these kinds of "simple" data updates possible, instead of the current situation with classes where they can be very challenging.
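
To make the binary-compat hazard concrete, here is a rough sketch of that lowering (hypothetical shapes, for illustration only; nothing here shipped):

    var p = new Person("Ada", 36);

    // 'p with { Age = 3 }' would lower at the call site to:
    var q = p.With(p.Name, 3);

    public record Person(string Name, int Age)
    {
        // Hypothetical compiler-generated lowering target:
        public Person With(string name, int age) => new Person(name, age);
    }

    // If a later version of the record adds an Email field, With gains a
    // third parameter. Recompiled callers are fine (the compiler fills in
    // the "optional" arguments), but already-compiled callers still pass
    // two arguments to a method that no longer exists: binary breakage.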

The second option is a `With` method for every field. A single `with { }` call turns into a chain of N calls, `WithX(3).WithY(5)`, one for each field being set. The problem with that is that it creates a lot of dead assignments that need to be unwound by the JIT. We didn't see that happening reliably, which was pretty concerning because it would also result in a lot of allocation garbage.
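
Sketched out, again as a hypothetical lowering:

    var r1 = new Point(1, 2);

    // 'r1 with { X = 3, Y = 5 }' would lower to a chain of copies:
    var r2 = r1.WithX(3).WithY(5);   // the Point produced by WithX is dead
                                     // the moment WithY runs

    public record Point(int X, int Y)
    {
        // One hypothetical compiler-generated method per field:
        public Point WithX(int x) => new Point(x, Y);
        public Point WithY(int y) => new Point(X, y);
    }

    // Unless the JIT reliably elides the intermediate copies, an N-field
    // 'with' produces up to N-1 dead allocations.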

So basically, this was a narrow decision that fit into the space we had. If I had the chance, I would completely rework dotnet/C# initialization for a reboot of the language.

One thing I proposed, but was not accepted, was to make records much simpler across the board. By forbidding a lot of the complex constructs, the footguns are also avoided. But that was seen as too limiting. Reading between the lines, I bet Jon wouldn’t have liked this either, as some of the fancy things he’s doing may not have been possible.


> The biggest constraint was that the schedule didn't support a new version of the .NET IL format (and revving the IL format is an expensive change for compat purposes, as well).

My biggest sadness reading this is that what MS have done is to outsource the issue to all C# devs. We will all hit this problem at some point (I have a couple of times) and I suspect we will all lose hours of time trying to work out WTF is going on. It may not quite be the Billion Dollar Mistake, but it's an ongoing cost to us all.

A possible approach I mentioned elsewhere in the thread is this (for the generation of the `with`):

    var n2 = n1.<Clone>$();
    n2.Value = 3;                  // 'with' field setters
    n2.<OnPostCloneInitialise>();  // run the initialisers
Then the <OnPostCloneInitialise>:

    public virtual void <OnPostCloneInitialise>()
    {
        base.<OnPostCloneInitialise>();

        Even = (Value & 1) == 0;    
    }
If the compiler could generate the <OnPostCloneInitialise> based on the initialisation code in the record/class, could that work?

That would just force the new object to initialise after the cloning without any additional IL or modifications.


> MS have done is to outsource the issue to all C# devs

Let's be clear: breaking dozens of tools because of a change to the IL format also outsources an issue to all C# devs. The .NET IL format has been basically unchanged since .NET 2.0, and huge numbers of people take very hard dependencies on exactly what they do and do not expect to find in it. I don't expect we would have been able to make significant changes due to the breaking-change impact.

> A possible approach I mentioned

This would likely be even harder to understand. For better or worse, the .NET design is that external initializers happen _after_ the constructor runs. That's been true all the way back to when the initializer syntax was first introduced in C# 3. Making regular initializers and `with` initializers have inverted order strikes me as being way worse.

If I could go back in time, I think the main change to C# I would make would be to enforce that the constructor always runs after all external initialization.


Slightly confused. My suggestion was to run the initialisers after the new object has been constructed (cloned+modified). The semantics are the same as you describe even if the underlying implementation is different.

What am I missing?


There are two types of initializers: internal and external. Internal are inside the type, like field and property initializers. External are outside, like object initializers, collection initializers, and ‘with’ clauses.

Internal initializers are run as part of the constructor, before any user code. External initializers are run after the constructor, on the constructed object.

For instance:

  class C
  {
    public int P = 5;
  }

  var c = new C { P = 3 };
`c.P` has the value 3.

In your example:

    var n2 = n1.<Clone>$();
    n2.Value = 3;                    // 'with' field setters
    n2.<OnPostCloneInitialise>();  // run the initialisers
The “PostCloneInitializers” you’re running are the field initializers, so the order is backwards. You’re overwriting the value of the external initializers with the internal initializers.
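
Concretely, using the record from upthread (a sketch, assuming a Value field and an `Even = (Value & 1) == 0;` initializer):

    // User writes:  var n2 = n1 with { Even = true };
    // The proposed lowering produces:
    var n2 = n1.<Clone>$();
    n2.Even = true;                // external: the user's 'with' setter
    n2.<OnPostCloneInitialise>();  // internal: re-runs 'Even = (Value & 1) == 0;'
                                   // and silently discards the user's value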


Hi, I own the Native AOT compiler and self-contained compiler for .NET.

Self-contained will work fine because we precompile the runtime and libraries for all supported platforms.

Native AOT won't, because we rely on the system linker and native libraries. This is the same situation as for C++ and Rust. Unlike Go, which doesn't use anything from the system, we try to support interop with system libraries directly, and in particular rely on the system crypto libraries by default.

Unfortunately, the consequence of relying on system libraries is that you actually have to have a copy of the system libraries to link against them, and a linker that supports that. In practice, clang is actually a fine cross-linker for all these platforms, but acquiring the system libraries is an issue. None of the major OSes provide libraries in a way that would be easy to acquire and deliver to clang, and we don't want to get into the business of building and redistributing the libcs for all platforms (and then be responsible for bugs etc).

Note that if you use cgo and call any C code from Go you will end up in the same situation even for Go -- because then you need a copy of the target system libc and a suitable system linker.
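
To make the distinction concrete, a sketch (exact flags and RIDs are in the dotnet docs):

    # Self-contained cross-targeting works from any supported host OS:
    # the runtime bits for the target RID are prebuilt and simply bundled.
    dotnet publish -r linux-arm64 --self-contained true

    # Native AOT cross-OS publishing generally fails on a non-Linux host,
    # because linking needs the target's libc/crypto libraries and a linker
    # that can consume them.
    dotnet publish -r linux-arm64 -p:PublishAot=true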


I guess people have different experiences, as I don’t see the React changes as improving the mobile experience: just the opposite. I often interact with GitHub by browsing through the file tree and various links, but the new React renderer breaks the back button. So often when I’m browsing and hit back, I leave GitHub entirely instead of going back to the parent directory.


Same. The broken back navigation needs more attention. Upvote here https://github.com/orgs/community/discussions/75889


Between the back button issues and the horrible sluggishness on many repo and pull request pages, the React switchover has seriously degraded what used to be a great experience. Not just on mobile, but on desktop as well.


Yes, we considered that and implemented a solution. Effectively, the runtime will generate thunk methods that invisibly bridge between the two worlds. Calling (and overriding) regular async methods from runtime-async methods will be stitched up by the runtime. The user will never see the difference.


Cool! Thanks for jumping in.


Well, any additional FFI overhead, right?

The cost for exposing very little tends to be that marshaling costs more due to the requirement that values be copied between domains rather than shared.


That's a matter of perspective.

Calling a C function in a shared library (dll, so) from Java using the new FFM API has the same overhead as calling such a function from C++. (The overhead is higher if the called function upcalls into Java again, though that is relatively rare, or if the function blocks, but in that case the blocking itself makes the additional overhead negligible.) But the FFM API does not directly expose Java objects to native code at all, although it does allow Java code to access and mutate “off-heap” native memory (C data) as efficiently as accessing and mutating Java heap memory. So if your goal is to expose Java objects to native code, then yes, that would require marshalling (although ideally you should do the opposite and expose native memory to Java code through a Java interface, which would have no overhead).

However, relying on FFI in Java is far less common than in Python, Rust, or even C# or Go, and in the rare cases where it's done, it's easy to do cheaply, as I described. So I guess it's true to say that if you wanted FFI to work in the same manner it is employed in those other languages, then yes, it would be more expensive, as it would require marshalling; but that's just not the case in Java, given the combination of Java's performance and the size of its ecosystem of libraries.

Languages with worse performance or with smaller ecosystems do need to rely much more heavily on FFI and so they often choose to sacrifice the flexibility of their implementation in favour of a more direct flavour of FFI.


I agree with your general point, that how difficult this is depends on your specific problem, but I disagree about how common it is and how easy it is to work around.

Regarding

> But the FFM API does not directly expose Java objects to native code at all, although it does allow Java code to access and mutate “off-heap” native memory (C data) as efficiently as accessing and mutating Java heap memory

I just don’t buy it. First, I think it’s very common to want to expose managed memory to native. In fact, it might be the dominant case. If I want to call out to perform a crypto operation on a block of bytes I got from a Java operation, I don’t want to copy them first.

Second, I think you’re missing the use case for manipulating system APIs. If you want to perform some system call and the call requires setting up some structures as arguments, that’s going to be pretty expensive in Java. For things that are called a lot, it can add up. For example, Windows has a profiling and eventing system called ETW. To use it, you create a set of events and call the system. It’s not uncommon to do this for thousands or millions of events per second. The way C# handles this is by stack-allocating an event blob and calling directly. I can’t imagine a Java workaround that would be as fast or simple. It seems like you’d have to pool a native event blob allocation and fill it in from Java.
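
For a sense of what that looks like in C# (a rough sketch; the helper names are made up, the P/Invoke shapes are abbreviated from advapi32’s EventWrite, and provider registration and event-descriptor setup are omitted):

    using System.Runtime.InteropServices;

    static unsafe class Etw
    {
        [StructLayout(LayoutKind.Sequential)]
        struct EventDataDescriptor { public ulong Ptr; public uint Size; public uint Reserved; }

        [DllImport("advapi32.dll")]
        static extern uint EventWrite(long regHandle, void* eventDescriptor,
                                      uint userDataCount, EventDataDescriptor* userData);

        // The payload and its descriptor live entirely on the stack: no heap
        // allocation and no marshalling layer, just a direct call into the OS.
        public static void LogValue(long regHandle, void* eventDescriptor, int payload)
        {
            EventDataDescriptor* data = stackalloc EventDataDescriptor[1];
            data->Ptr = (ulong)&payload;
            data->Size = sizeof(int);
            EventWrite(regHandle, eventDescriptor, 1, data);
        }
    }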

It’s true that most Java programmers aren’t blocked by this but I think that’s because many Java programmers don’t try to use Java for these tasks. They don’t write systems software in Java and they don’t embed into big, performance-sensitive native apps, like games.


> First, I think it’s very common to want to expose managed memory to native. In fact, it might be the dominant case. If I want to call out to perform a crypto operation on a block of bytes I got from a Java operation, I don’t want to copy them first.

Doing it this way is not so common in Java anyway. First, primitive operations for crypto are intrinsics in Java and operate without FFI at all. Second, IO input and output buffers in high-performance applications are typically in off-heap buffers anyway (i.e. you serialize data to an off-heap buffer and then do crypto and then send it over the wire, or you receive data in an off-heap buffer, do crypto, and then deserialize).

> Second, I think you’re missing the use case for manipulating system APIs. If you want to perform some system call and the call requires setting up some structures as arguments, that’s going to be pretty expensive in Java.

It's not, because FFM allows you to manipulate native structs with no overhead. You get this efficient kind of stack allocation of native structures with FFM's Arenas and SegmentAllocator (https://docs.oracle.com/en/java/javase/22/docs/api/java.base...)

> They don’t write systems software in Java and they don’t embed into big, performance-sensitive native apps, like games.

It's true low-level programs are typically not written in Java, but the applications programming market is bigger. I wouldn't be at all surprised if applications written in Java alone comprise a bigger market than all intrinsically low-level applications combined. As for embedding in another application, there is no intrinsic reason not to do it in Java, but 1. traditionally, and for "environmental" reasons, Java hasn't been huge in the games space (except for Minecraft, of course) and 2. it's been less than six months since FFM became a permanent feature in the JDK; JNI, the FFI mechanism that preceded FFM, was really quite cumbersome to use, so it's not surprising people opted for more convenient FFI.


> First, primitive operations for crypto are intrinsics in Java and operate without FFI at all.

This is a pretty strange assertion given that I didn’t specify the crypto operation I wanted to perform. Is XAES-256-GCM available in the Java standard library?

> Doing it this way is not so common in Java anyway

Sure, because doing it the other way would be very expensive. But that doesn’t mean applications which can’t front or backload native processing don’t exist, it just means they will have slower throughput in Java.

It’s fine for a language to make that tradeoff, but it is a tradeoff.


> Is XAES-256-GCM available in the Java standard library?

No (is it in any language's standard library?) but everything you need to implement it in Java is available.

> But that doesn’t mean applications which can’t front or backload native processing don’t exist, it just means they will have slower throughput in Java.

They won't, because working with native memory is just as efficient as working with heap memory. You store your bytes in a MemorySegment and you don't care whether it's backed by an on- or off-heap buffer. I guess you could say: oh, but when working with FFI in Java you may need to keep some buffers off-heap if you don't want to copy bytes. But that's been common practice in Java since JDK 1.4 (2002).

> It’s fine for a language to make that tradeoff, but it is a tradeoff

There is a tradeoff, but it's not on performance. Rather than expose Java heap objects directly to native code (which is possible with the old JNI, but not the recommended approach), Java says keep the bytes that you want to efficiently pass to native code off-heap and makes it easy to do (through the same interface for on- and off-heap data).

Rather than constrain the implementation, which would have performance implications everywhere and always, Java gives you the choice of no FFI overhead at the cost of a tiny bit of convenience when doing FFI. Given how rare FFI is in Java compared to many other languages, that is obviously the right design decision, and it helps performance rather than harms it. So there is a tradeoff, but you're clearly trading away less than you would have if FFI were more common and the core implementation were impacted by it.

Ultimately, the question of "is it better to sacrifice language performance and flexibility in exchange for doing X (without significant performance overhead) in 3 lines instead of 30" depends entirely on how often users of the language need to do X. If the language is Java and X is FFI, the answer is "rarely", and so you're paying a small cost for a large gain. The tradeoff between the convenience of low/no-overhead FFI and language performance and flexibility becomes much more difficult and impactful in languages where FFI is more common.


> The Gleam example has all the convenience and readability of its C#/Python counterpart - but without the downsides.

This was mentioned in the write-up, but the big downside is interop. Green threads have significant downsides when going across OS threads.

This is the same reason why Rust ended up with async. Async is basically the cost you pay for C interop. However, C# runtime-async will likely be much simpler than Rust async since ownership is GC-managed and doesn't need to be transferred across threads.

All that said, I'm also not convinced the codebase bifurcation is a bad thing. Async ~= I/O. As a regular C# user, I'm not particularly unhappy about splitting my app into "I/O things" and "not I/O" things.


In practice we don't think there will end up being tradeoffs in async2 vs. async1. If you look below, at the "JIT state machine" section, you'll see that async2 looked better and the GC behavior differences were probably transient.

Overall, there are no architectural reasons why the compiler version should be better. The runtime should be able to make perf decisions that are at least as good in every case.


I'm not sure that the original description is precisely correct, but yours isn't correct either.

Basically, you can't treat green threads just like "a multi-threaded runtime" and have it just work. That is, a 1:1 mapping between green threads and OS threads is just OS threads.

So fundamentally if you bounce your green stacks off of the actual stack they're going to need to go somewhere... and that place must be the heap.

There are pluses and minuses to this implementation, but the biggest minus is that it makes FFI very complicated. C# has an extremely rich native-interop history (having historically been used to integrate closely with Windows C++ applications) and therefore this approach raised some serious challenges.

In some sense, async is the cost for clean interop with the C/system ABI. Transitioning across OS threads requires something like async.


I meant that you can have a multi-threaded runtime that will execute your green threads in a multi-threaded fashion. Like in Go you have (by default) as many worker OS threads as CPUs, and the Go runtime will take care of scheduling your green threads on those worker OS threads (plus create threads as needed for blocking syscalls, if I remember correctly, but that's getting way too deep into the details). And this will, in fact, "just work" from the user's perspective.

And yes, as both you said, and I said at the end of my previous comment, the main hurdle of green threads imo is FFI, but it's not what the article mentions, which is what surprised me.


Ah, I see. You were saying that green threads can usually be scheduled on multiple os threads and take advantage of parallelism. Yup, I agree. Apologies for the confusion.


I’m the author of the original issue — I agree, we’ll have to ensure the struct layout is solved. I think the only thing that makes sense is to just waste a little space and store the fields side-by-side. In almost all cases where people would use a struct I think this is an acceptable tradeoff.

At the point where you have more than 5 cases, the GC overhead starts to shrink in comparison to the calling-convention and copying overhead anyway.
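
For illustration, here’s what “side-by-side” means if you wrote it by hand (a sketch, not a committed design; the type and member names are made up):

    // A union of Ok(int) and Err(string), with per-case storage kept
    // side-by-side instead of overlapped:
    public readonly struct IntOrError
    {
        private readonly byte _tag;     // 0 = Ok, 1 = Err
        private readonly int _ok;       // storage for the Ok case
        private readonly string _err;   // storage for the Err case; never
                                        // overlaps _ok, so the GC can always
                                        // treat this slot as a reference

        private IntOrError(byte tag, int ok, string err) { _tag = tag; _ok = ok; _err = err; }

        public static IntOrError Ok(int value) => new(0, value, null);
        public static IntOrError Err(string message) => new(1, 0, message);
    }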


Will Roslyn perform any layout optimizations for fields that can be aliased? E.g., for a type union whose variants each have at most 2 object fields and one ushort-sized field, have a base layout of (object, object, short)?

