
> An important factor in reducing the size of the codebase and executable is that Jacobin relies on Go’s built-in memory management to perform garbage collection, and so it contains no GC code.

This breaks my brain thinking about it. A lot of what the JVM does is interpreting/JITing bytecode and ensuring it links/executes correctly, and writing that logic itself in Go is one thing. But how does Go's GC help you garbage collect objects in the JVM you're implementing?

For example, you have objects in the JVM heap, tracked by the code you're writing in Go. You need to do a GC run. How does the Go GC know about the objects you're managing in the JVM? Do you just... write wrapper objects in Go around each of them, and code up a destructor so that freeing the Go object frees the tracked JVM object? How do you inform the Go VM about which pointers exist to your JVM objects?

I realize I'm way out of my depth here, and only have a user's understanding of what GCs do, having never implemented one myself, but it seems crazy to me that Go's GC can Just Work with a VM you're writing in Go itself.



I suspect every JVM heap alloc is implemented by doing an alloc in Go. The JVM references to the object are pointers in the Go VM. So no special magic is needed. When the Go VM stops referencing an object, the Go GC will collect it.


Does this mean that code running inside Jacobin might be vulnerable to memory exhaustion issues[1], whereas in JVM they might have gotten an OutOfMemoryError instead because JVM heap size is fixed at startup time?

[1] For example https://pkg.go.dev/vuln/GO-2023-1704


Interesting idea. It sounds like this could run Java programs with only the memory they actually need, instead of dividing a server up into pieces just because a Java program "might" some day, possibly, maybe, ever need that much.

Which tends to be cargo-culted into "use these arguments when running Java programs", so a "hello world" responder gets allocated 128GB of RAM.


Java leans much more heavily on its GC than Go does so it will be interesting to see whether that's really an approach that works.


Not too familiar with Go, but my first question is how it will handle non-vanilla references of the weak, soft, or phantom variety.


Given how primitive the Go GC is, I doubt it'll work at all.


I don't think primitive is a good description of the Go GC. It definitely has different design constraints (vs. the various Java GCs, for example), particularly around the philosophy of minimising knobs, but within those constraints it's pretty highly optimised.


It will work, but since it’s not a moving GC you may end up with a lot of heap fragmentation, and since I don’t think it is generational it may get into a state where it stops collecting or has quite long pause times (I can’t remember if it limits its pause times).


It is kind of interesting to look at some of the differences in GC approach in the JVM vs Go - the different goals, tradeoffs, and approaches. Go's is definitely simpler: there is a single implementation, it doesn't have nearly as many tuning knobs, and it is focused on one thing. The JVM GC implementations, by contrast, give you a lot of control (whether that is good or not..) via tuning knobs, and it is a pretty explicit goal to support different GC-related use cases (ie, low-latency vs long-running jobs where you only care about throughput).

One of the things I really like about Go is that a lot of the designs and decisions, along with their rationales, are pretty well documented. Here are the GC docs, for example - https://go.dev/doc/gc-guide.

For example, Go doesn't move data around on the heap, so to combat fragmentation it breaks the heap up into a bunch of different arenas based on fixed sizes. So a 900-byte object goes into the 1K arena, etc. This wastes some heap space, but saves the overhead and complexity of moving data around.



Arenas are still experimental (as of Go 1.21).


Sorry, I used the wrong terminology. They are called “spans” in Go’s GC. There are different sizes of spans that allocations end up in, which helps avoid fragmentation.


It will "work". It just won't be as fast or as precise as the JVM's.


As noted by someone in a sibling thread, it's possible it might yield a smaller total memory footprint, which if true could be interesting and worthwhile by itself as a tradeoff to consider on an app-by-app basis.


?

Go has a complete GC. It's not like Go relies on reference counting.

The biggest problem is that it's non-moving, so fragmentation is an issue. But that's true of many languages, e.g. C/C++.


A “complete”—i.e. functioning—tracing GC is a weekend project. (Mark-sweep, mark-compact, or stop-and-copy, take your pick.) Perhaps not as simple as basic unoptimized reference counting, but still not hard.

The hard part, the one that has occupied JVM engineers for almost three decades now, comes afterwards: when you try to make things not freeze when memory is low, or when you have multiple threads mutating the same heap, or ultimately when you’re adapting the GC to the particulars of your language. (E.g. Haskell has an awesome concurrent GC that’d work like crap for Java, because it assumes tons of really short-lived, really small garbage and almost no mutation. The other way around is also bound to be problematic—I don’t know how the Scala people do it.)

So a GC being tracing and not refcounting is not really a useful benchmark. And Go’s GC is undeniably less advanced than OpenJDK’s, simply because almost every other GC is. It can still suit Go’s purposes, but it does mean running Java on top of it is bound to yield interesting results.

(And can we please stop pretending C and C++ are in any way close as languages? Even if the latter reuses some parts from the former’s runtime.)


> Go’s GC is undeniably less advanced than OpenJDK’s

Java relies very heavily on its GC and tends to generate a lot more short lived objects which need collection than Go. Go's approach to memory management learns from this and focuses on creating fewer short-lived memory objects and providing much shorter GC pauses than Java. It's definitely less complex than Java's GC but it's also very performant and a lot less trouble than Java's GC in my experience.


Can you elaborate more on this?

> E.g. Haskell has an awesome concurrent GC that’d work like crap for Java, because it assumes tons of really short-lived, really small garbage and almost no mutation. The other way around is also bound to be problematic—I don’t know how the Scala people do it

I don't know a ton about Haskell's GC, but at surface level it seems very similar to several of the JVM GC implementations - a generational GC with a concept of a nursery. Java GC is very heavily designed around the weak generational hypothesis (ie, most objects don't live long) and very much optimizes for short-lived object lifecycles, so most GC implementations have at least a few nursery-type areas before anything gets to the main heap where GC is incredibly cheap, plus some stuff ends up getting allocated on the stack in some cases.

The only big difference is that in Haskell there are probably some optimizations you can do if most of your structures are immutable since nothing in an older generation can refer to something in the nursery. But it isn't super clear to me that alone makes a big enough difference?


One major simplification you can make is that due to purity, older values _never_ point to newer values. This means when doing generational GC, you don’t have to check for pointers from older generations into newer generations.


This feels wrong. Specifically, doesn't laziness bite you in this scenario? If I make a stream that is realized over GC runs, I would expect that a node in an old generation could point to a realized data element from a newer generation. Why not?


It does: "Nevertheless, implicit mutation of heap objects is rife at runtime, thanks to the implementation of lazy evaluation (which, we admit, is somewhat ironic)." says <https://www.microsoft.com/en-us/research/wp-content/uploads/...>


Sure, you're saying that it won't be as performant.

And that's true. IDK if you noticed, but there's no JIT either.

---

> can we please stop pretending C and C++ are in any way close?

If we also pretend we don't know why it was named C++.


> Sure, you're saying that it won't be as performant.

I mean, I expect it won’t be, but that wasn’t really my point, no.

What I wanted to say is that I expect the comparison to be interesting: I might not find Go’s particular brand of simplicity attractive, but I like simple designs in general, and Go’s GC is much less involved than OpenJDK’s while still having received some tuning—it’s neither a weekend toy nor a multi-programmer-century monster. And it’d be interesting to see how much the simpler design really loses to the scariest monster of them all.

> And that's true. IDK if you noticed, but there's no JIT either.

That might have been interesting in a general comparison of Java VMs, but I’m concerned with GCs and in that light it’s not. It could be that a slow VM is so much slower that the GC difference gets lost in the noise, but given an actually bad GC situation can lock up the mutator for literal seconds I expect there will be a meaningful comparison independent of the rest of the VMs.

>> can we please stop pretending C and C++ are in any way close?

> If we also pretend we don't know why it was named C++.

Marketing gimmick? I’m absolutely fine ignoring people who try to suggest things which are not true through manipulative branding. I don’t feel guilty about that.

To be clear, there absolutely is C-ish C++ in the world, and even if it’s not a lot relatively speaking it’s still a lot of code just because of how much C++ there is overall. And if C-ish code was the mainstream of the language, I’d be fine with this commingling. But it’s not, and neither is it the style the language’s designers are using as their benchmark. That’s been the case for at least a decade. So, no, I don’t think C/C++ is any more justified than, I don’t know, C/C#.

Finally, the name was chosen not only very early in C++’s history but actually fairly early in C’s as well. When C++ was named, C didn’t even have function prototypes! (Necessarily, as it copied those from C++.) I just don’t see why it matters what Stroustrup’s intentions were when he chose the name in 1982. A lot has changed in forty years.


> it seems crazy to me that Go's GC can Just Work with a VM you're writing in Go itself.

Far from it: it is more natural to do that than anything else.

Simplified example:

  type Array struct {
    items *any[]
  }

  type Object struct {
    fields map[string]*any
  }
These Go structs stand in for the JVM values, and when the Go references to them disappear, they (and the JVM values they reference) can be GC'd as well.


your example is not valid Go code:

syntax error: unexpected ], expected type argument list

and it's also just poor style in general. "any" is already a pointer, so you would rarely design a pointer to any. example:

https://godocs.io/encoding/json#Marshal


It’s just pseudo code, relax. You’re not a compiler. You know what they meant.


[flagged]


This comment (and the subsequent follow-up reply) violates the HN comment guidelines:

- Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

- When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."


Lighten up. Someone answered a question in a concise manner with some pseudo code, probably in a language they haven’t context-switched to in a couple of weeks. The pedantry you’re displaying doesn’t help the conversation. Do you have substantive insight to provide about how the Go GC works with Java’s different JVM improvements?

If not, you can just vote the post up or down


What of it, though? I noticed, but didn't think much of it because pseudocode doesn't need to be valid and usually isn't. What is the problem with these syntax errors that you are trying to address by calling attention to them?


Since the VM controls allocation of Java objects, just implement the VM to allocate the Java objects on Go's heap using Go's native allocator, thereby allowing the native Go GC to clean those up when they become unreferenced.


How would one allocate Java objects using Go’s allocator, as a program written in Go? Does go provide such primitives?

Naively, something like:

    // Called when the hosted JVM code wants to allocate
    func makeJVMObject() (*jObj, error) {
        var obj = new(jObj) // on go’s heap
        // do stuff
        return obj, nil
    }
would make sense, except how do we keep track of who’s referencing it? JVM objects have fields which tell the GC how to crawl the object tree in the mark phase (and so do Go objects), but how do we make the Go GC aware of the fields the JVM knows about? A map maybe?

Hmm, I guess a map could work… the jObj struct could have a map of fields it knows about, keys being the field name and values being where they point to…

Now that I think of it this probably must be how all GC’s work; they can’t rely on static information to know the fields of each type they’ve compiled, it’s gotta be something like a map somewhere.

I guess I may have answered my own question here.


Yep :)

In fact, I would go so far as to say that it's harder to implement those without Go's GC just working.

https://news.ycombinator.com/edit?id=37254746


Most languages have “precise garbage collectors” that always know from runtime type info which bits of an object are and are not heap pointers.

Sometimes you see add-on “conservative garbage collectors” that have to assume any word might be a pointer if it looks like an aligned address in an allocated page. They can’t move objects to do compaction because they’re never sure which words are not pointers.

Jacobin stores an object with a slice of its field values (each boxed as “any”) and types, so Go’s precise GC would be able to trace them: https://github.com/platypusguy/jacobin/blob/main/src/object/...


Leveraging the millions of man-hours that go into these runtimes' subsystems is starting to become a "thing", I've noticed -- especially when running code not meant for them. For example, there's a Nintendo Switch emulator that I believe just uses the C# runtime's JIT instead of trying to roll its own. Lo and behold, it works, and they've saved themselves thousands upon thousands of hours writing and debugging their own.

It's kind of cool actually.

I wonder if there's a future where somebody can just pick and choose language and runtime components to create the environment they want before even writing a line of code. We sort of do it a level lower with VMs and containers, and then pick and choose language features we want to use (e.g. C++), but I don't know of a good way to use Java's JVM, C#'s JIT, somebody else's memory profiler, another team's virtual memory subsystem etc. without writing a bunch of different pieces in different languages to get those benefits.



