Reading the CMU article, how do you actually make it fast? For each frame you need to collect all the objects in the scene (for ray-traced scenes, anything that can be reflected), split them into micropolygons (either from a triangle mesh or from parametric surfaces), and then render all of this at e.g. 4K. Each of these steps is extremely demanding.
You do as much of that offline as possible. The "split them into micropolygons" step happens at editor time, along with building a search tree that makes the results fast to look up at runtime.
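A toy sketch of that bake-then-search split, under my own assumptions (all names here are hypothetical, and the "coarser copy at interior nodes" is a stand-in for a real simplified mesh): the editor-time pass recursively splits geometry into small clusters while building the tree, and the runtime pass walks the tree and stops as soon as a node's error fits the budget.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Tri = Tuple[Tuple[float, float, float], ...]  # a triangle as three 3D points

@dataclass
class Cluster:
    tris: List[Tri]                     # baked geometry at this node
    error: float                        # error proxy if drawn at this level
    children: List["Cluster"] = field(default_factory=list)

def bake(tris: List[Tri], max_tris: int = 2) -> Cluster:
    """Editor-time pass: split geometry into small clusters, building the
    search tree as we go. Interior nodes keep the unsplit tris with a larger
    error value as a stand-in for a simplified coarse mesh."""
    if len(tris) <= max_tris:
        return Cluster(tris=tris, error=0.0)
    mid = len(tris) // 2
    return Cluster(tris=tris, error=float(len(tris)),
                   children=[bake(tris[:mid], max_tris),
                             bake(tris[mid:], max_tris)])

def select(node: Cluster, error_budget: float) -> List[Cluster]:
    """Runtime pass: walk the baked tree, returning the clusters to draw.
    Stop early wherever the node's error is already acceptable."""
    if not node.children or node.error <= error_budget:
        return [node]
    out: List[Cluster] = []
    for child in node.children:
        out.extend(select(child, error_budget))
    return out
```

The point of the shape is that per-frame work is just a tree walk: a large error budget (far-away object) returns one coarse cluster, a tight budget returns the fine leaves.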
It's still very demanding! But the goal is that you bake as much of it as you can.
I suppose it depends on what you define "editor time" to be. Importing an asset into a scene live appears to be pretty seamless, at least in comparison to preprocessing stages of yore.
Segmentation on the CPU is pretty common, as there's a huge opportunity for caching. It's possible to push segmentation onto the GPU as an optimization, but it might not be worth it if the GPU is busy enough with downstream operations. Even then you still cache the results.
I would be surprised if they don't have a dynamic pipeline that can be optimized at runtime.
The segmentation happens while assets are imported into the editor. Half of the talk on Nanite is devoted to their new compression system for geometry, which enables seamless segmentation of meshes thanks to some clever graph-theory trickery.
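To make the graph-theory angle concrete: the mesh can be treated as a graph whose nodes are triangles, with edges between triangles that share a mesh edge, and segmentation becomes cutting that graph into fixed-size parts. The real system uses a proper graph partitioner; this greedy BFS is only an illustrative toy:

```python
from collections import deque

def partition(adjacency: dict, part_size: int) -> list:
    """Greedily cut a triangle-adjacency graph into parts of at most
    `part_size` nodes; each part would become one cluster of geometry."""
    seen: set = set()
    parts: list = []
    for start in adjacency:
        if start in seen:
            continue
        part: list = []
        queue = deque([start])
        while queue and len(part) < part_size:
            n = queue.popleft()
            if n in seen:
                continue
            seen.add(n)
            part.append(n)
            # Prefer neighbors, so parts stay spatially coherent.
            queue.extend(m for m in adjacency[n] if m not in seen)
        parts.append(part)
    return parts
```

Every triangle lands in exactly one part; triangles dropped when a part fills up are picked up again by a later iteration of the outer loop.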