Implementing a class with void* (utk.edu)
62 points by azhenley on Aug 30, 2021 | 71 comments


I'm surprised he doesn't refer to it by name (Pimpl). In C++ it has the advantage that you can change your implementation and retain binary compatibility, because the size of your class doesn't change as you add/remove member variables (and, maybe more surprisingly, as you add/remove virtual functions). If you're going to use C++, it seems like it would be better to avoid the void* by

  class C {
  protected:
    struct Impl;
    std::unique_ptr<Impl> mImpl;
  };
And then define C::Impl in the .cpp file.
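For completeness, a minimal sketch of what the matching .cpp might look like (the member is invented for illustration, and it assumes the header also declares C's constructor and destructor):

  // C.cpp
  #include "C.h"

  struct C::Impl {
    int counter = 0;   // whatever private state you like
  };

  C::C() : mImpl(std::make_unique<Impl>()) {}
  C::~C() = default;   // defined here, where Impl is a complete type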


Came here to say exactly that. You can also do this; though it looks a little worse, it allows for even more flexibility, such as complete decoupling of Impl from A across different files:

  // Forward declaration of Impl. What does Impl do?
  // You're not allowed to know.
  class Impl;

  class A {
  protected:
    Impl* mImpl;
  };
I've been using this trick but for a different reason - to reduce the number of #include statements in header files that are included a lot themselves.


Between this trick and generous use of forward declarations, I was able to remove almost all "headers included from headers" from a past project of mine, speeding up compilation by probably 3X (never measured it but it was observably faster). Maybe today's compilers optimize all these includes away and it doesn't make a difference anymore but it used to.

These are the kinds of refactorings you often need to justify with hours of arguments and approvals and religious fights and code reviews at work, but can do in an afternoon in a private project just because it makes the code nicer.


Might not make much of a difference with precompiled headers, but if you can't or don't use those, that should still provide quite a bit of speed-up. I went through the same exercises back when I did C++, but couldn't use PCH because some dependency did weird things and it didn't exactly work anymore.

These days I wish there was something similarly easy to speed up Webpack builds ...


I had 2x and I wasn't done... and this was a C-with-classes-style project, so minimal templates and header complexity.

A much better alternative than precompiled headers.


Yes, but doesn't std::unique_ptr introduce compiler / toolchain constraints? I.e. if the class declaration was in a header file of a library, as a user of the library you'd be bound to the same toolchain as the library, no?


Considering this is C++ rather than C, and it doesn't have a portable ABI to begin with, officially you'd have to use the same toolchain regardless.

In practice you get a strong degree of compatibility between Clang and GCC at the compiler level (since Clang basically copies GCC's ABI). But you also get the same fudging at the stdlib level with std::unique_ptr, because it's header-only and is a zero-overhead abstraction for a raw pointer (i.e. at the ABI level a normal unique_ptr is the same as a pointer, except you can't move it around in a register).


Neither does C; most devs conflate the OS ABI with the programming language used to implement the OS.


> and maybe more surprisingly, add/removing virtual functions

I don't have the spec handy, but I'm fairly certain that the vtable implementation is compiler-specific, so while it may work it isn't guaranteed. However, if you declare the function symbols as dynamic, you can leverage the linker to dynamically resolve the right symbols with the matching opaque data and achieve binary compatibility (assuming you use the same compiler or a compatible compiler ABI, and all the other caveats around C++ binary compatibility).


vtable layout is defined by the ABI, which is (mostly) consistent across major compilers everywhere except MSVC. And if MSVC ever broke vtable layout, everything that relies on COM would break on Windows, which is basically all of Windows user space.


COM is an OS ABI, so all Windows programming languages do need to speak it, not only MSVC.


> ...you can change your implementation and retain binary compatibility...

Maybe? I can still think of ways to have ABI problems in the implementation of class C.

It's true that there are fewer ABI problems to worry about, though.


> Maybe? I can still think of ways to have ABI problems in the implementation of class C.

Yes, which is why GP is speaking in terms of allowance. Using this pattern you can retain ABI compatibility; that doesn't mean you necessarily do, or that it's otherwise a free-for-all.


You're reading a different inflection into "can" than I did, but it sounds like we agree that pimpl isn't sufficient for ABI stability.


It's kind of ironic that you declared it protected given that the poor subclasses won't actually have a definition to work with... which is incidentally one of the reasons not to do this.


> It's kind of ironic that you declared it protected given that the poor subclasses won't actually have a definition to work with... which is incidentally one of the reasons not to do this.

I see you haven't heard of the occult powers of peeking into the source file, copying the Impl struct definition into your own source, and going for a big bad reinterpret_cast.


The usual solution is to put Impl in a separate implementation header, so that implementation headers of derived classes can include it, but the rest of the world doesn't need to see it.
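Roughly, with illustrative file names:

  // C.h - public header, ships to users
  class C {
  protected:
    struct Impl;
    std::unique_ptr<Impl> mImpl;
  };

  // C_impl.h - implementation header, not installed
  struct C::Impl { /* full definition */ };

  // C.cpp and DerivedFromC.cpp both #include "C_impl.h"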


I mean, yes, but that's just the start of it. Suddenly you need to add contortions (like, say, 2-phase initialization) if you need to e.g. add virtual methods to Impl. You can keep adding workarounds after workarounds until everything works; my general point is just that supporting subclasses now becomes more painful, and you end up having to (in some sense) fight the language. It's nowhere near as free of a lunch as people make it out to be.


I don't see why you need 2-phase for virtual functions.

But yes, I don't think anybody claims it is a free lunch. It is annoying, verbose, and repetitive, but often necessary to keep the code base complexity in check.


> I don't see why you need 2-phase for virtual functions.

I might be misremembering the "for virtual functions" part, but the inability of C to call Impl's constructor directly in the presence of a derived class can sometimes force you to delay some of the initialization until after Impl is constructed. ("Necessary" might be too strong here, in that you could find some other workaround too.)

> often necessary to keep the code base complexity in check.

Hmm... I'm not sure I agree. A pimpl-like idiom can be necessary for solving a few very specific problems, which are explained in [1] better than I can here: (a) ABI stability, (b) slow compilation, and (c) exception safety. There are other niche cases I can think of (e.g. "I need fast/atomic swapping like a reference type, but copying like a value type"), and even those might have better solutions, but none of them really has anything to do with code complexity... they're either domain requirements you have or don't have, as in (a) and (c) and the niche cases, or they're workarounds for slow toolchains, as in (b). But unless you have requirements/constraints like these, I have a hard time recalling any common situations where pimpl would be the best solution, especially if it's for taming complexity.

[1] https://softwareengineering.stackexchange.com/a/213264


Yep, exactly what you would do with an opaque struct in C, too.


The downside of unique_ptr used this way is that you have to define ~C in the same translation unit where the definition of C::Impl lives.


You can use a custom deleter, e.g.

    struct deleter_t { void operator()(impl_t *); };
    using impl_unique_ptr_t = std::unique_ptr<impl_t, deleter_t>;

You can place the implementation of the deleter alongside the impl.
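e.g. something along these lines (impl_t's contents and the header/source split are illustrative):

    // header
    struct impl_t;                     // opaque to users
    struct deleter_t { void operator()(impl_t *); };  // declaration only, impl_t may stay incomplete
    using impl_unique_ptr_t = std::unique_ptr<impl_t, deleter_t>;

    // source file that owns the implementation
    struct impl_t { /* ... */ };
    void deleter_t::operator()(impl_t *p) { delete p; }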


That breaks make_unique. Maybe unique_ptr inside an std::any? Error-prone casting, but managed lifetime. Though make_unique is maybe obsolete now that parameter evaluation order is better defined.


You have to do that anyway; unique_ptr only makes forgetting that a compile error.


This also significantly hurts readability, and I would hate the person who writes code like this unless you really need binary compatibility (if you do, it is a sign that you should improve your release schedule).

Mostly, you just want to decouple the interface of the class from the implementation details.

In that case do just that - define an interface and implement it elsewhere.

You can always just static link all binaries.


> You can always just static link all binaries.

Not always you can't. There are many reasons to use dynamically linked libraries all of which are applicable whether you're using PIMPL or not.


> Not always you can't

What's an example of this (assuming you don't care about binary size)?


Caring about binary size is probably the first one. In my experience, increases in binary size cause an exponential increase in link time. You may have licensing restrictions (LGPL, for example), or be using a binary that was provided to you. Your source code might be set up in such a way that a monolithic build doesn't make sense.


> This is not an "industry-standard" way to program in C++.

It absolutely is a de facto industry standard way to program C++, and has a name: PIMPL (Pointer to IMPLementation). It has that name, because it's famous.

It's probably less fashionable in newer code bases; someone whose head is up in C++20 will probably scoff at this, and certainly at any version where the secret is hidden by void *.

It provides a good way to wrap C APIs in C++.

And, speaking of that, the technique is basically the spiritual equivalent of what happens in many a FFI module in languages other than C++ too, where some C handle is represented as an opaque foreign pointer, which is wrapped in some object native to the language.


It is not an industry standard way. It is not a sensible way. It is not even a C way.

As others have said, there's no reason whatsoever to use void* here. Just declare the struct without defining it in the header. Then only define it in the C file. C is fine with pointers to structs which are merely declared but not defined.
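i.e. the classic opaque-struct pattern; a rough sketch (names invented, written so it compiles as either C or C++):

  /* widget.h */
  struct widget;                              /* declared, never defined here */
  struct widget *widget_create(void);
  void widget_destroy(struct widget *);

  /* widget.c */
  #include <stdlib.h>
  #include "widget.h"
  struct widget { int x; };                   /* the real definition, private to this file */
  struct widget *widget_create(void) { return (struct widget *)calloc(1, sizeof(struct widget)); }
  void widget_destroy(struct widget *w) { free(w); }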


It is a de facto standard in that it is fairly widely documented and actually done. It's not in any ISO or IEEE standard or anything of the sort.

There is almost no reason whatsoever to use void * anywhere other than to write a declaration that is compatible with another one which uses it.

(A pointer to any object is better implemented as a typedef for unsigned char *. This requires a cast in both directions, thus it is safer. At the same time, it is more convenient when you actually want to work with the memory as such: you have bytewise arithmetic and dereferencing.)

> C is fine with pointers to structs which are merely declared but not defined.

Particularly if the secret object is a struct/class then this certainly improves the code (and when the secret isn't such a thing, it can probably be made into one: e.g. a secret array of integers can probably just be a struct containing an array, possibly a flexible one). Numerous unsafe casts are thereby eliminated.

But it makes no material difference to its organization or semantics. It's still the same PIMPL pattern.


It (using a void pointer) is a worse way of doing PIMPL. Give the class a name, but don't expose its implementation (use an opaque class). This means that within the file or files that have access to the full implementation the correct type is present and no casts are needed.

Well, what if the actual class is a template? No matter, you can derive a class from a template, as in:

header:

  class Impl;

implementation:

  class Impl : public std::vector<SomeType> { ... };

People who don't do this and use void* wind up with an implementation that has lots of casts in it, and it's easier to introduce bugs, especially once you get to the point where you have two or more hidden implementations.


Used to teach C++ - doing it this way is absolutely _not_ how to do or use the PIMPL idiom [0].

Declaring a private and opaque forward decl is very different to VOID* all the things.

[0] https://en.cppreference.com/w/cpp/language/pimpl


The term "PIMPL" itself (by far) predates all the cruft shown there.


Qt uses pimpl extensively to maintain binary compatibility: https://wiki.qt.io/D-Pointer


I haven't thought about him in years, but James Plank is the reason I'm a programmer today. I was a terrible student and he likely doesn't remember me, but his classes were too much fun. He just assigned projects and a timeline and live coded similar exercises in class mostly along with the timeline. I probably would have switched away from Computer Engineering (a degree I barely use) without his classes.


It seems kind of stupid to use "-std=c++98" when building this stuff. Nothing in later Standards interferes with it.

Maybe it is meant to serve notice that the method is archaic, and that better ways to achieve the same thing will be presented later. But then one might just as well code the throwaway examples in C and use a damn C compiler. It's archaic even for C. There is just no merit in storing a void* that will then be cast to some arbitrary other known type every time it is used.

The only really valid use for a void pointer is when it is being passed through a subsystem and comes back to somebody who knows what it is, such as when an abstract handler function is being registered along with a pointer to some context that will be passed to the handler; this is extremely common in C code.

In C++, you can just take a pointer to a type T with a virtual member f() that you promise to call. The caller supplies a pointer to a T2 derived from T, with its own f(). This, too, feels a bit archaic, but is at least not actively silly.
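For concreteness, a rough sketch of both shapes (the names and the register_handler API here are invented for illustration):

  // C-style: a handler plus an opaque context pointer
  typedef void (*handler_fn)(void *ctx, int event);

  static handler_fn g_handler_c;
  static void *g_handler_ctx;

  void register_handler(handler_fn fn, void *ctx) {
    g_handler_c = fn;
    g_handler_ctx = ctx;   // passed back verbatim; only the handler knows its real type
  }

  // C++-style: the "context" is simply the object behind a virtual call
  struct Handler {
    virtual ~Handler() = default;
    virtual void on_event(int event) = 0;
  };

  static Handler *g_handler_cpp;

  void register_handler(Handler *h) { g_handler_cpp = h; }

  struct MyHandler : Handler {
    void on_event(int) override { /* react here */ }
  };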


Back in the 1990s, the video game company I worked at would hang a couple of extra void pointers in every class just to, you know, store extra things that are needed as time goes on…


This was common in the 90s; BeOS had these everywhere in its API classes.


This is because of the fragile ABI problem in C++; it's not good for APIs for this reason. I used to program for the macOS kernel and they (used to) do exactly the same thing.

See `OSMetaClassDefineReservedUnused` in `OSObject.cpp`, here: https://github.com/apple/darwin-xnu/blob/main/libkern/c++/OS...


This is a well-known way to achieve separation of definition and implementation.

The disadvantage of the method proposed in the article is that now everything needs a pointer indirection.

A solution that fixes the drawback is used by lz4's library implementation: instead of storing a void*, store a char[] array of the same size as the real struct. (Of course, now you have to manually make sure the struct sizes stay in sync. It's more error-prone, but still not that bad; see the sketch below.)
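A rough sketch of that inline-storage variant (all names and sizes invented; the aliasing/launder caveats are discussed further down the thread):

  // header
  #include <cstddef>   // std::max_align_t

  class C {
  public:
    C();
    ~C();
  private:
    struct Impl;                                            // defined in the .cpp
    alignas(std::max_align_t) unsigned char mStorage[64];   // kept >= sizeof(Impl) by hand
    Impl *impl();                                           // typed view of the buffer
  };

  // source
  #include <new>       // placement new

  struct C::Impl { int x = 0; };

  C::Impl *C::impl() { return reinterpret_cast<Impl *>(mStorage); }  // see aliasing/launder subthread below

  C::C() {
    static_assert(sizeof(Impl) <= sizeof(mStorage), "bump mStorage");
    new (mStorage) Impl();
  }

  C::~C() { impl()->~Impl(); }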


Or don't do that (the char array trick) in C++, because implementers and standardizers are not clear about when/if you are allowed to store other objects in an array of chars; even if you are, it is tricky because you need to manually manage the alignment; and they are attempting to replace char with std::byte in the long term but don't really have a comprehensive and detailed plan to do so, etc.

The implementation should provide you with std::aligned_storage, which may involve some magic to handle some of those concerns (though only the easiest ones), so I would say probably still don't use that either, unless you are already quite a C++ expert and/or are prepared to dig into the standard with no clear answer about whether what you are attempting is even formally possible. (The implementers/standardizers have sometimes not even known, for quite a long time, about things they had made impossible; see for example the insanity of std::launder, or, if you want to lose your mind forever, the semantics of pointer provenance analysis, which compilers are maybe already using to "optimize" even though what the semantics should even be is still being debated.)


This is incorrect. I participated in the committee discussions around launder, byte, pointer provenance, and implicit object creation. None of those issues show up here. This is a very simple case of using placement new into a properly aligned char buffer to create an object at that location. This has worked just fine since C++98 and is not impacted by the many other object/lifetime/pointer issues that are being discussed.

Additionally, implementers are in complete agreement here that this works. There are zero standardization/implementation concerns with this method, and I would highly advise against scaring users away from it when it is needed.


Thanks for clarification. It's nice to know that this approach is fully standard compliant.

(and that's exactly my point: the standard is hard to access for normal devs like me, so I can guess "some reinterpret_cast hack seems ok", but never be 100% certain until some standard expert confirms)


Yes, strict aliasing (or type-based alias analysis?) is quite crazy, and there are some murky dark corners where the specification differs between C and C++.

I think they have a std::launder thing exactly for this purpose of "safely casting an array of bytes into an object".

However, in this particular case (of using a char array to hide the real implementation), the implementation resides in another translation unit, so I don't think anything is going to break if LTO is not enabled. With LTO I have no idea.


You don't need to cast anything - just placement-new it inside the array. So long as the array is properly aligned, this is fine. After that, it would be UB to peek at the bytes of the object via the array because of aliasing issues, but I don't see why it would be improper to use the pointer returned by new.


After the object is constructed by placement-new, the class methods still need to reinterpret_cast the char array to an object pointer to access the object.

I don't think in this specific case there is any UB involved, but I'm not a language-standard lawyer, so I'm not sure. I feel the standard's specification of what is allowed to be reinterpret_cast and what isn't is arcane (or at least far from straightforward to understand).


You will get the properly typed pointer to object from new, so if you want to play it completely safe wrt UB, it can be stashed away alongside the array in the public class; this adds sizeof(T*) to the latter, but avoids casts entirely.

But, yes, you do technically need std::launder to get it directly via the array.
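A small variation on the sketch further up, storing the typed pointer next to the buffer (names and sizes invented):

  class C {
  public:
    C();
    ~C();
  private:
    struct Impl;
    alignas(16) unsigned char mStorage[64];  // alignment/size kept in sync by hand
    Impl *mImpl;                             // the typed pointer returned by placement-new
  };

  // source
  #include <new>
  struct C::Impl { int x = 0; };
  C::C() : mImpl(new (mStorage) Impl()) {}   // no casts (and no launder) needed afterwards,
                                             // at the cost of one extra pointer per object
  C::~C() { mImpl->~Impl(); }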


Not too terrible to create a private method that does the reinterpret_cast for you.

I've had to use this technique in the past and therefore dived into the standard for quite a while. I don't recall encountering any UB concerns.


> After that, it would be UB to peek at the bytes of the object via the array because of aliasing issues

Do you have a source for this? IIRC char and std::byte have a specific aliasing exception, i.e. char* and std::byte* are allowed to alias anything.

You obviously aren't allowed to modify the char or std::byte array (because that would violate the struct/class's aliasing rule).


I'm not so sure anymore, after looking at the standard again. It says that it's okay to access any object via a glvalue of type char, unsigned char, or std::byte; note that this does not include arrays of the same. If you have a field of array type, and you subscript it, the expression does involve a glvalue of that array type as one of the operands - but does this constitute "access"?


A little late, but yes, subscripting an array yields a glvalue of that type, but that is not an access unless you use the glvalue in some other computation (I assume). But even if you do access it (i.e. make a bitwise copy of the entire array), that should still not be UB.


For projects of mixed quality that don't have a basically unbounded workforce maintaining them (one that could investigate rare/arcane bugs "introduced" by the "optimizers" in some builds), and/or that rely on "tricks", I too am fond of not using LTO.

But then I force myself to find a second reason why the program will run correctly, and unfortunately nowadays that reason is more and more often just being strictly conforming. Relearning std::launder, TBAA, pointer provenance, etc. every time is way too time-consuming. I'm forced to give up on programmer optimization and hope the compiler really lives up to its mythical promises (and even then: without LTO; too dangerous...).


> of course, now you have to manually make sure the struct size are in sync.

Can't you just use a static assert in the implementation file?


Why not `typedef struct state state_t` in the header file, then have a `state_t state` in the header file with the `struct state { ... }` definition in the source file? This is similar to what I do in C, I don't see why it wouldn't work in C++.


That would fix the type-safety issue of using a void*, but not the extra pointer indirection issue, since your state member would still have to be a pointer if you want the definition outside of the header.


The module that's importing the header needs to know the state's size. To do that, it either needs to see the struct definition or be given the explicit size, as with the array approach.


That breaks the ABI-compatibility part of this trick.


Yes, it's a tradeoff. The upside is now you can put the object on the stack without invoking memory allocators.


Yes, as others have said, using an opaque class is preferable to using a void pointer.

  class Impl;

This isn't just for hiding, it also gives you faster compilation.


> This isn't just for hiding, it also gives you faster compilation.

That seems unlikely. What are you basing this on?


It means that the header that declares class A doesn't have to #include the header that declares class B, unless B is a part of its public interface. It doesn't sound like much, but the implementation dependency chain can be much longer in practice; and more importantly, all those savings apply to every translation unit that includes the header.
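e.g., with invented names:

  // A.h - no #include "B.h" needed
  class B;                  // forward declaration is enough for pointers/references
  class A {
    B *mB;                  // fine: a pointer's size doesn't depend on B's definition
    void frob(B &b);        // fine: only the declaration is needed for this signature
  };

  // A.cpp
  #include "A.h"
  #include "B.h"            // the full definition is only needed here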


Ah, that's what you meant. Your previous comment reads as if an opaque class results in faster compilation than a void pointer, but those are the same as far as that goes, and you were comparing both to a non-pimpl version.


There might be a few hundred CPU cycles saved in compilation from not having to static_cast the void* to the implementation class.


Haha, sure, and the other way around, time is spent constructing the compile-time type info for the pimpl class. Let's ignore both since the impact would be so small it would almost certainly be drowned out by all the random noise that happens all the time.


Yes, I was comparing

  class Foo;

to

  #include "Foo.h"

and I would not consider using a void pointer for anything other than an allocator.


Besides what everyone else has written regarding pimpl, with C++ modules the only reason to use PIMPL is to ensure ABI stability.


Consider the rule of zero.


Interesting thing indeed. Would you care to expand?

Edit: This is interesting to me from the angle of the "beginning of the calendar" - I coincidentally just read this in a reddit thread (linked):

> Let me start with a quote you may hear in a lot of history classrooms: "Jesus Christ has been born 7 years Before Christ", sounds a bit weird doesn't it. Yep, historically the idea was to mark his birth as a dividing point, however, there are a bit of problem of determining precisely what year we're trying to set as first one. To keep the meaning intact, we would have to move dates if we find better information about exact year of his birth, not convenient at all.

> When we say that we are in year 2012 CE, then we don't care that year 1 CE should be some specific event (there isn't year 0 in common notation BTW), we just need all agree on the same starting point. We leave the date intact because it's widespread, so it's more convenient not to change the date, the same way we still use non-decimal hours, minutes, seconds. If f.e. Anno Mundi (from Creation of the World) system remained in use in Europe with the agreement on the same date (Latin and Greek scholars disagreed on Biblical age of Earth), we would use it as Common Era.

https://www.reddit.com/r/TrueAtheism/comments/14nuuw/i_think...

This is interesting to me as a wild, out-there idea that feels similar to 'coordinate space for storing data': time as a coordinate space for simulations, in a way.




