The opaque pointer (aka d-pointer or pimpl) is a great C++ design pattern, useful for long-term binary interface compatibility, a properly hidden implementation and faster compilation. However, it has an inherent performance drawback that can become pretty critical if you care about efficiency. In this post I propose an approach that makes d-pointers less binary compatible but sweeps away their inefficiency.
Most of the time in C++ you don’t really hide the members of your class: they are visible (even though not accessible) to all users of the class. This means that any interface change, like a new private member function or a renamed member variable, is propagated to every unit that depends on this interface. In practice, this denies backward compatibility and leads to awful (re)compilation times.
This issue can be gracefully resolved with d-pointers. Examples are always better, so let’s consider a class named `Clam` that doesn’t want to expose its data and internal functions. The declaration:
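The original listing did not survive, so here is a minimal sketch of what the header could look like. Only `Clam` and `Self` are named in the text; the `dig()` member is my own placeholder:

```cpp
// Clam.h (sketch; dig() is a hypothetical public operation)
#include <memory>

class Clam {
public:
    Clam();
    ~Clam();            // declared only; defined where Self is complete
    void dig();         // hypothetical operation, forwarded to Self
private:
    class Self;         // incomplete type: its members stay hidden
    std::unique_ptr<Self> d;
};
```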
Here, we declared that `Clam` has a member that is a pointer to the incomplete class `Self`. From now on, it is possible to declare any members in `Self` really private, so that they won’t be visible anywhere but in the `Clam` implementation scope (note that we have to avoid defaulting the constructor and destructor in the header, as they would ask us to provide the complete definition of `Self`, which we don’t want there). The source file `Clam.cpp` would be:
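Again a sketch rather than the original code, with the header declaration repeated so the snippet compiles on its own; the members of `Self` are assumptions:

```cpp
#include <memory>

// Declaration from Clam.h, repeated so this sketch is self-contained.
class Clam {
public:
    Clam();
    ~Clam();
    void dig();
    int depth() const;     // hypothetical accessor
private:
    class Self;
    std::unique_ptr<Self> d;
};

// Clam.cpp: here Self is complete, and its members are invisible
// to any other translation unit.
class Clam::Self {
public:
    void dig() { ++depth; }
    int depth = 0;
};

Clam::Clam() : d(std::make_unique<Self>()) {}
Clam::~Clam() = default;   // unique_ptr<Self> is destroyed where Self is complete
void Clam::dig() { d->dig(); }
int Clam::depth() const { return d->depth; }
```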
As I noted before, this allows you to alter the `Clam` class with no changes to the header: we just modify the `Self` class and its handling in the source file. Such an approach does a great deal of good for everyone:
It speeds up compilation: a) we don’t need to recompile classes that depend on `Clam` whenever it changes; b) the members of `Clam` are now totally hidden and not propagated to its users. This is important for large projects with a great number of classes (like Qt, which uses this pattern extensively).
It introduces binary compatibility: we can alter the implementation of `Clam` with no changes to its binary interface, so we can expose the class and make the implementation interchangeable (e.g. different versions remain binary compatible).
The aforementioned drawback is pretty clear: the level of indirection makes every member access and member function call go through a pointer. Furthermore, the `Self` instance is allocated somewhere on the heap, which leads to unnecessary allocations and cache misses.
I think I have a (more or less) sane solution that removes this overhead (I haven’t found anything on that but Fast Pimpl, which is a bit different). Let me call it `EmbeddedDPtr`. The name comes from the idea of embedding the object into a buffer of fixed size inside the class itself. This way we avoid any heap allocations and employ caching at its best. But how do we know the size if we want to keep the declaration of `Self` hidden? We can’t know it through `sizeof`. What we can do, though, is provide enough space and check that it really is enough to hold the object with a `static_assert`. The implementation of this idea is the following code:
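The original 25-line listing is gone, so here is my reconstruction of the idea as the text describes it: a fixed-size, properly aligned buffer inside the class, placement-new construction, and `static_assert`s that fire if the buffer turns out too small. All names except `EmbeddedDPtr` are assumptions:

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Sketch of the EmbeddedDPtr idea. The user promises a Size (and optionally
// an alignment); the static_asserts are checked in the implementation file,
// where T is complete, and fire if the promise is broken.
template <typename T, std::size_t Size,
          std::size_t Align = alignof(std::max_align_t)>
class EmbeddedDPtr {
public:
    template <typename... Args>
    explicit EmbeddedDPtr(Args&&... args) {
        static_assert(sizeof(T) <= Size, "storage buffer is too small for T");
        static_assert(alignof(T) <= Align, "storage buffer is under-aligned for T");
        ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
    }
    ~EmbeddedDPtr() { reinterpret_cast<T*>(storage_)->~T(); }

    // Copying/moving is omitted to keep the sketch short.
    EmbeddedDPtr(const EmbeddedDPtr&) = delete;
    EmbeddedDPtr& operator=(const EmbeddedDPtr&) = delete;

    T* operator->() { return reinterpret_cast<T*>(storage_); }
    const T* operator->() const { return reinterpret_cast<const T*>(storage_); }

private:
    alignas(Align) unsigned char storage_[Size];  // the object lives right here
};
```

As with the classic d-pointer, the enclosing class must still define its constructor and destructor in the implementation file, so that `T` is complete when these member templates are instantiated.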
Unfortunately, this approach restricts binary compatibility, as `Self` must still fit into the buffer. Nevertheless, the compilation speedup and cleaner headers are still here. The need to maintain the storage size is indeed boring, but the `static_assert` keeps you safe from memory errors. So, as usual, that’s a trade-off.
The other (good) side of this trade-off is performance. To check whether `EmbeddedDPtr` is faster than the usual d-pointer, I implemented a simple benchmark. It consists of two classes:
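Neither listing survived, so here is a sketch of the two class declarations. The `tick()` name is my placeholder for the counting call, and `EmbeddedDPtr` is repeated from the previous listing so the snippet compiles standalone:

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// EmbeddedDPtr repeated from above (compacted) for self-containment.
template <typename T, std::size_t Size,
          std::size_t Align = alignof(std::max_align_t)>
class EmbeddedDPtr {
public:
    template <typename... Args>
    explicit EmbeddedDPtr(Args&&... args) {
        static_assert(sizeof(T) <= Size, "buffer too small for T");
        ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
    }
    ~EmbeddedDPtr() { reinterpret_cast<T*>(storage_)->~T(); }
    EmbeddedDPtr(const EmbeddedDPtr&) = delete;
    EmbeddedDPtr& operator=(const EmbeddedDPtr&) = delete;
    T* operator->() { return reinterpret_cast<T*>(storage_); }
private:
    alignas(Align) unsigned char storage_[Size];
};

// Benchmark class 1: classic d-pointer through std::unique_ptr.
class WithUniqueDPtr {
public:
    WithUniqueDPtr();
    ~WithUniqueDPtr();
    void tick();           // placeholder name for the counted call
private:
    class Self;
    std::unique_ptr<Self> d;
};

// Benchmark class 2: d-pointer embedded into a fixed-size buffer.
// The size (and alignment) must be maintained by hand; 8 bytes is
// enough for one long counter.
class WithEmbeddedDPtr {
public:
    WithEmbeddedDPtr();
    ~WithEmbeddedDPtr();
    void tick();
private:
    class Self;
    EmbeddedDPtr<Self, 8, alignof(long)> d;
};
```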
Both the `WithUniqueDPtr::Self` and `WithEmbeddedDPtr::Self` private classes do the same simple thing: they hold a counter that is incremented each time the corresponding member function is called. This is probably the simplest operation (while still not trivially optimized away) that could give us a good estimate of the overhead. The `WithEmbeddedDPtr::Self` looks like:
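A sketch of what the hidden class could look like (shown as a free-standing class here; in the real code it would be `WithEmbeddedDPtr::Self`, defined in the implementation file, and the member names are my assumptions):

```cpp
// Sketch of WithEmbeddedDPtr::Self: the simplest possible payload,
// a counter bumped on every call.
class Self {
public:
    void tick() { ++counter_; }
    long count() const { return counter_; }
private:
    long counter_ = 0;
};
```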
Finally, everything is ready for the benchmark cases. They are pretty simple:
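The original harness is gone as well; here is a minimal `std::chrono`-based sketch of what the cases could look like. The `time_case` name, the `tick()` call and the iteration count are assumptions:

```cpp
#include <chrono>
#include <cstdio>

// Time `iterations` calls through an implementation T, in nanoseconds.
// Note: a real benchmark must also keep the compiler from optimizing the
// loop away, e.g. by consuming the final counter value.
template <typename T>
long long time_case(long iterations) {
    T object;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        object.tick();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}

// Usage, assuming both classes are fully defined in this translation unit:
//   std::printf("unique_ptr d-pointer: %lld ns\n",
//               time_case<WithUniqueDPtr>(100000000L));
//   std::printf("embedded d-pointer:   %lld ns\n",
//               time_case<WithEmbeddedDPtr>(100000000L));
```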
I ran the benchmarks on my machine and got the following output:
(The original output listing is not preserved here; it showed the `EmbeddedDPtr` version running roughly 3x faster than the `unique_ptr`-based one.)
This ~3x speedup could be pretty significant in some cases. It is quite easy to explain with just two facts: first, the embedded d-pointer avoids heap allocations; second, such code is much more cache-friendly, with the implementation located inside the main class itself.
I am still thinking about possible ways to ease the pain of having to provide the exact size of the buffer. It’s pretty clear, though, that we can’t make it dynamic enough to be really easy to use.