söndag 17 oktober 2010

2 reasons for emebedded devs to use a C++0x-enabled compiler

C++0x introduces a lot of really neat things that any developer have use of (many that come from Boost). For example, proper auto-pointers (std::shared_ptr and std::weak_ptr), a new std::function that plays well with std::bind, and the auto keyword for automatic typing at initialization, are all applicable in most C++projects.

But many embedded programmers don't feel to strongly for using "modern" C++ features (like templates) or the STL library, due to concerns over code bloat or performance issues (Microsoft's earlier implementation of STL had some issues with performance, for example in std::string).

However, there are at least two reasons for embedded devs to start looking at C++0x when thinking code size/performance optimization.

Rvalue references (link)
C++ is generally overly fond of copying data, even at times it's not strictly nessecary. To mitigate this, C++0x adds the rvalue reference, denoted by T&&, which enables move sematics.

For example, if you return a std::vector<int> from a function in C++03, a temporary object will have to be created and deleted. This will result in a deep copy of an int array which is used by std::vector internally. In C++0x, by implementing a move-constructor, std::vector can copy only the pointer to the internal C array, in cases where the rvalue (the temporary) is not needed, This is enabled by declaring the return type of your function as std::vector&&.

In some cases, this can lead to a big performance boost, as new and delete are typically expensive on embedded systems. This guy found that using rvalues when sorting vectors containing PODs could double the performance.

Of course, C++0x versions of STL will implement rvalue references and move-constructors where applicable, like in std::vector.

Extern templates (link)
Okay, even though this was not in the standard, many compiler vendors already implemented this in C++03. However the feature was not so well-known (at least not to me).

Consider you have many translation units, like multiple libs. It is not uncommon to instantiate some identical templates in more than one lib. There is no guarantee that the compiler will optimize away the redundant instantiation, which may cause code bloat and longer compile time.

By declaring the template extern:

extern template std::vector<int>;

you tell the compiler not to instantiate the template in this translation unit. Obviously, this requires that the template is instantiated somewhere else.

fredag 1 oktober 2010

Virtual functions in C++

I recently attended Scott Meyers' great seminar "C++ in an Embedded Environment". We attendants came from a wide variety of disiplinces; from 8-bit AVR programmers to people (like me) sitting on 800+ MHz beasts with embedded 3D accelerators. Scott's point was basically that everything you can do in C, you can also do in C++ without getting any significantly reduced performance or code bloat, as long as you do it right.

One of the subjects he examined closely was virtual functions. I have heard people countless times telling me how expensive (in terms of speed) it is to call virtual functions in C++, and Scott basically killed that myth. Because I have found few good descriptions on how compilers implement virtual functions, I think it deserves an entry.

Consider this class:

class MyClass {
public:
virtual void func1();
void func1();
virtual void func3();
};

The compiler will number each virtual functions in order, so:
  • func1 will have the number 0, and
  • func3 will have the number 1.
Now MyClass will create a vtable, which is basically an array where the nth element will contain a function pointer, pointing to the virtual function with the number n. Such as:
A derived class may look like this:

class Derived : public MyClass {
public:
void func1();
void func4();
};

It overrides the virtual function func1 and declares a new function, func4. This class will contain its own vtable, like this:
Element 0 in the table has been changed to contain a pointer to Derived's variant of func1, while element 1 will still point to the base class func3. Note that building these tables is something that happens at compile-time.

Now every object instance of MyClass and Derived will contain a vptr (virtual table pointer), pointing to its vtable. The vptrs are set up at construction of the objects, and are typically stored first in the object, memory-wise.

So if we now call func1 in an instance of Derived, we will follow the vptr in the instance and look for the function address at element 0 in Deriveds vtable, and call this function.

Thus the only speed penalty we get is one exta indirection when looking for the function to call. Even if we would have declared func1 virtual in Derived as well and overridden it in yet another class (say DerivedDerived), the 0th element in the vtable would just be replaced with a pointer to DerivedDeriveds version of func1. So there is never more than 1 extra indirection no matter how deep the class hirearchy is.

There is also some size penalty for storing the vtable and vptrs. In this case, we would have 1 vtable for MyClass and one for Derived (8 bytes per class, as there are two function pointers in the vtable), plus a 4 byte vptr for each instance of the classes. Provided the class does something useful, this is probably a very small percentage of the total size of the classes.

To summarize, implementing and calling virtual functions in C++ is really cheap.