Sunday, March 18, 2012

Bilinear interpolation for height approximation

I've been playing around some with 3D graphics on Android using the awesome libgdx library. It's more or less a platform-independent OpenGL wrapper with some nice extra features, for example an .obj file loader. It's so nice to build and test in an x86 environment and not have to use the terribly slow Android emulator.

So, I needed some way to get my models to move smoothly over my terrain. This is something I guess most 3D programmers have encountered, and there are many ways to do this. It comes down to a tradeoff between speed and accuracy, and really smooth height tracing is not inexpensive, especially if you have many objects moving around.

Normally the problem is that you have a low-res heightmap, typically one pixel per world unit. So we need to calculate the height when we are at, for example, (x, z) = (17.3f, 33.8f), that is, somewhere between the coordinates (17, 33), (18, 33), (17, 34) and (18, 34), for which we have height information.

For my case I chose the popular and fairly calculation-inexpensive bilinear interpolation. It comes down to letting the 4 corners with known height form a plane, and approximating the value somewhere on this plane by analyzing one dimension at a time. Here is the code.

private double getHeightAtPos(Vector3 pos) {
    // bilinear interpolation

    float fract_x = pos.x - (int) pos.x; // fractional position in x
    float fract_z = pos.z - (int) pos.z; // fractional position in z

    // fetch the four known heights we are somewhere in between, as follows
    // (o is our position between the points):
    // ...........z
    // ...........|
    // (1,1) (1,0)|
    // ......o....|
    // (0,1) (0,0)|
    // x ----------

    float h_00 = heightmap.get((int) pos.x, (int) pos.z);
    float h_10 = heightmap.get((int) pos.x, (int) Math.ceil(pos.z));
    float h_01 = heightmap.get((int) Math.ceil(pos.x), (int) pos.z);
    float h_11 = heightmap.get((int) Math.ceil(pos.x), (int) Math.ceil(pos.z));

    // Create two parallel height lines along x at the sides of the square,
    // corresponding to the plane slope, and calculate the height at our
    // x position on each of them.
    float proj_x_near = h_00 + (h_01 - h_00) * fract_x; // on the line along x, lower z side
    float proj_x_far = h_10 + (h_11 - h_10) * fract_x; // on the line along x, higher z side

    // Now draw a height line crossing the two points calculated above and
    // find the height at our z position. The result is the approximate
    // height we were looking for.
    float cross = proj_x_near + (proj_x_far - proj_x_near) * fract_z;
    return heightMapToWorldScale(cross);
}

Wednesday, May 18, 2011

Why I've stopped liking C++ algorithms

I have usually been an advocate of using STL-style algorithms in C++ as much as possible when working with containers. And if you read C++ books and listen to the gurus, they promote them constantly. However, some days ago a friend asked me about some fairly simple code I had written using a find_if algorithm (making use of C++0x):

auto it = find_if(m_tags.begin(), m_tags.end(), bind(&tag_data::find_by_name, placeholders::_1, name_to_search));

And the function (which must be a member function in this case):

bool tag_data::find_by_name(const std::string& name)
{
    return name == m_name;
}

Now, an algorithm doesn't often get shorter than this. And still my friend (fairly new to C++) had a hard time grasping what was going on, which I can understand.

And I realized that in many ways algorithms do break two of the most fundamental programming design rules:

Write once, read many times.
Code is written once, but will be read multiple times, sometimes by colleagues who do not use C++ as their primary language. And let's face it, even simple algorithms are often harder to understand than a good old for-loop. I thought so when I was new to C++, and I think that goes for most people (be honest now). I also think that the time you save writing algorithm-style over for-loop style is quite minimal.

Expose as little of your internals as possible.
To avoid polluting your scope, you should always declare members as local and as private as possible.

However, a problem with C++ is that there are no truly private members as in some other languages. Even though private members are inaccessible from the outside, they are still visible from the outside, which is a bit of a problem today when most people use code completion tools extensively.

In the example above, I only use find_by_name in one place in the code, so I would've liked to make it not only private, but local to the caller function. This was not possible before lambdas were introduced in C++0x. And still, the problem is that many people don't use lambdas, and it will probably take a long time until we see them adopted by the masses. Bottom line: algorithms risk polluting your scope if you don't use fancy new (still experimental) C++ features.

Sunday, October 17, 2010

2 reasons for embedded devs to use a C++0x-enabled compiler

C++0x introduces a lot of really neat things that any developer has use for (many of which come from Boost). For example, proper smart pointers (std::shared_ptr and std::weak_ptr), a new std::function that plays well with std::bind, and the auto keyword for automatic type deduction at initialization are all applicable in most C++ projects.

But many embedded programmers don't feel too strongly about using "modern" C++ features (like templates) or the STL, due to concerns over code bloat or performance issues (Microsoft's earlier implementation of the STL had some performance problems, for example in std::string).

However, there are at least two reasons for embedded devs to start looking at C++0x when thinking code size/performance optimization.

Rvalue references (link)
C++ is generally overly fond of copying data, even at times when it's not strictly necessary. To mitigate this, C++0x adds the rvalue reference, denoted by T&&, which enables move semantics.

For example, if you return a std::vector<int> from a function in C++03, a temporary object has to be created and destroyed, resulting in a deep copy of the int array that std::vector uses internally. In C++0x, std::vector gets a move constructor (taking std::vector&&) which, when constructing from a temporary that is not needed afterwards, just steals the pointer to the internal C array instead of copying it. You get this for free when returning by value; the function's return type stays std::vector<int>.

In some cases, this can lead to a big performance boost, as new and delete are typically expensive on embedded systems. This guy found that using rvalues when sorting vectors containing PODs could double the performance.

Of course, C++0x versions of STL will implement rvalue references and move-constructors where applicable, like in std::vector.

Extern templates (link)
Okay, even though this was not in the C++03 standard, many compiler vendors had already implemented it. However, the feature was not so well known (at least not to me).

Say you have many translation units, like multiple libs. It is not uncommon to instantiate identical templates in more than one lib. There is no guarantee that the toolchain will remove the redundant instantiations, which may cause code bloat and longer compile times.

By declaring the template extern:

extern template class std::vector<int>;

you tell the compiler not to instantiate the template in this translation unit. Obviously, this requires that the template is explicitly instantiated in some other translation unit (template class std::vector<int>;).

Friday, October 1, 2010

Virtual functions in C++

I recently attended Scott Meyers' great seminar "C++ in an Embedded Environment". The attendees came from a wide variety of disciplines: from 8-bit AVR programmers to people (like me) sitting on 800+ MHz beasts with embedded 3D accelerators. Scott's point was basically that everything you can do in C, you can also do in C++ without any significant performance penalty or code bloat, as long as you do it right.

One of the subjects he examined closely was virtual functions. I have heard people countless times telling me how expensive (in terms of speed) it is to call virtual functions in C++, and Scott basically killed that myth. Because I have found few good descriptions on how compilers implement virtual functions, I think it deserves an entry.

Consider this class:

class MyClass {
public:
    virtual void func1();
    void func2();
    virtual void func3();
};

The compiler will number each virtual function in order, so:
  • func1 will have the number 0, and
  • func3 will have the number 1.
Now MyClass will get a vtable, which is basically an array where the nth element contains a function pointer to the virtual function numbered n. For MyClass, slot 0 points to MyClass::func1 and slot 1 to MyClass::func3.
A derived class may look like this:

class Derived : public MyClass {
public:
    void func1();
    void func4();
};

It overrides the virtual function func1 and declares a new, non-virtual function, func4. This class will get a vtable of its own, where element 0 has been changed to contain a pointer to Derived's variant of func1, while element 1 still points to the base class func3. Note that building these tables is something that happens at compile time.

Now every object instance of MyClass and Derived will contain a vptr (virtual table pointer), pointing to its vtable. The vptrs are set up at construction of the objects, and are typically stored first in the object, memory-wise.

So if we now call func1 on an instance of Derived, we follow the vptr in the instance, look up the function address at element 0 in Derived's vtable, and call that function.

Thus the only speed penalty we get is one extra indirection when looking up the function to call. Even if we had declared func1 virtual in Derived as well and overridden it in yet another class (say DerivedDerived), the 0th element of the vtable would just be replaced with a pointer to DerivedDerived's version of func1. So there is never more than one extra indirection, no matter how deep the class hierarchy is.

There is also a size penalty for storing the vtables and vptrs. In this case, we would have one vtable for MyClass and one for Derived (8 bytes per class on a 32-bit system, as there are two function pointers in each vtable), plus a 4-byte vptr for each instance of the classes. Provided the class does something useful, this is probably a very small percentage of its total size.

To summarize, implementing and calling virtual functions in C++ is really cheap.

Friday, July 16, 2010

Blizzard preparing the ground for more eSports

I was so happy reading the recently announced system requirements for Starcraft II. As you can see, Starcraft II will run on a meager 2.6 GHz single-core, 1 GB RAM and a 128 MB GeForce 6600 GT or equivalent (in Sweden, you can't even buy 128 MB graphics cards anymore).

This is an important strategy on Blizzard's side to make sure that Starcraft II will replace its predecessor on the eSports scene. In many of the important eSports countries, like Russia, Ukraine and China, you can't expect everyone to have a top-of-the-line system, which probably also holds for their internet cafés. Also, for me, it would be awesome if Starcraft II ran on my work laptop so I could bring it to friends to play!

Let's just hope this will be enough. Starcraft eSports has recently been through a lot of trouble due to the recent match-fixing scandal and the split between Blizzard and KeSPA, the Korean eSports organization, over IP issues. Worries have also been raised over rumors that Blizzard will charge TV companies that broadcast SC2 matches, which would stress the already strained sponsors and ultimately leave less money to spend on players. In the worst case, this could even make professional gaming impossible.

There are bright spots, however. It seems the Korean gaming community has largely accepted many of the controversial changes SC2 brought, like auto-mining, multi-building selection and multi-building placement. Also, we already see a lot of the famous players appearing in SC2 beta matches on YouTube.

Finally, all credit to the Blizzard developers; we all know how hard and time-consuming it is to cram that last piece of performance out of a system!

Monday, July 5, 2010

MS serious about Windows CE?

Many people, me included, have waited a long time for Microsoft to announce their future policy regarding Windows CE. Windows CE/Windows Phone 7 hasn't been doing too well lately in the extremely competitive smartphone market. Also, Visual Studio 2010 doesn't include support for writing C++ or .NET CF applications for Windows CE, and OS design support was already dropped back in Visual Studio 2008.

Windows CE 7 was announced recently, but is MS going to back it up with the tools needed? Also, it's a bit unclear what is really new in WinCE 7. I thought WinCE had just finally started to look good with version 6, which introduced the new memory architecture allowing an unlimited number of processes and unlimited memory per process, among other things, and I hoped for more in WinCE 7.

Hopefully Microsoft will also release dedicated tools for developers, to bring back our confidence. And if they do, I wouldn't mind them including a C++ compiler with support for C++0x! If not, well, most of our code compiles under POSIX as well ;)

Tuesday, May 18, 2010

Minimizing build times for C++ projects under VS2005 and later

I recently looked over the build times in our now quite large C++ project. They had slowly been creeping towards the unacceptable, but after some tweaking they are now about a third of what they were. I just wanted to list what I did here. Some things are specific to Visual Studio (we're using 2005), but most will work in any IDE.

  • Use precompiled headers. I think precompiled headers have an undeserved bad reputation. Used correctly, they can reduce your compile time tremendously, especially if you use template libraries to any extent (like STL and Boost). Any big, static header file should go with the precompiled stuff (windows.h, winsock, and all STL/Boost headers are obvious candidates). Check out this link on how to get it right.
  • Minimize unnecessary includes. Our project was full of unnecessary includes in header files, mostly of the kind that could be replaced with a forward declaration. Forward declarations really are your friend! However, you should never be tempted to violate the standard rule that "every header file should be self-sufficient". That is, a header file should not rely on includes made by the headers it happens to include.
  • Set the Debug Information Format to your needs. I usually use /Zi, which, unlike the default setting, does not include information for Edit and Continue. This actually makes a big difference in the size of the program database files, and therefore reduces compile time. Of course, if you do frequently use Edit and Continue, you should keep the default.
  • Define WIN32_LEAN_AND_MEAN. This strips away seldom-used parts of windows.h, which is a monster to compile (and should be placed with the precompiled headers). There is only a small chance that you will actually need anything this define strips away.
  • Use the /MP switch in Visual Studio. In VS2005 this is actually an undocumented switch that has to be added to the "Additional command line arguments" field under the C++ settings. It lets the compiler use more than one core during compilation. Just activating this switch on my dual-core machine reduced the total build time by about 30%.
  • Get yourself an SSD. This one speaks for itself. The problem is just persuading your IT department that you need one.