Thursday, October 21, 2010

BCC32 the Optimizing Compiler?

We all know that asking the bcc32 compiler team to put more effort into making the codegen produce more efficient, better optimized asm is a *BIG ASK*, seeing how they haven't even got the basics working correctly (compiler bugs of the kind that Microsoft's VC++ compiler has had fixed since MSVC 2005). VC++ 6 was worse than C++ Builder 6, but since then Microsoft has worked hard and fixed up its compiler, made it fully optimizing, and got it generating some of the fastest and most efficient code in the world. It is also now one of the most compliant C++ compilers of all.

Let's compare what the two compilers generate for the following code sample.

For the uninformed, the following usage pattern is found in a lot of expanded template code (I'm using it a lot in my GC framework and Boost uses it too).
#include <tchar.h>

struct foo
{
    inline operator bool() const
    {
        return false;
    }
};

int _tmain(int argc, _TCHAR* argv[])
{
    if (foo())
        return 0; // *see footnote
    return 0;
}

*footnote: usually this is something more meaningful, but here I'm trying to illustrate how massively unintelligent bcc32 is.
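For a rough idea of what the "something more meaningful" in the footnote tends to look like, here is a minimal sketch. The names `handle` and `value_or` are hypothetical and made up for illustration; they are not taken from Boost or from my GC framework. The point is only that a conversion to bool ends up being called inside an `if` once the template is expanded, which is the same shape as the test case above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical handle type, for illustration only.
template <typename T>
class handle
{
public:
    handle() : ptr_(NULL) {}
    explicit handle(T* p) : ptr_(p) {}

    // Lets a handle be tested directly in a condition: if (h) ...
    operator bool() const
    {
        return ptr_ != NULL;
    }

    // Dereferencing an empty handle is a programming error.
    T& operator*() const
    {
        assert(ptr_ != NULL);
        return *ptr_;
    }

private:
    T* ptr_;
};

// Returns the value the handle points at, or fallback when it is empty.
template <typename T>
T value_or(const handle<T>& h, const T& fallback)
{
    if (h)              // calls handle<T>::operator bool()
        return *h;
    return fallback;
}
```

A caller would write something like `value_or(h, 0)`. An optimizing compiler would be expected to inline the `operator bool()` call down to a single pointer comparison, which is the kind of work the test case above is checking for.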



bcc32 (C++ Builder XE) command line:
    bcc32 -O2 -Hs- -C8 -v- -vi test8.cpp


generated asm:
push ebp
mov ebp,esp
add esp,-$08
push edi
lea edi,[ebp-$08]
xor eax,eax
mov ecx,$00000008
rep stosb
lea eax,[ebp-$08]
xor edx,edx
test dl,dl
jz $00401201
xor eax,eax
jmp $00401203
xor eax,eax
pop edi
pop ecx
pop ecx
pop ebp
ret



cl command line:
    cl /Ox /Ot test8.cpp


generated asm:
xor eax,eax
ret


BCC32 generated 20 instructions when really only 2 are required, so 18 of them are useless. With BCC32, your code would be a few hundred times slower (taking memory access latencies into account).