There's something wrong with how FastMM4 (the default memory manager of Delphi / C++ Builder since BDS 2006) behaves on multicore systems, especially when running multithreaded apps in a GC/managed environment. With more than one core enabled, performance drops by up to a factor of five. So not only does FastMM fail to scale, your multithreaded apps will actually run tremendously slower on a multicore system: up to 5 times slower on a dual-core machine than on a single-core one of the same architecture at the same clock speed. That's a drop of roughly 80% in throughput going from single-core to dual-core! Comparing the dual-core performance of FastMM4 and TBBMM, the latter is up to 9 times faster!
This test is meant to show just that.
Download Test (updated 27/01/2010) (see readme.txt for instructions)
*** WARNING: Incompatible with x64 OS due to an OS bug.
It runs through a variety of algorithms in multiple threads (in a threadpool of the framework, similar to .NET's ThreadPool) consisting of a mix of GC list, GC dictionary, and GC string unit-tests.
Keep in mind that this is an app written using a GC framework, which means allocations usually happen in multiple threads concurrently while de-allocations are done in specialized garbage collector threads. This may be the reason FastMM breaks down (though a general-purpose memory manager shouldn't break down under any usage pattern).
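To make that usage pattern concrete, here is a minimal sketch of its shape: several worker threads allocate blocks and hand them off, and a single collector thread is the only one that releases them. It is written in Python purely for illustration. The names (run_pattern, handoff, the block size and counts) are made up for this example and have nothing to do with the actual GC framework, and Python's own allocator and interpreter lock make this useless as a benchmark of FastMM, WinMM or TBBMM.

```python
import queue
import threading


def run_pattern(worker_count=4, blocks_per_worker=1000, block_size=64):
    """Allocate in worker threads, release in one collector thread.

    Returns the number of blocks the collector released.
    """
    handoff = queue.Queue()  # carries blocks from the workers to the collector
    released = 0

    def worker():
        for _ in range(blocks_per_worker):
            # The allocation happens on the worker thread.
            handoff.put(bytearray(block_size))

    def collector():
        nonlocal released
        while True:
            block = handoff.get()
            if block is None:  # sentinel: all workers are done
                break
            # Dropping the last reference releases the block on this thread.
            del block
            released += 1

    gc_thread = threading.Thread(target=collector)
    gc_thread.start()

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

    handoff.put(None)  # tell the collector to stop
    gc_thread.join()
    return released


print(run_pattern())
```

The point is only that every block is freed on a different thread from the one that allocated it, which is the situation the test app puts the memory manager in.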
Notice that when you run the FastMM Test with CPU Affinity set to just one CPU, you'll end up with nearly the same performance as TBBMM. Once you enable multiple cores, though, you immediately lose that performance again and run slower than with just one core.
Note: You'll find that the FastMM BorlndMM.dll is different from the default RAD Studio 2010 one. This is due to the changes added to support the GC framework, but at its heart, it's simply making calls to GetMemory, ReallocMemory and FreeMemory (as opposed to the WinMM version's HeapAlloc, HeapReAlloc and HeapFree respectively, with all else being equal). The WinMM version is initialized with the LFH (low fragmentation heap) flag.
Here are some results from my own tests:
Test results in ops/second (10sec average), listed in the following order:
1) TBBMM (what is TBBMM?)
2) WinMM
3) FastMM
Core2Duo E6550 2.33GHz (Conroe) - XP SP3
Both cores enabled
1) 1785
2) 1230
3) 250
Single core (via CPU affinity mask)
1) 930
2) 650
3) 950
Core2Duo E6550 throttled to 1.33GHz - XP SP3
Both cores enabled
1) 730
2) 520
3) 180
Single core (via CPU affinity mask)
1) 410
2) 275
3) 395
Pentium M 1.2GHz (Banias) - XP SP3
CPU is Single core
1) 395
2) 340
3) 395
Core2Duo E7200 3.6GHz (Wolfdale) - Vista
Both cores enabled
1) 2595
2) 2080
3) 290
Single core (via CPU affinity mask)
1) 1450
2) 1180
3) 1405
As you can see, the results are quite consistent. On a dual core machine, the performance of FastMM is terrible. From 2.33GHz to 3.6GHz, there's virtually no increase at all in speed! In fact, when the test was running, the CPU wasn't even fully utilized (with more than 50% of CPU spent in kernel time), whereas the other memory managers had the CPU pegged at 100% and nearly no kernel time.
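For anyone who wants the scaling spelled out, the short Python sketch below divides each dual-core figure by the matching single-core figure. The numbers are copied from the results above; the function and variable names are just for this example. The Pentium M is left out because it only has one core. A value of 2.0 would be perfect scaling, and anything below 1.0 means the second core made things slower.

```python
# Ops/sec copied from the results above, as (both cores, single core).
RESULTS = {
    "E6550 2.33GHz": {"TBBMM": (1785, 930), "WinMM": (1230, 650), "FastMM": (250, 950)},
    "E6550 1.33GHz": {"TBBMM": (730, 410), "WinMM": (520, 275), "FastMM": (180, 395)},
    "E7200 3.6GHz": {"TBBMM": (2595, 1450), "WinMM": (2080, 1180), "FastMM": (290, 1405)},
}


def scaling(both_cores, single_core):
    """Dual-core throughput relative to single-core throughput."""
    return both_cores / single_core


def report(results):
    """Build one formatted line per machine and memory manager."""
    lines = []
    for machine, managers in results.items():
        for name, (both, single) in managers.items():
            lines.append(f"{machine:<15} {name:<7} {scaling(both, single):.2f}x")
    return lines


for line in report(RESULTS):
    print(line)
```

On these numbers TBBMM and WinMM land between roughly 1.76x and 1.92x, while FastMM comes out between about 0.21x and 0.46x.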
If you wish to try it out on your system, download this GC speed tester (updated 27/01/2010) and unzip it to a folder of your choice. Then, run "Run All Tests.bat" and follow the on-screen instructions. Note that the GC Speed Test app will run indefinitely, so once you take note of the speed (ops/sec), you can quit the app to move on to the next test.
I'd appreciate it if you could post your results here in the comments in the same format as the ones above - i.e. CPU make (I'd love to see how AMD CPUs fare) and model number as well as the frequency, OS / service pack, and the results.
My advice? For an all-round memory manager, use the Windows default one. It may be slower than FastMM on a single core, but it certainly scales very well on multicore systems. Alternatively, the Intel TBB allocator scales almost perfectly and is the fastest memory manager around. The only catch is that it consumes more RAM.
Regardless, I'd stay away from FastMM4 (and thus from the default memory manager of Delphi / C++ Builder).