Friday, December 4, 2009

A Precise Garbage Collector for C++ Builder

Ahh... Nothing rivals the feeling of success!

I've finally devised a precise garbage collector (note: a precise GC is very different from conventional GCs such as BoehmGC - a precise GC does not suffer from ambiguities) for C++ Builder that is faster than boost's shared_ptr.

That's a bold claim indeed! And, by claiming that it is faster than shared_ptr, I'm also claiming that I can create a String class that is similarly faster than Delphi string, which is also reference counted like shared_ptr. And yes, the gc_ptr is thread-safe too.

The trick is in the pointer assignment. In my precise GC, a pointer assign is no more expensive than its native equivalent - both are a single instruction operation, whereas a ref counted pointer requires a locked instruction and a refcount check, both of which are *very* expensive on any processors, especially modern multicore ones. Most precise GC source codes you can freely download from the net get the assignment operator correct. However, the part where they lose out big time is in the gc_ptr's constructor and destructor with lock instructions and what not for multithread support (note that some don't even support multithread). These are often non-trivial and is the biggest cause of performance bottleneck.

With a minimal c'tor and d'tor for my gc_ptr, I end up with an insanely fast precise GC engine which runs a .NET like framework in pure native mode. How about that - a managed framework that runs natively. None of the pain (both literally and performance-wise) of pinning objects and marshaling data going in and out of the managed world (anything that interfaces with the real world cannot have a non-fixed address - e.g. it's impossible to write a driver that allocates a .NET managed non-fixed buffer and let your system's DMA fill it up). Looks like .NET's GC team is paying the price of betting on the Stop-And-Copy algo (yes, the .NET GC is nothing more than an advanced version of the Stop-And-Copy GC).

Tuesday, November 10, 2009

LANBench - Network Benchmark

Folks,

I've just released the first version of LANBench here - http://www.zachsaw.co.cc/?pg=lanbench_tcp_network_benchmark



Created it very quickly recently to find the maximum bandwidth of my gigabit switch / NICs in my LAN. LANBench mainly tests your network bandwith, but also to a certain extent gives you an idea of how much CPU is being spent in just transferring data back and forth between your machines.

LANBench takes the HDD subsystem out of the equation and lets you test the raw speed of your network.

Head over and check it out for more info.

Friday, October 16, 2009

DynamicArray causing memory leak in C++ Builder

Recently I came across a memory leak which CodeGuard reported was due to DynamicArray.set_length(...).

Well, DynamicArrays are referenced counted, exactly the same way as AnsiStrings / UnicodeStrings so really there's no way you could've missed free-ing it... ... or is there?

From experience, the first thing that came to mind was that since it's a referenced-counted container - and as with the rest, it's prone to circular-references.

Consider the following code:
#include <vcl.h>
#pragma hdrstop

#include <iostream.h>


struct
TCircularRef
{

DynamicArray<TCircularRef> refs;

~TCircularRef() { cout << "here" << endl; }
};


#pragma argsused
int main(int argc, char* argv[])
{

DynamicArray<TCircularRef> base;
base.set_length(1);

base[0].refs = base;
base.set_length(0);

return 0;
}

With the 'refs' member of DynamicArray<TCircularRef> base referring back to itself (base[0].refs = base), we ended up with a circular-ref. Yes, the cirular-ref in the above example is easy to spot and would probably never occur in the real world as well. In reality though, the circular-refs are much harder to spot and assignments change during run-time, depending on the code branch and/or timing. What's worse is that the reference could be held by an object of which you pass to another library written by someone else...

Notice that the above sample code tries to break the circular-ref such that it would get cleaned-up with the line base.set_length(0)? Alas, that was to no avail as its ref-count prior to set_length would be 2, which leaves us with a ref-count of 1 after that. Can we call set_length(0) again then? No. Why? Because 'base' no longer knows about the object it once held as the first set_length(0) would've set it to point to the new memory location allocated by set_length(0).

Is there a way out? There're a few. But none to my liking. (You could Google for 'circular-reference' if you're interested to find a solution to this but it's not the intention of this article).
As I have a garbage collector library ready to be used in any projects I create with C++ Builder, that was the solution I took. I simply replaced all DynamicArray with gc_array and allocated it with gcnew_array. With that, I no longer had to care about the hairy problem of who owns the object, who's responsible to free it and whether there's a potential circular reference problem in my code. What's more, it's thread-safe, unlike DynamicArray. But that's a topic for another day.

In short, this is my advice: Avoid DynamicArray at all costs!

Wednesday, October 14, 2009

Stepping into a Delphi package while debugging in C++ Builder

Well, you really should debug a Delphi package in Delphi. But once every so often, you'll find yourself using the Delphi package (controls / components) in C++ Builder and in need of tracing into the package while you're debugging your main application. By default, even if you compile the Delphi package in Debug mode, you won't be able to step into the Delphi source codes. These are the settings which will enable that (this guide is for CB2007 but it shouldn't be too different for other versions of C++ Builder):

Go to the project options of the Delphi package.

Select the Compiler view on the left pane.


Set the build config to Debug mode.

Code generation:
  • Optimization = off
  • Stack frames = on
Debugging:
  • Debug information = on
  • Local symbols = on
  • Assertions = on
  • Use debug DCUs = on
Now select the Linker view on the left pane.



Map file:
  • Off
EXE and DLL options:
  • Include TD32 debug info = on
  • Include remote debug symbols = on
Linker output:
  • Generate all C++Builder files (you should already have this selected)

You're done with the settings.

Rebuild your package and your main app and you should now be able to step into the Delphi source.

Tuesday, August 18, 2009

Watching HD Video in MPC-HC DXVA for ATI HD 4000 Series Card Owners

It's widely known that ATI card owners do not have the luxury of Nvidia card owners when it comes to playing HD video in MPC-HC in DXVA mode. With the built-in H.264/AVC video decoder, HD video such as 1920x800 with reference frames of 6 and above (i.e. L5.0 / L5.1 AVC streams) won't play (actually, it usually plays but with corrupted blocks and/or video freezing after a few seconds).

What I've gathered from the comments of the readers of my original post about this issue and from months of testing using a plethora of mixing and matching on my own HTPC, I thought it'll save everyone some time for me to post a summary of what settings worked with Media Player Classic - Home Cinema. I previously said it's necessary to use PowerDVD 9 for DXVA with ATI HD 4000 series (such as my ATI HD 4350) but that is no longer true and that a better combination has been found thanks to those who contributed in my previous blog post. This is only for ATI card owners with UVD 2 (or newer) though. HD3000 series owners, you guys are out of luck unfortunately (you have AMD / ATI to thank).

Here are some settings I've tested and their results:

Common settings:
Pentium 4 HT 3.0GHz (underclocked to 2.0GHz, DDR2@222MHz (444MHz effective), HT disabled)
ATI HD 4350 (Asus) - 512MB RAM @ 400MHz DDR2 (800MHz effective)
Windows XP SP3 (latest Windows Update as of 18 August 2009)
ATI Catalyst 9.7
Media Player Classic - Home Cinema Build 1237 x86 (download here)
Haali Media Splitter externally installed (i.e. not using MPC-HC's one)
DirectVobSub


Here's what I've found.

So long as you have the DXVA decoder output pin directly connected to the renderer, you'll get proper working DXVA decode with up to 16 ref-frames for full HD streams (1920x1080) regardless of the renderer (well, almost):

VMR7 (Windowed) = OK
VMR9 (Windowed) = failed (pink vertical stripes with Motion Vectors)
VMR7 Renderless = OK
VMR9 Renderless = OK

For the sake of discussion, let's use the VMR7 (Windowed) renderer. It has been proven to work with both Cyberlink's PDVD8 AVC decoder and ArcSoft's Video Decoder. With MPC's DXVA decoder, video with ref-frames of 12 and above will freeze after a few seconds of playback. Cyberlink's decoder suffers from occassional judder while ArcSoft's Video Decoder plays up to 16
ref-frames in DXVA mode perfectly. The results vary when other renderers are used. For example, the combo of VMR9 Renderless + MPC's DXVA decoder yielded the same pink vertical stripes.

I still have not found a combination that would get subtitles working without causing issues with DXVA decode. Under VMR9 Renderless + ArcSoft + MPC-HC internal subtitles renderer, the video would playback in like half the speed! Frames are decoded and rendered correctly though. If anyone has any idea how to solve that, I'd be all ears!

Update: With ATI Catalyst 9.8, the combo of "VMR9 Renderless + ArcSoft + MPC-HC internal subtitles renderer" now works properly. Although with DirectVobSub, the half-frame-rate playback issue is still there.

Update 2: It appears that, in order to get the results I've published here, you'll need to 'initialize' the ATI driver for proper operation of DirectX by opening the Catalyst Control Center each time you start / restart your PC (not sure about stand-by / hibernate). I was taken by surprise yesterday for not doing so - I got a blank white screen instead of the normal proper video playback. Once I opened the CCC (and closed it), reopening MPC-HC and the video then worked fine. *Bravo ATI!* You've got my standing ovation.

Anyway, this is the settings that I've found to work under MPC-HC without subtitles:

MPC-HC settings:
Renderer - VMR7 (Windowed)
Auto load subtitles - false (see Update above)
DirectVobSub options - disable autoload subtitles
Use ArcSoft Video Decoder as your AVC/H.264 decoder
Use AC3Filter directshow decoder

Again, make sure your DXVA decoder's output pin is connected directly to the renderer's input pin. This means under your DirectVobSub filter options, make sure it never loads subtitles and that it is blocked under your MPC-HC external filters page. The following is what I've configured under that page:

Arcsoft video decoder - preferred.
AC3Filter - preferred.
DirectVobSub - blocked.

Wednesday, June 10, 2009

NTFS A Journaling File System

If NTFS is a journaling file system (if you don't know what it is, I suggest you read it up first before continuing) then why do we have to perform such an extensive chkdsk (checkdisk or scandisk) every time the system does not shutdown cleanly (due to a power or system failure)?

A journaling file system should always have their file system in a consistent state. For example, the MFT (Master File Table) should never indicate a cluster as being occupied while it in fact isn't. But so often when we don't cleanly shut Windows (2k or XP for that matter) down, the checkdisk that runs when Windows boots up again will find the exact inconsistency as described above.

Q: So is NTFS a journaling file system?
A: Yes.

Q: Then why is NTFS inconsistent?
A: NTFS is ONLY inconsistent (or unsafe) when an unclean shutdown occurs if you are running Windows 2K or XP. Microsoft for some reason (most likely performance related) chose not to enable the journaling function for non-server version of Windows. Windows Vista, however, enables it by default (I'm assuming Windows 7 does as well).

Q: Should I enable NTFS journals then?
A: It's up to you really, but personally I have a higher preference for the safety of my data, and I can't tell the speed difference between having it enabled / disabled on my hard drives, so it's a definite Yes for me.

Q: How do I enable NTFS journals?
A: Go to command prompt and run the following command for each NTFS partition:
fsutil usn createjournal m=1000 a=100 C:

Q: How do I check if my NTFS partition has journals enabled?
A: Run the following command:
fsutil usn queryjournal C:

Q: Does that mean I don't have to run chkdsk any more?
A: Not really. Just that you don't have to do it every time you fail to shutdown your computer properly. You should still do it occasionally (like defrag).

Q: How do I disable chkdsk on start-up?
A: http://www.raymond.cc/blog/archives/2008/02/23/disable-or-stop-auto-chkdsk-during-windows-startup/

Catalyst 9.6 ATI 4000 HD Series Still Behind NVIDIA For HTPC

ATI is still behind NVIDIA for HTPC even with the latest leaked Catalyst 9.6 - It still fails to decode L5.0 / L5.1 high profile AVC video in DXVA mode (using MPC-HC for example). This means that for HTPC, a NVidia 8600GT / 9400GT would be the better choice over ATI HD 4800 / 4700 / 4600 / 4500 / 4300 series.

If you're looking to build a HTPC, go for NVIDIA.

While ATI heavily promotes its HTPC capabilities, the truth is it is still very far behind NVIDIA.

NVIDIAusers have been enjoying this for about 6 months.

@NVIDIA marketing, you could consider starting a "The way it's meant to be watched" program. Doesn't look like ATI has anything left in them to pose a threat whatsoever.