Tuesday, November 10, 2009

LANBench - Network Benchmark

Folks,

I've just released the first version of LANBench here - http://www.zachsaw.com/?pg=lanbench_tcp_network_benchmark



I put it together quickly to find the maximum bandwidth of the gigabit switch / NICs in my LAN. LANBench mainly tests your network bandwidth, but to a certain extent it also gives you an idea of how much CPU is being spent just transferring data back and forth between your machines.

LANBench takes the HDD subsystem out of the equation and lets you test the raw speed of your network.

Head over and check it out for more info.

Friday, October 16, 2009

DynamicArray causing memory leak in C++ Builder

Recently I came across a memory leak which CodeGuard reported was due to DynamicArray.set_length(...).

Well, DynamicArrays are reference-counted, exactly the same way as AnsiStrings / UnicodeStrings, so really there's no way you could've missed freeing it... or is there?

From experience, the first thing that came to mind was that, since it's a reference-counted container, it's prone to circular references just like the rest.

Consider the following code:
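#include <vcl.h>
#pragma hdrstop

#include <iostream.h>

struct TCircularRef
{
    DynamicArray<TCircularRef> refs;

    // Never prints if the array is leaked
    ~TCircularRef() { cout << "here" << endl; }
};

#pragma argsused
int main(int argc, char* argv[])
{
    DynamicArray<TCircularRef> base;
    base.set_length(1);      // ref-count = 1

    base[0].refs = base;     // element refers back to its own array: ref-count = 2
    base.set_length(0);      // drops only base's reference: ref-count = 1, leaked

    return 0;
}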

With the 'refs' member of base[0] referring back to base itself (base[0].refs = base), we end up with a circular-ref. Yes, the circular-ref in the above example is easy to spot and would probably never occur in the real world. In reality though, circular-refs are much harder to spot, and assignments change at run-time depending on the code branch and/or timing. What's worse, the reference could be held by an object that you pass to another library written by someone else...

Notice that the sample code tries to break the circular-ref with the line base.set_length(0), so that the array would get cleaned up? Alas, that was to no avail: the ref-count prior to set_length is 2, which leaves us with a ref-count of 1 afterwards. Can we call set_length(0) again then? No. Why? Because 'base' no longer knows about the object it once held, as the first set_length(0) would've set it to point to the new memory location allocated by set_length(0).
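To make the ref-count arithmetic concrete, here is a minimal sketch in standard C++. It is not DynamicArray itself: std::shared_ptr stands in for the reference-counted array, and Block, Element, LeakReport and run_cycle are hypothetical names made up for the illustration. It assumes only that DynamicArray counts references the way shared_ptr does.

```cpp
#include <cassert>
#include <memory>

struct Block;                        // stands in for the array's shared storage

struct Element {
    std::shared_ptr<Block> refs;     // plays the role of TCircularRef::refs
};

struct Block {
    static int destroyed;            // counts destructor calls
    Element item;                    // plays the role of base[0]
    ~Block() { ++destroyed; }
};
int Block::destroyed = 0;

struct LeakReport {
    long before_release;             // ref-count just before "set_length(0)"
    long after_release;              // ref-count just after it
    int destroyed;                   // how many Blocks were actually freed
};

// Builds the cycle from the post, optionally breaking it before the owner lets go.
LeakReport run_cycle(bool break_cycle_first) {
    Block::destroyed = 0;
    std::shared_ptr<Block> base = std::make_shared<Block>();  // base.set_length(1)
    assert(base.use_count() == 1);
    base->item.refs = base;                                    // base[0].refs = base
    if (break_cycle_first) {
        base->item.refs.reset();     // clear the back-reference while we can still reach it
    }
    std::weak_ptr<Block> watcher = base;  // observes the count without owning anything
    const long before = watcher.use_count();
    base.reset();                                              // base.set_length(0)
    const long after = watcher.use_count();
    return {before, after, Block::destroyed};
}
```

Calling run_cycle(false) reproduces the leak: the count goes from 2 to 1 and the destructor never runs. Calling run_cycle(true) clears the back-reference first, so the count goes from 1 to 0 and the block is freed. The catch is the same as in the post: you have to break the cycle while you still hold a handle to it.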

Is there a way out? There're a few. But none to my liking. (You could Google for 'circular-reference' if you're interested to find a solution to this but it's not the intention of this article).
As I have a garbage collector library ready to be used in any project I create with C++ Builder, that was the solution I took. I simply replaced every DynamicArray with gc_array and allocated it with gcnew_array. With that, I no longer had to care about the hairy problem of who owns the object, who's responsible for freeing it and whether there's a potential circular reference problem in my code. What's more, it's thread-safe, unlike DynamicArray. But that's a topic for another day.

In short, this is my advice: Avoid DynamicArray at all costs!

Wednesday, October 14, 2009

Stepping into a Delphi package while debugging in C++ Builder

Well, you really should debug a Delphi package in Delphi. But every so often, you'll find yourself using a Delphi package (controls / components) in C++ Builder and needing to trace into the package while you're debugging your main application. By default, even if you compile the Delphi package in Debug mode, you won't be able to step into the Delphi source code. These are the settings which will enable that (this guide is for CB2007 but it shouldn't be too different for other versions of C++ Builder):

Go to the project options of the Delphi package.

Select the Compiler view on the left pane.


Set the build config to Debug mode.

Code generation:
  • Optimization = off
  • Stack frames = on
Debugging:
  • Debug information = on
  • Local symbols = on
  • Assertions = on
  • Use debug DCUs = on
Now select the Linker view on the left pane.



Map file:
  • Off
EXE and DLL options:
  • Include TD32 debug info = on
  • Include remote debug symbols = on
Linker output:
  • Generate all C++Builder files (you should already have this selected)

You're done with the settings.

Rebuild your package and your main app and you should now be able to step into the Delphi source.

Tuesday, August 18, 2009

Watching HD Video in MPC-HC DXVA for ATI HD 4000 Series Card Owners

It's widely known that ATI card owners do not have the luxury of Nvidia card owners when it comes to playing HD video in MPC-HC in DXVA mode. With the built-in H.264/AVC video decoder, HD video such as 1920x800 with reference frames of 6 and above (i.e. L5.0 / L5.1 AVC streams) won't play (actually, it usually plays but with corrupted blocks and/or video freezing after a few seconds).
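For the curious, here's a rough sketch of where the "6 and above" figure comes from. The maximum number of reference frames a level allows depends on the frame size in macroblocks. The MaxDpbMbs numbers below are the ones I recall from Table A-1 of the H.264 spec, so treat them as an assumption and check the spec before relying on them. The names Level, max_dpb_mbs and max_ref_frames are made up for this example.

```cpp
#include <algorithm>
#include <cassert>

enum class Level { L41, L50, L51 };

// MaxDpbMbs per level, as recalled from H.264 Table A-1 (assumption, verify against the spec).
int max_dpb_mbs(Level level) {
    switch (level) {
        case Level::L41: return 32768;
        case Level::L50: return 110400;
        case Level::L51: return 184320;
    }
    return 0;
}

// Largest ref-frame count a stream of the given size can use and still fit the level.
int max_ref_frames(Level level, int width, int height) {
    assert(width > 0 && height > 0);
    const int mbs_wide = (width + 15) / 16;    // macroblocks are 16x16 pixels
    const int mbs_high = (height + 15) / 16;
    const int frames = max_dpb_mbs(level) / (mbs_wide * mbs_high);
    return std::min(frames, 16);               // the format never allows more than 16
}
```

Under those assumed values, max_ref_frames(Level::L41, 1920, 800) gives 5 and max_ref_frames(Level::L41, 1920, 1080) gives 4, so a 1920x800 encode with 6 or more ref-frames no longer fits in L4.1.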

Based on the comments from readers of my original post about this issue, and on months of mixing and matching settings on my own HTPC, I thought it would save everyone some time if I posted a summary of the settings that work with Media Player Classic - Home Cinema. I previously said that PowerDVD 9 is necessary for DXVA with the ATI HD 4000 series (such as my ATI HD 4350), but that is no longer true: a better combination has been found, thanks to those who contributed to my previous blog post. This is only for ATI card owners with UVD 2 (or newer) though. HD3000 series owners, you guys are out of luck unfortunately (you have AMD / ATI to thank).

Here are some settings I've tested and their results:

Common settings:
Pentium 4 HT 3.0GHz (underclocked to 2.0GHz, DDR2@222MHz (444MHz effective), HT disabled)
ATI HD 4350 (Asus) - 512MB RAM @ 400MHz DDR2 (800MHz effective)
Windows XP SP3 (latest Windows Update as of 18 August 2009)
ATI Catalyst 9.7
Media Player Classic - Home Cinema Build 1237 x86 (download here)
Haali Media Splitter externally installed (i.e. not using MPC-HC's one)
DirectVobSub


Here's what I've found.

So long as you have the DXVA decoder output pin directly connected to the renderer, you'll get proper working DXVA decode with up to 16 ref-frames for full HD streams (1920x1080) regardless of the renderer (well, almost):

VMR7 (Windowed) = OK
VMR9 (Windowed) = failed (pink vertical stripes with Motion Vectors)
VMR7 Renderless = OK
VMR9 Renderless = OK

For the sake of discussion, let's use the VMR7 (Windowed) renderer. It has been proven to work with both Cyberlink's PDVD8 AVC decoder and ArcSoft's Video Decoder. With MPC's DXVA decoder, video with ref-frames of 12 and above will freeze after a few seconds of playback. Cyberlink's decoder suffers from occasional judder, while ArcSoft's Video Decoder plays up to 16 ref-frames in DXVA mode perfectly. The results vary when other renderers are used. For example, the combo of VMR9 Renderless + MPC's DXVA decoder yielded the same pink vertical stripes.

I still have not found a combination that gets subtitles working without causing issues with DXVA decode. Under VMR9 Renderless + ArcSoft + MPC-HC internal subtitles renderer, the video plays back at roughly half speed! Frames are decoded and rendered correctly though. If anyone has any idea how to solve that, I'd be all ears!

Update: With ATI Catalyst 9.8, the combo of "VMR9 Renderless + ArcSoft + MPC-HC internal subtitles renderer" now works properly. Although with DirectVobSub, the half-frame-rate playback issue is still there.

Update 2: It appears that, in order to get the results I've published here, you'll need to 'initialize' the ATI driver for proper operation of DirectX by opening the Catalyst Control Center each time you start / restart your PC (not sure about stand-by / hibernate). I was taken by surprise yesterday for not doing so - I got a blank white screen instead of the normal proper video playback. Once I opened the CCC (and closed it), reopening MPC-HC and the video then worked fine. *Bravo ATI!* You've got my standing ovation.

Anyway, these are the settings that I've found to work under MPC-HC without subtitles:

MPC-HC settings:
Renderer - VMR7 (Windowed)
Auto load subtitles - false (see Update above)
DirectVobSub options - disable autoload subtitles
Use ArcSoft Video Decoder as your AVC/H.264 decoder
Use AC3Filter directshow decoder

Again, make sure your DXVA decoder's output pin is connected directly to the renderer's input pin. This means under your DirectVobSub filter options, make sure it never loads subtitles and that it is blocked under your MPC-HC external filters page. The following is what I've configured under that page:

Arcsoft video decoder - preferred.
AC3Filter - preferred.
DirectVobSub - blocked.

Wednesday, June 10, 2009

NTFS A Journaling File System

If NTFS is a journaling file system (if you don't know what that is, I suggest you read up on it before continuing), then why do we have to perform such an extensive chkdsk (checkdisk or scandisk) every time the system does not shut down cleanly (due to a power or system failure)?

A journaling file system should always keep its on-disk structures in a consistent state. For example, the MFT (Master File Table) should never indicate a cluster as being occupied when it in fact isn't. But so often, when we don't cleanly shut Windows (2k or XP for that matter) down, the checkdisk that runs when Windows boots up again finds exactly the inconsistency described above.

Q: So is NTFS a journaling file system?
A: Yes.

Q: Then why is NTFS inconsistent?
A: NTFS is ONLY inconsistent (or unsafe) after an unclean shutdown if you are running Windows 2K or XP. Microsoft for some reason (most likely performance related) chose not to enable the journaling function for non-server versions of Windows. Windows Vista, however, enables it by default (I'm assuming Windows 7 does as well).

Q: Should I enable NTFS journals then?
A: It's up to you really, but personally I have a higher preference for the safety of my data, and I can't tell the speed difference between having it enabled / disabled on my hard drives, so it's a definite Yes for me.

Q: How do I enable NTFS journals?
A: Go to command prompt and run the following command for each NTFS partition:
fsutil usn createjournal m=1000 a=100 C:

Q: How do I check if my NTFS partition has journals enabled?
A: Run the following command:
fsutil usn queryjournal C:

Q: Does that mean I don't have to run chkdsk any more?
A: Not really. It just means you don't have to do it every time you fail to shut down your computer properly. You should still do it occasionally (like defrag).

Q: How do I disable chkdsk on start-up?
A: http://www.raymond.cc/blog/archives/2008/02/23/disable-or-stop-auto-chkdsk-during-windows-startup/
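If you have several NTFS partitions, typing the two fsutil commands from the Q&A above for each drive gets tedious. Here's a small sketch that just builds the command lines as strings so you can paste them into a batch file. It does not run anything. The function names are made up for this example, and my understanding is that m= is the maximum journal size and a= the allocation delta, both in bytes.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Builds "fsutil usn createjournal m=<max> a=<delta> <drive>:" for one drive letter.
std::string create_journal_command(char drive, unsigned long long max_size,
                                   unsigned long long alloc_delta) {
    assert(std::isalpha(static_cast<unsigned char>(drive)));
    return "fsutil usn createjournal m=" + std::to_string(max_size) +
           " a=" + std::to_string(alloc_delta) + " " + std::string(1, drive) + ":";
}

// Builds "fsutil usn queryjournal <drive>:" for one drive letter.
std::string query_journal_command(char drive) {
    assert(std::isalpha(static_cast<unsigned char>(drive)));
    return "fsutil usn queryjournal " + std::string(1, drive) + ":";
}

// One create command followed by one query command per drive letter in `drives`.
std::vector<std::string> journal_commands(const std::string& drives,
                                          unsigned long long max_size,
                                          unsigned long long alloc_delta) {
    std::vector<std::string> commands;
    for (char drive : drives) {
        commands.push_back(create_journal_command(drive, max_size, alloc_delta));
        commands.push_back(query_journal_command(drive));
    }
    return commands;
}
```

For example, journal_commands("CD", 1000, 100) yields four lines: create and query for C:, then create and query for D:.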

Catalyst 9.6 ATI 4000 HD Series Still Behind NVIDIA For HTPC

ATI is still behind NVIDIA for HTPC even with the latest leaked Catalyst 9.6. It still fails to decode L5.0 / L5.1 high profile AVC video in DXVA mode (using MPC-HC for example). This means that for HTPC, an NVIDIA 8600GT / 9400GT would be the better choice over the ATI HD 4800 / 4700 / 4600 / 4500 / 4300 series.

If you're looking to build a HTPC, go for NVIDIA.

While ATI heavily promotes its HTPC capabilities, the truth is it is still very far behind NVIDIA.

NVIDIA users have been enjoying this for about 6 months.

@NVIDIA marketing, you could consider starting a "The way it's meant to be watched" program. Doesn't look like ATI has anything left in them to pose a threat whatsoever.

Tuesday, June 2, 2009

ATI HD Hardware Accelerated DXVA for H.264 AVC L5.0 / L5.1

HOW TO: Get hardware accelerated DXVA playback of HD AVC High Profile L5.0 / L5.1 MKV / MP4 Files on ATI HD Series.

Well, technically speaking, not DXVA, but hardware accelerated playback of L5.0 / L5.1 files nevertheless. (update: it *is* DXVA - not sure just yet why some decoders work and some don't - possibly because some decoders send more compliant bitstream?)

I've recently built a HTPC from an old Pentium 4 HT. I know it doesn't have enough grunt to decode AVC High Profile video, so I bought an ATI 4350 HD in the hope that it'll do all the video decoding on its GPU (or UVD).

To my disappointment, I found this (ATI does not support AVC with High Profile above L4.1) after I bought the card. All my encodes are done with L5.1 as my other PC is a Core 2 Duo which has an NVidia 9600GT. For some reason, the NVidia driver is able to support DXVA for AVC High Profile L5.1, so I simply assumed ATI would be the same. Turns out that the maximum the ATI would do is L4.1 (There's a Quantum of Solace trailer encoded at L5 to test here - http://nunnally.ahmygoddess.net/watching-h264-videos-using-dxva/).

Regardless of which decoder I use, my P4 HT simply isn't powerful enough to playback these files (CPU hits 100% all the time and frames drop very frequently).

Luckily for us ATI owners, there is a solution. With PowerDVD9 and Catalyst 9.5, I finally found a combination that would get 1920x1080 HD videos with ref-frames > 4 to play without taxing my CPU. In fact, I was pleasantly surprised when I reran the Quantum of Solace test video -- my CPU remained at 2% utilization!

I recommend the following setup for ATI users:
  • PowerDVD 9 build 1530 ==> Must be this build! Other builds will not work
  • ATI Catalyst 9.5 (non-hotfix version)
  • AC3Filter
  • Haali Media Splitter (to playback MKV files) ==> version 1.9.42.1 (or later)

*** New:

If you prefer to use Media Player Classic - Home Cinema (MPC-HC), see this.

*** Note1:

Rename .MKV to .MP4 to get PowerDVD to playback MKV files.

*** Note2:

Important
- No other filters (e.g. FFDShow / CoreAVC / Codec Packs) should be installed in your system!


Try it out yourself (you could download the trial version of PowerDVD 9).

MAKE SURE you DO NOT have any other filters installed (for example, CoreAVC or FFDShow which may have a higher merit than PowerDVD's own filter) or PowerDVD will not use its internal H264 decoder. Also, when opening MKV files, PowerDVD will complain that xvidcore.dll could not be found, but will continue playing the video just fine. If you want to suppress the error message, simply download xvidcore.dll and put it in the same folder as the PowerDVD executable (e.g. c:\Program Files\CyberLink\PowerDVD\).

Leave a comment to let me know if it does / doesn't work for you with your card's configuration and OS, for example:
  • HD 4350 PCIe x16 512MB
  • WinXP SP3

*** Update 1:


This clip (the 'Bird Scene' from Planet Earth) is the ultimate L5.1 super high bitrate MKV sample. On my nVidia 9600GT setup with a E7200 CPU @ 3.6GHz, it uses 50% of the CPU (playback using MPC - Home Cinema with driver supporting L5.1 bitstream DXVA). On my Pentium 4 HT (single core CPU) and the HD 4350 setup, I get 25% CPU usage. That is simply mind-boggling! What can we conclude? ATI is A LOT better at decoding H264 streams?

*** Update 2:

I've found that PowerDVD has a problem with H264 encoded files that have been tagged with the wrong IDC. For example, if the file actually contains a high profile L5.1 bitstream but its IDC tag is marked incorrectly (e.g. L4.1), you will get stuttering problems. If that happens, you'll have to change the file's IDC tag back to L5.1 using IDC Multi Changer.

*** Update 3:

While testing my configuration with a ref-frame 12 encode at 1920x800, I found that certain scenes (usually panning slowly) would judder (i.e. a couple of frames get dropped) and they always happen at the exact same time code. I tried remuxing the .mkv file to .ts / .m2ts but to no avail. I also increased the input buffer size to 100000KB from 8192KB in the Haali Media Splitter settings, which also did not help. Having spent a few hours on it, I finally decided to look at PowerDVD's settings itself. Apparently, under Advanced Video Preferences (Right-click Main Screen, click on Configuration, select the Video tab, click Advanced...), there's a group box called Video Quality. I had Normal Mode selected from before when I was using PowerDVD without AVIVO. Setting it to Best Mode solves the problem. GPU and CPU usages remain unchanged at 6-8% and ~12% (DTS is being decoded in software) respectively.

I can only make an educated guess at the reason behind the judder. Video Quality relates to the post processing / de-interlacing / pulldown settings. The judder which I picked up on slow panning scenes is probably due to the lack of pulldown under Normal Mode. When set to Best Mode, pulldown (what is this?) is activated to match the 24fps source to my 1080i LCD panel (1920x1080 at 30Hz).
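As a rough illustration of what pulldown does, here's a sketch of the classic 2:3 cadence, where film frames are alternately held for 2 and 3 video fields. This is the textbook pattern with idealized rates (24fps film, 60 fields per second). I'm assuming that's roughly what Best Mode does; I have no idea what PowerDVD actually implements. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// For each film frame, the number of interlaced fields it is held for (2, 3, 2, 3, ...).
std::vector<int> pulldown_cadence(int film_frames) {
    assert(film_frames >= 0);
    std::vector<int> fields_per_frame;
    fields_per_frame.reserve(static_cast<std::size_t>(film_frames));
    for (int i = 0; i < film_frames; ++i) {
        fields_per_frame.push_back(i % 2 == 0 ? 2 : 3);
    }
    return fields_per_frame;
}

// Total number of fields produced by a cadence.
int total_fields(const std::vector<int>& cadence) {
    return std::accumulate(cadence.begin(), cadence.end(), 0);
}
```

One second of film (24 frames) comes out as 60 fields, i.e. 30 interlaced frames, which is what a 1080i panel at 30Hz expects. Without a regular cadence like this, frames get held for uneven amounts of time, and that is most visible in slow pans.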

*** Update 4:

A full update on this topic has been posted here.