Friday, December 4, 2009

A Precise Garbage Collector for C++ Builder

Ahh... Nothing rivals the feeling of success!

I've finally devised a precise garbage collector (note: a precise GC is very different from conservative GCs such as BoehmGC - a precise GC does not suffer from pointer-identification ambiguities) for C++ Builder that is faster than boost's shared_ptr.

That's a bold claim indeed! And, by claiming that it is faster than shared_ptr, I'm also claiming that I can create a String class that is similarly faster than Delphi string, which is also reference counted like shared_ptr. And yes, the gc_ptr is thread-safe too.

The trick is in the pointer assignment. In my precise GC, a pointer assignment is no more expensive than its native equivalent - both are single-instruction operations, whereas a ref-counted pointer requires a locked instruction and a refcount check, both of which are *very* expensive on any processor, especially modern multicore ones. Most precise GC sources you can freely download from the net get the assignment operator right. Where they lose out big time, however, is in the gc_ptr's constructor and destructor, with lock instructions and whatnot for multithread support (note that some don't even support multithreading). These are often non-trivial and are the biggest cause of performance bottlenecks.

With a minimal c'tor and d'tor for my gc_ptr, I end up with an insanely fast precise GC engine which runs a .NET-like framework in pure native mode. How about that - a managed framework that runs natively. None of the pain (both literally and performance-wise) of pinning objects and marshaling data going in and out of the managed world (anything that interfaces with the real world cannot have a non-fixed address - e.g. it's impossible to write a driver that allocates a .NET managed non-fixed buffer and lets your system's DMA fill it up). Looks like .NET's GC team is paying the price of betting on the Stop-And-Copy algo (yes, the .NET GC is nothing more than an advanced version of the Stop-And-Copy GC) - Update: *Correction, I was wrong. The CLR GC is very similar to what I have implemented - in fact, even more so now that I have finished the generational version of the GC. The only notable differences are that my GC does not compact the heap after the sweep phase and does not support weak references or resurrection. Also, it relies on an external memory allocator.

Update 1: GcString is 16.5 times faster than AnsiString in an array reversal test in CB2010 - more info here.

Update 2: FastMM (Rad Studio's default memory manager) is excruciatingly slow in a GC / managed app in multicore systems - more info here.

Tuesday, November 10, 2009

LANBench - Network Benchmark

Folks,

I've just released the first version of LANBench here - http://www.zachsaw.com/?pg=lanbench_tcp_network_benchmark



I created it quickly to find the maximum bandwidth of the gigabit switch / NICs in my LAN. LANBench mainly tests your network bandwidth, but to a certain extent it also gives you an idea of how much CPU is being spent just transferring data back and forth between your machines.

LANBench takes the HDD subsystem out of the equation and lets you test the raw speed of your network.

Head over and check it out for more info.

Friday, October 16, 2009

DynamicArray causing memory leak in C++ Builder

Recently I came across a memory leak which CodeGuard reported was due to DynamicArray.set_length(...).

Well, DynamicArrays are reference counted, exactly the same way as AnsiStrings / UnicodeStrings, so really there's no way you could've missed freeing it... ... or is there?

From experience, the first thing that came to mind was that since it's a reference-counted container, it's prone - as with all the rest of them - to circular references.

Consider the following code:
#include <vcl.h>
#pragma hdrstop

#include <iostream>

struct TCircularRef
{
    DynamicArray<TCircularRef> refs;

    ~TCircularRef() { std::cout << "here" << std::endl; }
};

#pragma argsused
int main(int argc, char* argv[])
{
    DynamicArray<TCircularRef> base;
    base.set_length(1);

    base[0].refs = base;   // circular reference: base[0] now refers back to base
    base.set_length(0);    // attempt to release - the destructor never runs

    return 0;
}

With the 'refs' member of DynamicArray<TCircularRef> base referring back to itself (base[0].refs = base), we end up with a circular reference. Yes, the circular reference in the above example is easy to spot and would probably never occur in the real world either. In reality, though, circular references are much harder to spot, and assignments change at run-time depending on the code branch and/or timing. What's worse, the reference could be held by an object that you pass to a library written by someone else...

Notice that the above sample code tries to break the circular reference so that it would get cleaned up with the line base.set_length(0)? Alas, that was to no avail, as its ref-count prior to set_length would be 2, which leaves us with a ref-count of 1 afterwards. Can we call set_length(0) again then? No. Why? Because 'base' no longer knows about the object it once held - the first set_length(0) would've set it to point to the new memory location allocated by set_length(0).

Is there a way out? There're a few. But none to my liking. (You could Google for 'circular-reference' if you're interested to find a solution to this but it's not the intention of this article).
As I have a garbage collector library ready to be used in any project I create with C++ Builder, that was the solution I took. I simply replaced every DynamicArray with gc_array and allocated it with gcnew_array. With that, I no longer had to care about the hairy problems of who owns the object, who's responsible for freeing it, and whether there's a potential circular reference lurking in my code. What's more, it's thread-safe, unlike DynamicArray. But that's a topic for another day.

In short, this is my advice: Avoid DynamicArray at all costs!

Wednesday, October 14, 2009

Stepping into a Delphi package while debugging in C++ Builder

Well, you really should debug a Delphi package in Delphi. But every so often, you'll find yourself using a Delphi package (controls / components) in C++ Builder and in need of tracing into the package while debugging your main application. By default, even if you compile the Delphi package in Debug mode, you won't be able to step into the Delphi source code. These are the settings that will enable that (this guide is for CB2007 but it shouldn't be too different for other versions of C++ Builder):

Go to the project options of the Delphi package.

Select the Compiler view on the left pane.


Set the build config to Debug mode.

Code generation:
  • Optimization = off
  • Stack frames = on
Debugging:
  • Debug information = on
  • Local symbols = on
  • Assertions = on
  • Use debug DCUs = on
Now select the Linker view on the left pane.



Map file:
  • Off
EXE and DLL options:
  • Include TD32 debug info = on
  • Include remote debug symbols = on
Linker output:
  • Generate all C++Builder files (you should already have this selected)

You're done with the settings.

Rebuild your package and your main app and you should now be able to step into the Delphi source.

Tuesday, August 18, 2009

Watching HD Video in MPC-HC DXVA for ATI HD 4000 Series Card Owners

It's widely known that ATI card owners do not have the luxury of Nvidia card owners when it comes to playing HD video in MPC-HC in DXVA mode. With the built-in H.264/AVC video decoder, HD video such as 1920x800 with reference frames of 6 and above (i.e. L5.0 / L5.1 AVC streams) won't play (actually, it usually plays but with corrupted blocks and/or video freezing after a few seconds).

Based on what I've gathered from readers' comments on my original post about this issue, and from months of testing and mixing-and-matching on my own HTPC, I thought it would save everyone some time if I posted a summary of the settings that worked with Media Player Classic - Home Cinema. I previously said it was necessary to use PowerDVD 9 for DXVA with the ATI HD 4000 series (such as my ATI HD 4350), but that is no longer true - a better combination has been found, thanks to those who contributed to my previous blog post. This is only for ATI card owners with UVD 2 (or newer), though. HD3000 series owners, you guys are out of luck unfortunately (you have AMD / ATI to thank).

Here are some settings I've tested and their results:

Common settings:
Pentium 4 HT 3.0GHz (underclocked to 2.0GHz, DDR2@222MHz (444MHz effective), HT disabled)
ATI HD 4350 (Asus) - 512MB RAM @ 400MHz DDR2 (800MHz effective)
Windows XP SP3 (latest Windows Update as of 18 August 2009)
ATI Catalyst 9.7
Media Player Classic - Home Cinema Build 1237 x86 (download here)
Haali Media Splitter externally installed (i.e. not using MPC-HC's one)
DirectVobSub


Here's what I've found.

So long as you have the DXVA decoder output pin directly connected to the renderer, you'll get proper working DXVA decode with up to 16 ref-frames for full HD streams (1920x1080) regardless of the renderer (well, almost):

VMR7 (Windowed) = OK
VMR9 (Windowed) = failed (pink vertical stripes with Motion Vectors)
VMR7 Renderless = OK
VMR9 Renderless = OK

For the sake of discussion, let's use the VMR7 (Windowed) renderer. It has been proven to work with both Cyberlink's PDVD8 AVC decoder and ArcSoft's Video Decoder. With MPC's DXVA decoder, video with ref-frames of 12 and above will freeze after a few seconds of playback. Cyberlink's decoder suffers from occasional judder, while ArcSoft's Video Decoder plays up to 16 ref-frames in DXVA mode perfectly. The results vary when other renderers are used. For example, the combo of VMR9 Renderless + MPC's DXVA decoder yielded the same pink vertical stripes.

I still have not found a combination that gets subtitles working without causing issues with DXVA decoding. Under VMR9 Renderless + ArcSoft + MPC-HC's internal subtitle renderer, the video plays back at around half speed! Frames are decoded and rendered correctly, though. If anyone has any idea how to solve that, I'm all ears!

Update: With ATI Catalyst 9.8, the combo of "VMR9 Renderless + ArcSoft + MPC-HC internal subtitles renderer" now works properly. Although with DirectVobSub, the half-frame-rate playback issue is still there.

Update 2: It appears that, in order to get the results I've published here, you'll need to 'initialize' the ATI driver for proper DirectX operation by opening the Catalyst Control Center each time you start / restart your PC (not sure about stand-by / hibernate). I was caught out yesterday for not doing so - I got a blank white screen instead of normal video playback. Once I opened the CCC (and closed it), reopening MPC-HC and the video worked fine. *Bravo ATI!* You've got my standing ovation.

Anyway, these are the settings that I've found to work under MPC-HC without subtitles:

MPC-HC settings:
Renderer - VMR7 (Windowed)
Auto load subtitles - false (see Update above)
DirectVobSub options - disable autoload subtitles
Use ArcSoft Video Decoder as your AVC/H.264 decoder
Use AC3Filter directshow decoder

Again, make sure your DXVA decoder's output pin is connected directly to the renderer's input pin. This means under your DirectVobSub filter options, make sure it never loads subtitles and that it is blocked under your MPC-HC external filters page. The following is what I've configured under that page:

Arcsoft video decoder - preferred.
AC3Filter - preferred.
DirectVobSub - blocked.

Wednesday, June 10, 2009

NTFS: A Journaling File System

If NTFS is a journaling file system (if you don't know what that means, I suggest you read up on it before continuing), then why do we have to perform such an extensive chkdsk (checkdisk or scandisk) every time the system does not shut down cleanly (due to a power or system failure)?

A journaling file system should always keep the file system in a consistent state. For example, the MFT (Master File Table) should never indicate a cluster as occupied when it in fact isn't. But so often, when we don't cleanly shut Windows (2K or XP, for that matter) down, the chkdsk that runs when Windows boots up again will find exactly the inconsistency described above.

Q: So is NTFS a journaling file system?
A: Yes.

Q: Then why is NTFS inconsistent?
A: NTFS is ONLY inconsistent (or unsafe) across an unclean shutdown if you are running Windows 2K or XP. Microsoft, for some reason (most likely performance related), chose not to enable the journaling function for non-server versions of Windows. Windows Vista, however, enables it by default (I'm assuming Windows 7 does as well).

Q: Should I enable NTFS journals then?
A: It's up to you really, but personally I have a higher preference for the safety of my data, and I can't tell the speed difference between having it enabled / disabled on my hard drives, so it's a definite Yes for me.

Q: How do I enable NTFS journals?
A: Go to command prompt and run the following command for each NTFS partition:
fsutil usn createjournal m=1000 a=100 C:

Q: How do I check if my NTFS partition has journals enabled?
A: Run the following command:
fsutil usn queryjournal C:

Q: Does that mean I don't have to run chkdsk any more?
A: Not really. Just that you don't have to do it every time you fail to shutdown your computer properly. You should still do it occasionally (like defrag).

Q: How do I disable chkdsk on start-up?
A: http://www.raymond.cc/blog/archives/2008/02/23/disable-or-stop-auto-chkdsk-during-windows-startup/

Catalyst 9.6 ATI 4000 HD Series Still Behind NVIDIA For HTPC

ATI is still behind NVIDIA for HTPC even with the latest leaked Catalyst 9.6 - it still fails to decode L5.0 / L5.1 High Profile AVC video in DXVA mode (using MPC-HC, for example). This means that for an HTPC, an NVidia 8600GT / 9400GT would be a better choice than the ATI HD 4800 / 4700 / 4600 / 4500 / 4300 series.

If you're looking to build an HTPC, go for NVIDIA.

While ATI heavily promotes its HTPC capabilities, the truth is it is still very far behind NVIDIA.

NVIDIA users have been enjoying this for about 6 months.

@NVIDIA marketing, you could consider starting a "The way it's meant to be watched" program. Doesn't look like ATI has anything left in them to pose a threat whatsoever.

Tuesday, June 2, 2009

ATI HD Hardware Accelerated DXVA for H.264 AVC L5.0 / L5.1

HOW TO: Get hardware accelerated DXVA playback of HD AVC High Profile L5.0 / L5.1 MKV / MP4 Files on ATI HD Series.

Well, technically speaking, not DXVA, but hardware accelerated playback of L5.0 / L5.1 files nevertheless. (update: it *is* DXVA - not sure just yet why some decoders work and some don't - possibly because some decoders send more compliant bitstream?)

I've recently built an HTPC from an old Pentium 4 HT. I knew it didn't have enough grunt to decode AVC High Profile video, so I bought an ATI HD 4350 in the hope that it would do all video decoding on its GPU (or UVD).

To my disappointment, I found this out (ATI does not support AVC High Profile above L4.1) after I bought the card. All my encodes are done with L5.1, as my other PC is a Core 2 Duo with an NVidia 9600GT. For some reason, the NVidia driver is able to support DXVA for AVC High Profile L5.1, so I simply assumed ATI would be the same. Turns out that the maximum the ATI will do is L4.1 (there's a Quantum of Solace trailer encoded at L5 to test with here - http://nunnally.ahmygoddess.net/watching-h264-videos-using-dxva/).

Regardless of which decoder I use, my P4 HT simply isn't powerful enough to play back these files (the CPU hits 100% all the time and frames drop very frequently).

Luckily for us ATI owners, there is a solution. With PowerDVD9 and Catalyst 9.5, I finally found a combination that would get 1920x1080 HD videos with ref-frames > 4 to play without taxing my CPU. In fact, I was pleasantly surprised when I reran the Quantum of Solace test video -- my CPU remained at 2% utilization!

I recommend the following setup for ATI users:
  • PowerDVD 9 build 1530 ==> Must be this build! Other builds will not work
  • ATI Catalyst 9.5 (non-hotfix version)
  • AC3Filter
  • Haali Media Splitter (to playback MKV files) ==> version 1.9.42.1 (or later)

*** New:

If you prefer to use Media Player Classic - Home Cinema (MPC-HC), see this.

*** Note1:

Rename .MKV to .MP4 to get PowerDVD to playback MKV files.

*** Note2:

Important
- No other filters (e.g. FFDShow / CoreAVC / Codec Packs) should be installed in your system!


Try it out yourself (you could download the trial version of PowerDVD 9).

MAKE SURE you DO NOT have any other filters installed (for example, CoreAVC or FFDShow which may have a higher merit than PowerDVD's own filter) or PowerDVD will not use its internal H264 decoder. Also, when opening MKV files, PowerDVD will complain that xvidcore.dll could not be found, but will continue playing the video just fine. If you want to suppress the error message, simply download xvidcore.dll and put it in the same folder as the PowerDVD executable (e.g. c:\Program Files\CyberLink\PowerDVD\).

Leave a comment to let me know if it does / doesn't work for you with your card's configuration and OS, for example:
  • HD 4350 PCIe x16 512MB
  • WinXP SP3

*** Update 1:


This clip (the 'Bird Scene' from Planet Earth) is the ultimate L5.1 super-high-bitrate MKV sample. On my nVidia 9600GT setup with an E7200 CPU @ 3.6GHz, it uses 50% of the CPU (playing back in MPC - Home Cinema with a driver supporting L5.1 bitstream DXVA). On my Pentium 4 HT (single-core CPU) and HD 4350 setup, I get 25% CPU usage. That is simply mind-boggling! What can we conclude? That ATI is A LOT better at decoding H264 streams?

*** Update 2:

I've found that PowerDVD has a problem with H264 encoded files that have been tagged with the wrong IDC. For example, if the file actually contains a high profile L5.1 bitstream but its IDC tag is marked incorrectly (e.g. L4.1), you will get stuttering problems. If that happens, you'll have to change the file's IDC tag back to L5.1 using IDC Multi Changer.

*** Update 3:

While testing my configuration with a ref-frame 12 encode at 1920x800, I found that certain scenes (usually panning slowly) would judder (i.e. a couple of frames get dropped) and they always happen at the exact same time code. I tried remuxing the .mkv file to .ts / .m2ts but to no avail. I also increased the input buffer size to 100000KB from 8192KB in the Haali Media Splitter settings, which also did not help. Having spent a few hours on it, I finally decided to look at PowerDVD's settings itself. Apparently, under Advanced Video Preferences (Right-click Main Screen, click on Configuration, select the Video tab, click Advanced...), there's a group box called Video Quality. I had Normal Mode selected from before when I was using PowerDVD without AVIVO. Setting it to Best Mode solves the problem. GPU and CPU usages remain unchanged at 6-8% and ~12% (DTS is being decoded in software) respectively.

I can only make an educated guess at the reason behind the judder. Video Quality relates to the post-processing / de-interlacing / pulldown settings. The judder I picked up on slow panning scenes is probably due to the lack of pulldown under Normal Mode. When set to Best Mode, pulldown (what is this?) is activated to match the 24fps source to my 1080i LCD panel (1920x1080 at 30Hz).

*** Update 4:

A full update on this topic has been posted here.

Wednesday, March 25, 2009

Cracking Signed ActiveX Controls

Ever come across an ActiveX control you can't afford to buy but wish to use for your own little pet project? Especially one that's readily available for download, but annoys the hell out of you when you try to use it because it keeps painting "Evaluation Version" all over itself?

Well, from time to time, I develop pet projects to keep myself entertained. Being a perfectionist, I just can't stand seeing the "Eval Version" message appearing on my applications. Yes, the sole purpose of these apps is just so I can marvel at myself when I finish them, but how would I be able to do that with the "Eval Version" message lying around? I want that newly polished, super-shiny feeling you get when you finish something you can be proud of.

So I set out on a mission - to crack the ActiveX control that was sprinkling all those "Eval Version" messages on itself.

Surprisingly, cracking the several ActiveX controls I've used isn't all that hard - so long as you know a little x86 ASM and have the right tools (such as W32Dasm for Windows). Most ActiveX controls are smart enough to prevent simple cracks such as looking up the "Eval Version" message and skipping it via a jmp, but in order to draw the text on screen, the message still has to be decrypted somehow and placed in memory. And THAT is usually the weakest link. What I usually do is change the opcodes directly in memory while I step through the ASM in an IDE (I use CodeGear RAD Studio 2007) so I can see the effects immediately. If things go wrong, I simply break out of the debug session and restart.

Once you have cracked it, use a hex editor to apply the bytes you changed in the IDE - you'll have to note down a few instructions before and after the ones you're about to change while debugging in the IDE, as there may be more than one occurrence of the bytes you search for. For example, there could be numerous occurrences of "33C9" hex, which translates to "xor ecx,ecx".

For normal exe / dll files, patching the file and then doing a final test would've been the end of it. With a signed ActiveX file, however, you'd find that if you try to use the ActiveX in your IDE, it'll complain that the ActiveX cannot be loaded. That simply means that the hash (usually SHA1) in the ActiveX no longer matches the file - and rightly so! This is easily circumvented though - all you need to do is re-sign the file, and then re-import that into your IDE.

Lucky for us, there's no need to un-sign the ActiveX file before re-signing it with our own cert. Here's a link with a step-by-step guide to signing an ActiveX file - http://www.pantaray.com/signcode.html

In the next installment, I'll probably blog about cracking signed .NET assemblies, if the response to this one is good.

Wednesday, March 18, 2009

WinAPI functions and GetLastError / WSAGetLastError

Having used WinAPI for more than a decade, I was surprised at myself for not being more careful with the return value and GetLastError's error code following a WinAPI call. Granted, the bug I was facing was not easy to root-cause, but that's no excuse, as I should've been more careful when coding. We all make mistakes, but the difference between a good and a bad programmer is that the former is constantly alert to the pitfalls when coding (read: edging toward paranoia) while the latter simply hacks away hoping things will work out fine.

The thing to keep in mind is that WinAPI functions generally don't set the Last Error Code unless the function's return value is FALSE (zero). Even though GetLastError sometimes returns ERROR_SUCCESS after a successful call (e.g. TlsGetValue), that does NOT mean all WinAPI functions set the Last Error Code to ERROR_SUCCESS regardless. If you've assumed that every WinAPI call resets the Last Error Code to ERROR_SUCCESS (or NO_ERROR), then chances are your code will work fine so long as the function has not encountered an error, but it will fail as soon as one occurs, because subsequent successful calls will still look like errors if you rely on GetLastError alone for the result. You should ALWAYS read each function's documentation, as Return Value semantics vary, and remember that in most cases the Last Error Code is only set on failure.

Sunday, February 22, 2009

Avoid AMD at all costs (and Gigabyte too)

This goes for everything - GPU / graphics cards, CPU, software, and even shares.

I recently got my first AMD graphics card and found it to be extremely bad in quality - both the hardware itself as well as the drivers.

My first card came from MSY (VIC, Australia) - a Gigabyte 4350 (AMD / ATI card). Plugging it into my old machine, which had been sidelined by a recent upgrade but had been running an ECS nVidia 9600 fine, I found that the GDDR2 chips were defective. In fact, booting into Vista caused the graphics card to lock up entirely. After some investigation, I discovered that the card chooses between a few frequencies depending on its state - it boots up with a 200MHz GDDR2 RAM frequency (400MHz effective) and runs 3D (load) at a 500MHz (max default) frequency.

Keep in mind that these are the default frequencies (read: NON-OC'ed) from Gigabyte. Just to get closure and confirm that it was in fact defective RAM as I had suspected, I flashed the video card BIOS with one that runs the RAM at a lower speed and found that it worked. The maximum speed I could get it to run at was 300MHz - a far cry from the advertised 500MHz (1000MHz effective)! Heck, 300MHz? My OLD machine runs 800MHz DDR2 (1600MHz effective). And to put that into perspective, my nVidia 9600GT works perfectly with its GDDR2 overclocked to 1200MHz (2400MHz effective)! Gigabyte and AMD - shame on you!

I'd also avoid Gigabyte at all costs. So far I've owned 5 Gigabyte motherboards - not one of which I could simply buy from the shop, plonk a CPU into, and install an OS on. Gigabyte = troublesome. I'd have to return each one and get it exchanged for a new unit - and it's never the 2nd board that works but the 3rd! Most won't even get past POST. Apparently, it's no different with their graphics cards.

Back to the graphics card issue - this time the shop ran out of the card, so I got a 2nd one, and that did not work either. Eventually we settled on a refund. I immediately dropped by Centrecom and got myself an Asus 4350. I dropped it into my system and it immediately worked - 3DMark06 ran flawlessly (discounting the performance, of course).

Then I decided to watch some Blu-ray content and, to my surprise, it stuttered like mad. Why? Because it was using my old CPU to decode in software - no UVD! Keep in mind this was with Media Player Classic - HC, which used to accelerate Blu-ray with my nVidia 9600GT (running a BETA driver, as opposed to the ATI 4350, which was running the official Catalyst 9.1).

After spending days reinstalling Windows and drivers, I still had no luck. Eventually, I stumbled upon AMD's own forum while looking for solutions and found that people had been complaining about the same thing - Catalyst 9.1 broke all DXVA support in Windows XP! I immediately submitted driver feedback via their website and told them about the issue.

A month later, Catalyst 9.2 was released. Still broken!

I bought this card for the sole purpose of accelerating HD content - if the card can't do it, then stop advertising it! At least in Australia, the law protects consumers in this regard - we can return a product if we find that it does not do what it's advertised to do. I'm sure consumers in the US have no such rights whatsoever and just get lied to with false advertising.

My advice: stick with Intel and nVidia. Heck, even Intel's G45 integrated graphics would've been a LOT better - yes, there are some issues with 24Hz playback (1920x1080p 24Hz, that is, and not all TVs support that), but for all other modes it accelerates HD content flawlessly, just as advertised.

But with that said, I'd like people (read: fanboys) to continue buying AMD just to keep Intel and nVidia in check (quality, performance and price). :-)

Tuesday, February 17, 2009

Performance Monitor / Counters not working in XP

Strangely enough, the XP machine I use in the office has never had the performance counters working.

If you try running SysInternals' pslist (I use it mostly so I can pskill for the issue I blogged about here), you'll get a message saying, "Processor performance object not found. Run Exctrlst from the Windows Resource Kit to repair the performance counters."

Or, if you try running the Performance MMC snap-in (in XP it's under Control Panel --> Administrative Tools --> Performance), you'll get no response for a minute or two, and when the window finally appears, the red line in the graph won't move (and you won't be able to add any graphs for monitoring either).

Or, if you try to use performance counters in .NET (e.g. for CPU Usage monitoring), you'll get an exception complaining performance counters are not available.


What's the problem? To be honest, I don't know and I don't care enough to find out, but I do care enough to make it work.


Here's how:
  • Install Windows Support Tools which could be found on Microsoft's website (here's the XP SP2 version which will also run in SP3 - http://www.microsoft.com/downloads/details.aspx?FamilyID=49ae8576-9bb9-4126-9761-ba8011fabf38&displaylang=en)
  • Once you've installed it, run exctrlst.exe
  • Make sure that PerfProc and PerfOS have their Performance Counters enabled (select the entry and verify that the "Performance Counters Enabled" option is checked)
  • Once you've done that, close the window and run PsList again to see if it works
  • If not, continue reading. You'll need a copy of Windows installation CD for the following steps

PsList is still not working:

What we'll be doing is manually replacing some of the files with the original ones (I'm guessing that at some point Microsoft released a hotfix which broke them).
  • Go to the I386 folder on your Windows CD
  • Make a backup of C:\Windows\System32\perfc009.dat and C:\Windows\System32\perfh009.dat (we'll be overwriting them)
  • Expand perfc009.da_ to C:\Windows\System32\perfc009.dat (expand perfc009.da_ C:\Windows\System32\perfc009.dat)
  • Expand perfh009.da_ to C:\Windows\System32\perfh009.dat (expand perfh009.da_ C:\Windows\System32\perfh009.dat)

Try running PsList again - it should work now.


If all of the above failed, there's one other resource you could read up - How to manually rebuild Performance Counter Library values.

Windows hangs while debugging - How to regain Windows GUI responsiveness

Following up on the issue I blogged about a while ago - Windows locking up while debugging multithreaded applications under CodeGear RAD Studio - someone on the CodeGear IDE forum suggested that the Command Prompt would still respond. Some guessed that it could be due to Windows running out of GDI handles (how that relates to the debugger is beyond me, so if someone cares to explain, I'm all ears), and this proves to be quite correct.

Apparently, this problem has been confirmed on Microsoft Visual C++ 6 up to Visual Studio 2003 (I have yet to hear anyone comment about 2005 or 2008, so they could also be affected) and Borland C++ Builder 6 up to CodeGear C++ Builder 2009. So far I've gathered that this happens on both Windows Server 2003 and XP, regardless of the service pack level installed.

In order to get your machine responsive again, you'd have to kill the debugger (BDS.exe / BCB.exe for CodeGear / Borland and MSDEV.exe / DevEnv.exe for Microsoft). The painful way is to bring up Windows Task Manager (Ctrl+Shift+ESC, to save you a few minutes of right-clicking on the taskbar and waiting for the GUI to appear) and select the process to terminate. Going through that process will take you somewhere around 5 to 10 minutes. There's a simpler way though (courtesy of Ingo Zettl - thanks!), which takes less than 5 seconds if you have everything set up before starting your debug session:
  • Install SysInternals PsTools which could be found here http://technet.microsoft.com/en-us/sysinternals/bb896649.aspx
  • Start command prompt, go to the PsTools folder and leave the command window open
  • Start debugging
  • Once you hit the OS+debugger bug, Alt+TAB straight to the command window (beware: if you inadvertently Alt+TAB to the wrong window, you'll waste a few minutes waiting for Windows to respond)
  • Run PsList - you should get a list of processes running on your machine
  • Lookup the Pid (Process ID) of your debugger (e.g. BDS.exe)
  • Run PsKill with that Pid, e.g. pskill -t [pid]
At this point, your machine should regain responsiveness.

We all have Microsoft to thank for this 'little' inconvenience which makes life 'fun' for us Win32 software developers.

Thursday, February 5, 2009

Debugging multithreaded applications in CodeGear hangs Windows XP Kernel


When debugging a multithreaded application in CodeGear under Windows XP, users have found that the debugger locks up the Windows kernel frequently (twice a day for me). When the Windows kernel locks up (actually slows to a crawl), if you hit CTRL+SHIFT+ESC to bring up the task manager, it will not come up until after a few minutes. While waiting, hit CTRL+SHIFT+ESC say 50 times. You'll get 50 task managers appearing after the first one pops up.

At this point, the Windows XP kernel is still painfully slow - Task Manager shows no CPU usage; the kernel is just 99.9% locked up. With that 0.1% responsiveness, you'll be able to slowly terminate bds.exe and, again, wait ages for the confirmation window to pop up. Once bds.exe is terminated, your system will be fully responsive again, and you'll be left with the scene in the screenshot above.

Good job CodeGear, for your ability to bring XP kernel to its knees...


* Update: Just upgraded to XP SP3 from SP2, and problem persists.

Friday, January 23, 2009

Turning C++ into C#

Can you really turn C++ into C#?

The answer is yes, to a certain extent, but you'll need certain C++ keyword extensions which are readily available in compilers such as BCC (the Borland / CodeGear C++ compiler), namely indexed properties (VC++ does support properties via the __declspec(property) extension, but not indexed ones).

What I've found really interesting is that it is quite possible to write templates that turn C++ objects into fully 'managed' ones - i.e. garbage collected, with range-checked arrays and null-checked pointers.

I've even gone as far as to create templates for keywords such as C#'s 'using', 'foreach' and 'lock'. They behave exactly the same as their C# counterparts, albeit with a slight syntax difference for the 'using' and 'foreach' keywords.

The advantage of having a C#-like C++ library will be greatly appreciated by programmers who love C# but would like their target application to run on non-dotnet platforms, such as an embedded XP with limited RAM for cost reasons. Apart from that, applications based on this library will always outperform equivalent C# applications, since it's basically C++ with a garbage collector behind it. There is no managed/unmanaged switching penalty when P/Invoking, and the garbage collector is definitely more deterministic. I've also made it such that the code can temporarily disable the garbage collector, which makes real-time systems much more reliable than those using .NET.

With this library, it is a lot simpler to write high-performance applications - glue logic can be written in C#Lib, and the core in pure C/C++ or assembly. With a clear separation between managed and non-managed objects within the same source code, the non-managed parts run as usual, while the objects themselves can still be managed by C#Lib. The garbage collector also does not relocate objects - it relies on a low-fragmentation memory manager for allocations. This makes the GC both faster and easier on the library user when calling into legacy code, APIs or code written in other languages for high performance, since objects always have a permanent address.

There are a few subtle things which C++ can't do but C# (or, more generically, .NET) can. For example, multiple inheritance in C++ is rather limited: you can't explicitly override a method separately for a specific base class (even if that base is implemented as an interface).

But the most important thing C++ lacks is a garbage collector, which has proven extremely useful in design patterns for the asynchronous (read: multi/many-core) future (actually, the present). C#Lib bridges that gap and has proven extremely useful for me.