Thursday, July 28, 2011

StreamReader.Peek can block - Another .NET framework bug

There are so many bugs in the .NET framework that existed since the very first version and remains not fixed as of version 4.0 (see some of my other blog posts). Here's another one that I discovered a long time ago but did not have time to write about - StreamReader.Peek can block for Process.StandardOutput / Process.StandardError.

There's even a Microsoft Connect entry that has been filed since September 2005! And a several blog/forum posts [1: Jay R. Wren] [2: MSDN] [3: StackOverflow] about it. What has Microsoft done about it? Nothing! Is it surprising? Not at all - none of the bugs I've blogged about has been fixed so far, critical or not. All they did was whinge about how hard/expensive it is to fix.

So here it is Microsoft, I've come up with a solution for you (feel free to rip me off).

Essentially, what Microsoft needs to do is change Process.StartWithCreateProcess such that standardOutput and standardError are both assigned a specialised type of StreamReader (e.g. PipeStreamReader).

In this PipeStreamReader, they need to override both ReadBuffer overloads (i.e. need to change both overloads to virtual in StreamReader first) such that prior to a read, PeekNamedPipe is called to do the actual peek. As it is at the moment, FileStream.Read() (called by Peek()) will block on pipe reads when no data is available for read. While a FileStream.Read() with 0 bytes works well on files, it doesn't work all that well on pipes! In fact, the .NET team missed an important part of the pipe documentation - PeekNamedPipe WinAPI.

The PeekNamedPipe function is similar to the ReadFile function with the following exceptions:
...
The function always returns immediately in a single-threaded application, even if there is no data in the pipe. The wait mode of a named pipe handle (blocking or nonblocking) has no effect on the function.

There you go. It looks like Microsoft forgot about the PeekNamedPipe WinAPI that the other Microsoft team wrote. Wow! That's embarrassing!

Wednesday, July 6, 2011

Website Moved

To all my followers,

I've moved my website across to www.zachsaw.com from the .co.cc domain which Google has banned from their index. So please update your bookmarks.
Website: http://www.zachsaw.com
Forum: http://forum.zachsaw.com
Blog: http://blog.zachsaw.com
Thanks everyone for your continued support.

Wednesday, February 23, 2011

Win7 SP1 OS bug - Do we need Windows 8 now?

It's the same Windows 7 x64 (WOW64) bug I reported in November last year (2010) - see the entry in Microsoft Connect.

And the new Windows 7 Service Pack 1 *HASN'T* fixed it! It doesn't only affect Windows 7 though, it goes back all the way to the very first version of x64 Windows - XP 64-bit.

There's no way to workaround this problem in user code. It is a fundamental design flaw in the very first version of 64-bit Windows. This makes it very hard (expensive) for Microsoft to fix - a proper fix requires a redesign of the whole emulation layer of 32-bit apps on 64-bit Windows. Microsoft charges a minimal sum for their OSes (sarcasm), so it's not like they have the resources to work on anything that doesn't affect everyone. It's very likely that Windows 7 will be forever plagued with this bug as it would seem to be easier (cheaper) to start from a clean sheet with Windows 8. Then again, why bother? It's not as if Microsoft has any real competition at all in the x64 OS market - they own more than 80% of the pie!

From the feedback of my previous post, the problem appears more widespread than Microsoft would want to acknowledge - see Discussions about the Boehm Garbage Collector (Boehm GC). Hopefully this bug gets fixed sooner rather than later.

In the meantime, if you want to test it out yourself, you can download this (extract all and run Tester.exe). Usually running it for a few minutes will get it to crash on x64. Give it 30 minutes if it doesn't. If it doesn't crash (i.e. on 32-bit OS), you'll have to kill the app via task manager after you've tested it though (an unrelated bug in the tester application prevents it from shutting down completely - the app will appear closed but the process remains running).

Wednesday, November 24, 2010

Fast memcpy for large blocks

Memory copy of 8MB blocks can be quite slow.

I found that both memcpy and CopyMemory won't utilize the full bandwidth of your RAM due to memory controller bottlenecks (I suspect the memory controller isn't smart enough to prefetch the right data). So this implementation by William Chan issues SSE2 prefetch instructions and gets the memory controller to literally stream the data back and forth from RAM in the fastest manner.

Note though, that you'll need to give it 16-byte aligned memory and it copies in 128-byte blocks.

The result is here (on my Core2Duo Wolfdale CPU @ 3.6GHz, dual channel DDR2 @ 800MHz):

memcpy/CopyMemory:
1871.775MB/sec

William Chan's SSE2 memcpy:
3540.471MB/sec

That's nearly double the speed of the naive memcpy!

Saturday, November 13, 2010

Reporting a bug against Windows OS - possible?

Is Microsoft so arrogant that they think their OS is bug-free and that no one should ever need to report a bug against their OS?

There doesn't seem to be a way to report a bug against the Windows OS (no such category in Microsoft Connect). 2 years ago, I tried to report a bug against Vista - TreeView Indent in Vista causes HitTest to fail. The closest way would be to report it against VisualStudio. Surprise surprise, the VS product team closed it as external and did absolutely *nothing* after that. Sure, they redirected me to a non-existent website, which was supposed to be the MSDN forums, but that was where I was redirected to MS Connect in the first place! Talk about going around in circles... This really reminds me of Telstra (the ex-government owned now privatized biggest telco in Australia).

With regards to the bug that I've just reported, I'd expect them to do the same, and close it as external, not knowing which department they need to talk to. Microsoft is so big, dumb and slow that its right hand really have no idea what the left hand is doing! Sad really...


* Edit: Looks like I'm not the only one complaining:
Reporting a bug in Vista 64 WOW64
How do you file a bug report for Windows?
Problems with comdlg32.ocx, Windows Vista and long file names/extension´s

WOW64 bug: GetThreadContext() may return stale contents

My GC app runs perfectly fine under native x86 OS's starting from XP.

I've installed Windows 7 x64 recently (I've always used Win7-x86 due to 64-bit being in its infancy yet for the stuff that I do on my desktop) and to my dismay the app fails after a few minutes of running.

After hours of debugging (GC is never easy to debug, let alone a hypothesis that involves a bug in WOW64 which has been used by millions since XP-64), I found that WOW64 clobbers the ESP value (as returned by GetThreadContext(), of which the the GC thread in the app relies upon to get the current stack pointer of mutator threads) when it does a call out to long mode. I also found that ESP is always restored upon returning.

Prior to calling GetThreadContext(), the GC thread suspends all mutator threads. If it just so happens to suspend the mutator thread while it's running long mode code in user mode, the ESP value gets changed to a value indicating a higher address than the actual stack pointer (remember on IA, stack 'grows' downwards). I've seen this happen for SetEvent() and SwitchToThread() (as these are the most frequently called kernel functions in the app).

This means that either SuspendThread is suspending a thread in an incompatible way to native x86, or the thread's context in WOW64 is not being protected when the code jumps to translation mode. Either way, I was sure it's a bug.

I then found this article (difference between WOW64 and native x86 system DLLs) and while the article isn't exactly addressing the issue I'm facing, I found it very useful because this guy (Skywing from Microsoft) certainly knows WOW64 very well. I proceeded to email him and he replied with the following:
[...] there’s an issue with get and set context operations against amd64 Wow64 threads returning bad information in circumstances when the thread is running long mode code in user mode. This relates to us [Microsoft] pulling the Wow64 context from the TLS slot (as described in the [this] article) before that context structure has been updated with current contents.

That sounded very much like the issue here. So, I decided to dig deeper and put a few software traps to try and catch it in the act.

This is what I found.

The stale contents from GetThreadContext() actually came from the previous system call out (a looong way up the stack really - it's not as if it's a few instructions ago). It should've returned contents from the *current* system call out instead (or to be precise, just before the call out to long mode took place). Like Skywing said, they pulled the context before it's updated with the current contents.

With that said, we can now conclude that it is indeed an OS bug (Win7 SP1 hasn't fixed it).

* I'd like to thank Skywing for his effort in assisting me to root cause this issue.

Thursday, October 21, 2010

BCC32 the Optimizing Compiler?

We all know asking the bcc32 compiler team to do put more effort into making the codegen generate more efficient / optimized asm is a *BIG ASK*, seeing how they haven't even got the basics (compiler bugs that Microsoft VC++ compiler fixed since MSVC 2005) working correctly. VC++ 6 was worse than C++ Builder 6, but since then, Microsoft have worked hard and fixed up their compiler, made it fully optimizing and generate one of the fastest and most efficient code in the world. It is also now one of the most compliant C++ compilers of all.

Let's compare the following code sample.

For the uninformed, the following usage pattern is found in a lot of expanded template code (I'm using it a lot in my GC framework and Boost uses it too).
#include <tchar.h>

struct foo
{
    inline operator bool() const
    {
        return false;
    }
};

int _tmain(int argc, _TCHAR* argv[])
{
    if (foo())
        return 0; // *see footnote
    return 0;
}

*footnote: usually this is something more meaningful, but here I'm trying to illustrate how massively unintelligent bcc32
is.



bcc32 (C++ Builder XE) command line:
    bcc32 -O2 -Hs- -C8 -v- -vi test8.cpp


generated asm:
push ebp
mov ebp,esp
add esp,-$08
push edi
lea edi,[ebp-$08]
xor eax,eax
mov ecx,$00000008
rep stosb
lea eax,[ebp-$08]
xor edx,edx
test dl,dl
jz $00401201
xor eax,eax
jmp $00401203
xor eax,eax
pop edi
pop ecx
pop ecx
pop ebp
ret



cl command line:
    cl /Ox /Ot test8.cpp


generated asm:
xor eax,eax
ret


BCC32 generated 18 lines of useless opcodes when really only 2 are required. With BCC32, your code would be a few hundred times slower (taking into account of memory access latencies).