Wednesday, November 24, 2010

Fast memcpy for large blocks

Copying 8MB blocks of memory can be quite slow.

I found that neither memcpy nor CopyMemory utilizes the full bandwidth of your RAM, which I put down to memory controller bottlenecks (I suspect the memory controller isn't smart enough to prefetch the right data). This implementation by William Chan uses SSE2 with explicit prefetch instructions, so the data streams to and from RAM as fast as it can go.

Note, though, that you'll need to give it 16-byte aligned memory, and that it copies in 128-byte blocks.
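
To give an idea of the technique, here is a minimal sketch in C intrinsics. It is not William Chan's code; the function name sse2_stream_copy is made up for illustration, and I'm assuming the size is a multiple of 128 bytes and that non-temporal (streaming) stores are what keep the destination from polluting the cache. Each iteration hints the next 128-byte block, loads eight 16-byte registers with aligned loads, then writes them out with streaming stores.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _mm_sfence */

/* Hypothetical sketch: copy `size` bytes in 128-byte blocks.
   Assumes dst and src are 16-byte aligned and size is a multiple of 128. */
void sse2_stream_copy(void *dst, const void *src, size_t size)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;
    size_t off;
    int i;

    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert((size & 127) == 0);

    for (off = 0; off < size; off += 128) {
        __m128i r[8];

        /* Hint the next block (two 64-byte cache lines), if there is one. */
        if (off + 128 < size) {
            _mm_prefetch(s + off + 128, _MM_HINT_NTA);
            _mm_prefetch(s + off + 192, _MM_HINT_NTA);
        }

        /* Aligned loads of the current 128-byte block. */
        for (i = 0; i < 8; i++)
            r[i] = _mm_load_si128((const __m128i *)(s + off + 16 * i));

        /* Non-temporal stores: written to memory without filling the cache. */
        for (i = 0; i < 8; i++)
            _mm_stream_si128((__m128i *)(d + off + 16 * i), r[i]);
    }

    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();
}
```

Both buffers have to come from an aligned allocator (or be declared with a 16-byte alignment attribute). A plain malloc is not guaranteed to return 16-byte aligned memory on 32-bit builds.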

The results (on my Core 2 Duo Wolfdale CPU @ 3.6GHz, dual-channel DDR2 @ 800MHz):


William Chan's SSE2 memcpy:

That's nearly double the speed of the naive memcpy!

Saturday, November 13, 2010

Reporting a bug against Windows OS - possible?

Is Microsoft so arrogant that they think their OS is bug-free and that no one should ever need to report a bug against their OS?

There doesn't seem to be a way to report a bug against the Windows OS (there is no such category in Microsoft Connect). Two years ago, I tried to report a bug against Vista - TreeView Indent in Vista causes HitTest to fail. The closest I could get was to report it against Visual Studio. Surprise surprise, the VS product team closed it as external and did absolutely *nothing* after that. Sure, they redirected me to a non-existent website, which was supposed to be the MSDN forums, but the MSDN forums were where I had been redirected to MS Connect in the first place! Talk about going around in circles... This really reminds me of Telstra (the formerly government-owned, now privatized, biggest telco in Australia).

With regard to the bug that I've just reported, I'd expect them to do the same and close it as external, not knowing which department they need to talk to. Microsoft is so big, dumb and slow that its right hand really has no idea what the left hand is doing! Sad really...

* Edit: Looks like I'm not the only one complaining:
Reporting a bug in Vista 64 WOW64
How do you file a bug report for Windows?
Problems with comdlg32.ocx, Windows Vista and long file names/extension´s

WOW64 bug: GetThreadContext() may return stale contents

My GC app runs perfectly fine on native x86 versions of Windows, from XP onwards.

I recently installed Windows 7 x64 (I had always used Win7 x86, since 64-bit is still in its infancy for the stuff that I do on my desktop) and, to my dismay, the app fails after a few minutes of running.

After hours of debugging (GC is never easy to debug, let alone when the hypothesis involves a bug in WOW64, which has been used by millions since XP-64), I found that WOW64 clobbers the ESP value when it calls out to long mode. This is the value returned by GetThreadContext(), which the GC thread in the app relies on to get the current stack pointer of mutator threads. I also found that ESP is always restored upon returning.

Prior to calling GetThreadContext(), the GC thread suspends all mutator threads. If it happens to suspend a mutator thread while that thread is running long mode code in user mode, the reported ESP is a higher address than the actual stack pointer (remember that on IA the stack 'grows' downwards). I've seen this happen in SetEvent() and SwitchToThread(), as these are the most frequently called kernel functions in the app.
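
To see why a too-high ESP is fatal for a collector, here is a toy model in portable C. It is not the app's GC and makes no Windows calls: the stack is just an array, and I'm assuming a conservative scan that walks every word from the reported stack pointer up to the stack base. The names scan_stack and stale_sp_hides_root are made up for illustration. Anything that lives between the real stack pointer and the stale one is never scanned, so a live object referenced only from there looks like garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conservative scan: count the words in [sp, base) equal to `target`.
   The stack grows downwards, so live data sits between sp (low address)
   and base (high address). */
size_t scan_stack(const uintptr_t *sp, const uintptr_t *base, uintptr_t target)
{
    size_t hits = 0;
    const uintptr_t *p;

    assert(sp <= base);
    for (p = sp; p < base; ++p) {
        if (*p == target)
            ++hits;
    }
    return hits;
}

/* Returns 1 if a scan from a stale (higher) SP misses a root that a scan
   from the true SP finds. All values here are simulated. */
int stale_sp_hides_root(void)
{
    uintptr_t stack[16] = {0};               /* simulated stack memory     */
    const uintptr_t root = 0xABCDu;          /* stands in for a live ref   */
    const uintptr_t *base = stack + 16;      /* highest address            */
    const uintptr_t *true_sp = stack + 4;    /* where the thread really is */
    const uintptr_t *stale_sp = stack + 10;  /* what a stale context says  */

    stack[6] = root;  /* pushed after the stale snapshot was taken */

    return scan_stack(true_sp, base, root) == 1
        && scan_stack(stale_sp, base, root) == 0;
}
```

In the toy, the root at index 6 sits below the stale pointer at index 10, so the second scan never reaches it.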

This means that either SuspendThread is suspending a thread in a way that is incompatible with native x86, or the thread's context in WOW64 is not being protected when the code jumps to translation mode. Either way, I was sure it was a bug.

I then found this article (difference between WOW64 and native x86 system DLLs) and while the article isn't exactly addressing the issue I'm facing, I found it very useful because this guy (Skywing from Microsoft) certainly knows WOW64 very well. I proceeded to email him and he replied with the following:
[...] there’s an issue with get and set context operations against amd64 Wow64 threads returning bad information in circumstances when the thread is running long mode code in user mode. This relates to us [Microsoft] pulling the Wow64 context from the TLS slot (as described in the [this] article) before that context structure has been updated with current contents.

That sounded very much like the issue here. So, I decided to dig deeper and put a few software traps to try and catch it in the act.

This is what I found.

The stale contents from GetThreadContext() actually came from the previous system call out (a looong way up the stack really - it's not as if it's a few instructions ago). It should've returned contents from the *current* system call out instead (or to be precise, just before the call out to long mode took place). Like Skywing said, they pulled the context before it's updated with the current contents.

With that said, we can now conclude that it is indeed an OS bug (Win7 SP1 hasn't fixed it).

Update 29 March 2014: As of Windows 8.1, this bug is still *NOT* fixed!

* I'd like to thank Skywing for his effort in assisting me to root cause this issue.