I recently installed Windows 7 x64 (I had always used Win7 x86, since 64-bit was still in its infancy for the stuff that I do on my desktop), and to my dismay the app fails after a few minutes of running.
After hours of debugging (GC is never easy to debug, let alone a hypothesis that involves a bug in WOW64, which has been used by millions since XP x64), I found that WOW64 clobbers the ESP value reported by GetThreadContext() when it calls out to long mode. The GC thread in the app relies on GetThreadContext() to get the current stack pointer of mutator threads. I also found that ESP is always restored upon returning.
Prior to calling GetThreadContext(), the GC thread suspends all mutator threads. If it just so happens to suspend a mutator thread while it's running long mode code in user mode, the reported ESP value changes to one indicating a higher address than the actual stack pointer (remember that on IA-32 the stack 'grows' downwards). I've seen this happen for SetEvent() and SwitchToThread() (as these are the most frequently called kernel functions in the app).
This means that either SuspendThread is suspending a thread in a way that is incompatible with native x86, or the thread's context in WOW64 is not being protected when the code jumps to translation mode. Either way, I was sure it was a bug.
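To illustrate why a too-high ESP is a problem for a collector, here is a minimal, portable sketch. It is not the app's actual code and it does not call any Windows API. It assumes a conservative scan that walks from the reported stack pointer up to the stack base and treats every non-zero word as a potential root. The names `STACK_WORDS`, `scan_roots` and `missed_roots` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16  /* hypothetical tiny stack, for illustration only */

/* Count non-zero words between the reported stack pointer and the stack
   base. The modelled stack grows downwards, so live data sits at
   indices >= sp and the base is at index STACK_WORDS. */
size_t scan_roots(const uintptr_t *stack, size_t sp)
{
    size_t roots = 0;
    for (size_t i = sp; i < STACK_WORDS; i++) {
        if (stack[i] != 0) {
            roots++;
        }
    }
    return roots;
}

/* Number of roots a scan misses when it starts at reported_sp instead of
   real_sp. Assumes reported_sp is at or above real_sp, which is the
   situation described in the text. */
size_t missed_roots(const uintptr_t *stack, size_t real_sp, size_t reported_sp)
{
    assert(reported_sp >= real_sp);
    return scan_roots(stack, real_sp) - scan_roots(stack, reported_sp);
}
```

In this model, every word between the real stack pointer and the reported one is invisible to the scan, which would plausibly explain why the app only fails after running for a while: the suspension has to land inside a call out to long mode first.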
I then found this article (difference between WOW64 and native x86 system DLLs) and while the article isn't exactly addressing the issue I'm facing, I found it very useful because this guy (Skywing from Microsoft) certainly knows WOW64 very well. I proceeded to email him and he replied with the following:
[...] there’s an issue with get and set context operations against amd64 Wow64 threads returning bad information in circumstances when the thread is running long mode code in user mode. This relates to us [Microsoft] pulling the Wow64 context from the TLS slot (as described in the [this] article) before that context structure has been updated with current contents.
That sounded very much like the issue here. So, I decided to dig deeper and put a few software traps to try and catch it in the act.
This is what I found.
The stale contents from GetThreadContext() actually came from the previous system call out (a looong way up the stack really - it's not as if it's a few instructions ago). It should have returned the contents from the *current* system call out instead (or to be precise, from just before the call out to long mode took place). As Skywing said, the context is pulled before it has been updated with the current contents.
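The ordering problem can be modelled in a few lines of portable C. This is only a sketch of my reading of the behaviour described above, not WOW64's actual implementation. The struct, its fields and all function names are hypothetical. `saved_esp` stands in for the context copy kept in the TLS slot, and `query_esp` stands in for what a context query would report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one WOW64 thread. */
typedef struct {
    uint32_t live_esp;     /* the real 32-bit stack pointer */
    uint32_t saved_esp;    /* copy held in the saved context (the "TLS slot") */
    int      in_long_mode; /* non-zero while a call out is in progress */
} model_thread;

/* The thread starts a call out: it is now in long mode, but the saved
   context still holds whatever the previous call out left there. */
void begin_callout(model_thread *t, uint32_t esp_at_call)
{
    t->live_esp = esp_at_call;
    t->in_long_mode = 1;
}

/* The saved context catches up with the current call out. */
void publish_context(model_thread *t)
{
    assert(t->in_long_mode);
    t->saved_esp = t->live_esp;
}

/* The call out returns to 32-bit code. */
void end_callout(model_thread *t)
{
    t->in_long_mode = 0;
}

/* What a context query reports in this model: the saved copy while in
   long mode, the live register otherwise. */
uint32_t query_esp(const model_thread *t)
{
    return t->in_long_mode ? t->saved_esp : t->live_esp;
}
```

A query that lands between `begin_callout` and `publish_context` sees the ESP of the previous call out, which sits further up the stack and therefore at a higher address. Once the call out ends, the query reports the live value again, which matches the observation that ESP is always restored upon returning.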
With that said, we can now conclude that it is indeed an OS bug (Win7 SP1 hasn't fixed it).
* I'd like to thank Skywing for his effort in assisting me to root cause this issue.