About Larrabee and Larrabee vs GeForce / CUDA or ATI / CTM (without repeating what you could look up on other sites):
1) Michael Abrash, Tim Sweeney and John Carmack are all on board Intel's software team for Larrabee. This should give them a pretty solid team (understatement) for driver development.
2) A quote from GCDC'08: multi-thread your DirectX code and drivers. "3. Direct 3D runtimes and drivers account for 25-40 percent of CPU cycles per frame. This needs to be reduced in order to push performance!"
Being free to offload that 25-40 percent to Larrabee and leave the CPU to process everything else is quite significant. This is, however, something they're still working on, as some calls involve the OS kernel, which is not a natural fit as things stand with Larrabee on PCIe. Again, the ultimate goal is to get Larrabee sitting on your motherboard running as a co-processor, in which case scheduling will be done by the OS just as it would for a normal processor. The reasoning behind the design decision to use software task scheduling is obviously twofold.
3) CUDA does not support recursion (among several other things), and it is unlikely to be implemented in the near future due to hardware limitations. Unless nVidia implements sophisticated prefetch hardware like Larrabee's, it will most likely never happen.
Developers look for a free lunch. CUDA doesn't seem to provide that very well, as it requires the algorithm to be completely rewritten - see www.gpuchess.com for example. That said, it doesn't mean that nVidia can't emulate some of these features through other means, like what GpuChess has done in its compiler. But that leads me to the next point.
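To give a feel for what "completely rewritten" means when recursion isn't available, here is a minimal sketch in plain C++ (not CUDA, and not taken from GpuChess). The Node layout, the function names and the example tree are all made up for illustration. The first function is the natural recursive form; the second does the same walk with an explicit stack, which is roughly the kind of transformation you'd have to do by hand, or have a compiler do for you.

```cpp
#include <vector>

// Hypothetical node layout for illustration: children are indices into a
// flat array, and -1 means "no child".
struct Node {
    int value;
    int left;
    int right;
};

// The natural way to write it: sum every value in the tree recursively.
int sum_recursive(const std::vector<Node>& tree, int index) {
    if (index < 0) return 0;
    const Node& n = tree[index];
    return n.value + sum_recursive(tree, n.left) + sum_recursive(tree, n.right);
}

// The same walk without recursion: an explicit stack of pending node
// indices replaces the call stack.
int sum_iterative(const std::vector<Node>& tree, int root) {
    std::vector<int> pending;
    pending.push_back(root);
    int total = 0;
    while (!pending.empty()) {
        int index = pending.back();
        pending.pop_back();
        if (index < 0) continue;  // empty child slot, nothing to add
        const Node& n = tree[index];
        total += n.value;
        pending.push_back(n.left);
        pending.push_back(n.right);
    }
    return total;
}

// Small made-up tree: node 0 is the root with children 1 and 2,
// and node 1 has a single left child, node 3.
std::vector<Node> make_example_tree() {
    return { {1, 1, 2}, {2, 3, -1}, {3, -1, -1}, {4, -1, -1} };
}
```

Both functions return the same total for the example tree when started at index 0. The point is only that the second version is a different piece of code you have to write and maintain, even for something this small.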
4) CUDA is a C-like language. That's good - but how do you get C++ / C# / VB / Delphi / Java etc. developers to code for it? You can't, unless nVidia starts writing its own .NET IL runtime libraries and VCL runtimes for its hardware (read: doesn't make sense financially, and impossible in the limited time frame before Larrabee debuts). Larrabee gets all of these for free.
The final point is what I'm excited about, because you're not restricted to just CUDA-C. You're free to develop in whatever language you're most familiar with. The best part is that binaries compiled for Larrabee (if you don't go for the exotic mnemonics, of course) will also be able to run on a machine without Larrabee, albeit much slower - but at least they will run. I don't see any developers (bar hobbyists) getting excited over writing the same algorithm 3 different ways - CUDA, CTM and x86.
I don't know about the rest of you, but this looks like a very good idea to me. When I was working at Intel, I was going to propose something similar to Larrabee, but as a more hardware-based solution. Maybe it's still possible. I'll leave that for the next post.