SSE2 HowTo?

Guest
Hi,
Can anybody point me to some details on SSE/SSE2/3DNow! programming?

From the most recent article on the Pentium 4, it looks like SSE2 can make a tremendous difference in floating-point number crunching. So how do you use it? I've found overview articles, but no meat. The e-mail from the Intel guy said that "porting to SSE2 was easy."

I realize the object of these extensions is multimedia optimization, but I'm interested in looking at them for scientific programming where processing of large FP arrays is common. Comments?

A few basic questions:
Do you have to write in assembly, or is it rolled into the compilers?

If the latter, which compilers? Any for Linux?

Thanks!
Eric
 
Guest
Ok, answering my own post here. I found some resources.

http://www.arstechnica.com/cpu/2q00/klat2/klat2-1.html

This is a good story on the KLAT2 Beowulf cluster, including a very cool GA-designed network architecture. It also discusses some tremendous performance boosts from a 3DNow!-optimized BLAS. It has lots of links, including this one:

http://aggregate.org/SWAR/

That site has a header file that abstracts some of the assembly into C calls.
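
To give a feel for what "C calls instead of assembly" can look like, here is a minimal sketch. It does not use the SWAR header from the link above. Instead it assumes the SSE2 intrinsics declared in <emmintrin.h>, which I believe GCC (with -msse2) and the Intel compiler both provide. The function name add_arrays is made up for illustration, and the code falls back to a plain scalar loop if SSE2 is not enabled at compile time.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (assumed available from the compiler) */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n doubles.
 * With SSE2 enabled, two doubles are added per instruction. */
void add_arrays(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;

#if defined(__SSE2__)
    /* Process pairs of doubles; unaligned loads/stores are used so the
     * caller does not have to guarantee 16-byte alignment. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif

    /* Scalar loop: handles the leftover element when n is odd,
     * or the whole array when SSE2 is not available. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Something like `gcc -O2 -msse2 file.c` should build it on Linux. For large FP arrays the pattern is the same: a vector loop over the bulk of the data plus a scalar loop for the tail.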
Eric