Jun 22, 2009

When to use SSE?

A long time ago, when I was playing with Jaina, I didn't know why my Jaina couldn't move as smoothly as Blizzard's. But there was a rumor about how to make Jaina move more like a real young girl. SSE was the first solution.

I haven't found any document about the cost of SSE (ohh... I'm lazy, you know that...). But in my experience, there is a simple rule. The most common use of SSE is matrix multiplication, and there are many, many multiplications in skinned bone animation. But you'll find it can actually cost more CPU time if you only write SSE to multiply "one" vector by "one" matrix.

The simple rule is:
if the number of vector multiplications is at least about double the number of vector IOs (loads and stores), SSE will give you better performance.

For example, the dot product of 2 vectors needs 3 IOs (2 vectors in, one value out), but there is only 1 vector multiplication. Another example is a vector multiplied by a matrix: 6 IOs (5 in, 1 out) with 4 vector multiplications... so we still can't get better performance.

But in the bone animation case, there are few bones and many, many points. That means many points are multiplied by the same matrix in each frame. If you have to multiply N points, you need:

  1. 4 vector reads for the matrix.
  2. N vector reads for the points.
  3. N vector writes for the points.
  4. 4 vector multiplications for each point.
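To see how these counts behave as N grows, here is a tiny sketch in C. The function name mul_per_io is made up for illustration; it just divides the multiplications (4 per point) by the vector IOs (4 matrix reads plus one read and one write per point) from the list above.

```c
/* Hypothetical helper: vector multiplications per vector IO when n
   points share one 4x4 matrix. Counts follow the list above:
   4 matrix reads, n point reads, n point writes, 4 muls per point. */
double mul_per_io(unsigned n)
{
    double muls = 4.0 * n;        /* 4 vector multiplications per point */
    double ios  = 4.0 + 2.0 * n;  /* matrix reads + point reads/writes  */
    return muls / ios;
}
```

For n = 1 this is the single vector-times-matrix case (4 multiplications, 6 IOs). As n gets large the 4 matrix reads stop mattering and the ratio approaches 2.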

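Ignore the 1st one (it is paid only once per matrix, not once per point), and you have 4N multiplications against 2N IOs, which just meets my simple rule. So there is a chance to make my Jaina move more smoothly. (Just write a function that multiplies many vectors by one matrix.)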
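A minimal sketch of such a function, not the code I actually used. It assumes an x86 compiler with SSE, points stored as 4 floats each (x, y, z, w), and a matrix stored as four rows with the row-vector convention p' = p * M. The name transform_points_sse is made up for illustration.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: transform n points (4 floats each) in place by
   one 4x4 matrix given as 16 floats, row by row (p' = p * M). */
void transform_points_sse(float *points, size_t n, const float m[16])
{
    /* 4 vector reads for the matrix, done once for the whole batch. */
    __m128 r0 = _mm_loadu_ps(m + 0);
    __m128 r1 = _mm_loadu_ps(m + 4);
    __m128 r2 = _mm_loadu_ps(m + 8);
    __m128 r3 = _mm_loadu_ps(m + 12);

    for (size_t i = 0; i < n; ++i) {
        float *p = points + 4 * i;
        __m128 v = _mm_loadu_ps(p);  /* 1 vector read per point */

        /* Broadcast each component of the point across a register. */
        __m128 x = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
        __m128 y = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
        __m128 z = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
        __m128 w = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));

        /* 4 vector multiplications per point: x*r0 + y*r1 + z*r2 + w*r3. */
        __m128 out = _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(x, r0), _mm_mul_ps(y, r1)),
            _mm_add_ps(_mm_mul_ps(z, r2), _mm_mul_ps(w, r3)));

        _mm_storeu_ps(p, out);       /* 1 vector write per point */
    }
}
```

The unaligned load/store calls are used so the sketch works with any float array; with 16-byte aligned data, _mm_load_ps and _mm_store_ps could be used instead. Whether this really beats the scalar version still has to be measured.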

BTW, there are many 0s in a normal 3D matrix... that's another story.
