Waiting & Waiting: 2009

Dec 24, 2009

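Slam Dunk

I haven't read many manga, but Slam Dunk is one of them. I went back through it over the last couple of days, and it still gets my blood pumping! What it reminded me of most, though, was my department volleyball team. I'm glad I joined the volleyball team in college and got to fight alongside so many excellent teammates. Even though I can no longer imagine having the vertical leap to grab the rim like I could back then, many moments are still deeply etched in my memory. My skills were average and my temper was bad, but because I had a group of good teammates, I got to touch a trophy XD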

Dec 14, 2009

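I wish...

I wish that on workdays, outside working hours, I could:

spend two hours or more reading or coding
spend one hour on breakfast plus listening to English
spend two hours playing games or doing something that empties my head
spend two hours commuting by bus
spend one hour on dinner plus daily necessities

Feels like there's so little time... That already takes at least eight hours, and with eight hours of work plus a one-hour lunch break, there are still seven hours left for sleep!

p.s. I've recently been revising the way I code at home. With so little time, I can't code carelessly anymore! ---- one man army !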

Dec 2, 2009

No distortion !

Fixed! The direction of reflection was incorrect.

Dec 1, 2009

Nov 30, 2009

Better, but ...

Better, but there are still some bugs.

1> I don't know the root cause of the distortion (back surface).

2> Some refractions have disappeared.

3> Both the bottom-right and bottom-left corners are rendered with incorrect lighting.

Nov 25, 2009

Tired~

I feel tired today, and I still don't have a better idea of how to implement some details. So I only added several (hard-coded) lines to get this.

Nov 24, 2009

BusyBusyBusy.......

I have been very busy recently (due to my new job). But coding is still exciting! Above is my new toy!

Nov 6, 2009

CUDA Note[5]="float";


//-----------------------------------------------------------------------------
__global__ void __FloatTest(unsigned int* pIn)
{
    unsigned int fMask = *pIn;

    float iMask =
        (float)((fMask & 0xff000000) >> 24) * 0.1122f +
        (float)((fMask & 0x00ff0000) >> 16) * 0.2233f +
        (float)((fMask & 0x0000ff00) >>  8) * 0.3344f +
        (float)((fMask & 0x000000ff) >>  0) * 0.4455f;

    *((float*)pIn) = iMask;
}

//-----------------------------------------------------------------------------
void FloatTest()
{
    const unsigned int fMask = 0x22446688;

    float  iMask = 0.0f;
    float* pMask = 0;

    ::cudaMalloc(&pMask, sizeof(float));

    ::cudaMemcpy(pMask, &fMask, sizeof(unsigned int), cudaMemcpyHostToDevice);

    __FloatTest<<<1, 1>>>((unsigned int*)pMask);

    ::cudaMemcpy(&iMask, pMask, sizeof(float), cudaMemcpyDeviceToHost);

    ::cudaFree(pMask);

    ::printf("gpu : %f\n", iMask);

    iMask =
        (float)((fMask & 0xff000000) >> 24) * 0.1122f +
        (float)((fMask & 0x00ff0000) >> 16) * 0.2233f +
        (float)((fMask & 0x0000ff00) >>  8) * 0.3344f +
        (float)((fMask & 0x000000ff) >>  0) * 0.4455f;

    ::printf("cpu : %f\n", iMask);
}

output :
gpu : 113.695999
cpu : 113.695999


//-----------------------------------------------------------------------------
__global__ void __FloatTest(unsigned int* pIn)
{
    unsigned int fMask = *pIn;

    float iMask =
        (float)((fMask & 0xff000000) >> 24) * 0.112233f +
        (float)((fMask & 0x00ff0000) >> 16) * 0.223344f +
        (float)((fMask & 0x0000ff00) >>  8) * 0.334455f +
        (float)((fMask & 0x000000ff) >>  0) * 0.445566f;

    *((float*)pIn) = iMask;
}

//-----------------------------------------------------------------------------
void FloatTest()
{
    const unsigned int fMask = 0x22446688;

    float  iMask = 0.0f;
    float* pMask = 0;

    ::cudaMalloc(&pMask, sizeof(float));

    ::cudaMemcpy(pMask, &fMask, sizeof(unsigned int), cudaMemcpyHostToDevice);

    __FloatTest<<<1, 1>>>((unsigned int*)pMask);

    ::cudaMemcpy(&iMask, pMask, sizeof(float), cudaMemcpyDeviceToHost);

    ::cudaFree(pMask);

    ::printf("gpu : %f\n", iMask);

    iMask =
        (float)((fMask & 0xff000000) >> 24) * 0.112233f +
        (float)((fMask & 0x00ff0000) >> 16) * 0.223344f +
        (float)((fMask & 0x0000ff00) >>  8) * 0.334455f +
        (float)((fMask & 0x000000ff) >>  0) * 0.445566f;

    ::printf("cpu : %f\n", iMask);
}

output :
gpu : 113.714699
cpu : 113.714706

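So keep in mind that floating-point results may differ between the CPU and the GPU. Here the two results differ by about one unit in the last place of a float at this magnitude, so such values are better compared with a tolerance than with ==.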
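
One common way to live with this is to compare floats with a relative tolerance. The following is only a minimal sketch: the helper name NearlyEqual and the default tolerance of 1e-6 are my own illustrative choices, not something from the test above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b agree to within relTol of the
// larger magnitude. relTol = 1e-6f is an illustrative default, not a rule.
bool NearlyEqual(float a, float b, float relTol = 1e-6f)
{
    assert(relTol >= 0.0f);
    const float scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= relTol * scale;
}
```

With this helper the gpu and cpu values printed above would count as equal, while values that differ in an early digit would not.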

Nov 1, 2009

Oct 31, 2009

funny type !


struct NullType
{};

template <
    typename X,
    typename Y,
    typename Z = NullType,
    typename W = NullType>
struct Caster
{
    X   x;
    Y   y;
    Z   z;
    W   w;
};

int main()
{
    ::printf("%d\n", sizeof(NullType));
    ::printf("%d\n", sizeof(Caster));
    ::printf("%d\n", sizeof(Caster));
    ::printf("%d\n", sizeof(Caster));
    ::printf("%d\n", sizeof(Caster));
}

There is nothing new in this post; you can find the null type in "Modern C++ Design". But the output is really funny (compiled with VC9):
1
4
12
12
4

So... NullType occupies 1 byte even though it's empty, and it follows a special padding rule (which I'm not interested in right now XD).
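
For the curious, the padding rule can be sketched with two instantiations. The template arguments below are my own illustrative picks (they are not necessarily the ones used for the output above), and the sizes assume a typical ABI where int is 4 bytes with 4-byte alignment.

```cpp
#include <cassert>

struct NullType {};

template <typename X, typename Y, typename Z = NullType, typename W = NullType>
struct Caster
{
    X x;
    Y y;
    Z z;
    W w;
};

// Each empty member still takes 1 byte so that it has its own address;
// the total is then rounded up to the strictest member alignment.
void CheckTypicalSizes()
{
    assert(sizeof(Caster<char, char>) == 4);  // 1 + 1 + 1 + 1
    assert(sizeof(Caster<int, int>) == 12);   // 4 + 4 + 1 + 1 = 10, padded to 12
}
```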

Oct 30, 2009

CUDA Note[4]="cast & align";


//----------------------------------------------------------------------------- 
__global__ void RCastTest0(unsigned int* rgTar, unsigned int* rgSrc)
{
    rgTar[0] = rgSrc[0];

    rgTar[1] = *(unsigned int*)((unsigned char*)rgSrc + 2);
}

//-----------------------------------------------------------------------------
void RCastTest()
{
    unsigned int  rgTestSrcHost[2] = {0x11223344, 0xaabbccdd};
    unsigned int  rgTestTarHost[2];
    unsigned int* rgTestSrcDevice = 0;
    unsigned int* rgTestTarDevice = 0;
    unsigned int* rgTestDevice = 0;

    ::cudaMalloc(&rgTestDevice, 4 * sizeof(unsigned int));

    rgTestTarDevice = rgTestDevice;
    rgTestSrcDevice = rgTestDevice + 2;

    ::cudaMemcpy(
        rgTestSrcDevice,
        rgTestSrcHost,
        2 * sizeof(unsigned int),
        cudaMemcpyHostToDevice);

    //--cast in cuda
    RCastTest0<<<1, 1>>>(rgTestTarDevice, rgTestSrcDevice);

    ::cudaMemcpy(
        rgTestTarHost,
        rgTestTarDevice,
        2 * sizeof(unsigned int),
        cudaMemcpyDeviceToHost);

    //--cast in cpu
    unsigned int dCasted = *(unsigned int*)((unsigned char*)(rgTestSrcHost) + 2);

    ::printf("CUDA (align)    : 0x%08X\n", rgTestTarHost[0]);
    ::printf("CUDA (un-align) : 0x%08X\n", rgTestTarHost[1]);
    ::printf("CPU  (un-align) : 0x%08X\n", dCasted);

    ::cudaFree(rgTestDevice);
}

A simple test of casting in CUDA. Device memory is aligned when it is allocated (to 256 bytes). Everything is fine as long as you forget the usual optimization tricks from C. For example, when making a grayscale image from an r8g8b8 one, you can read 3 u8 values and calculate the luminance, or read one u32 and calculate with bit operations. The latter is fine on the CPU, and it should perform better in CUDA too, since accessing global memory is pretty slow. But as this test shows, you can't do it the way you would on the CPU: when reading from global memory, CUDA aligns the read address to the size of the cast type (4 for u32, 2 for u16, etc.), so the unaligned read returns the aligned word instead.

So there is a trick for grayscale conversion. If I calculate luma by reading 3 u8 values, every pixel needs 4 global memory accesses (3 reads, 1 write). But if I calculate 4 pixels in one thread, I can read 3 u32 values (the first of which is 4-byte aligned) and write 4 results, so the average drops to (3 + 4) / 4 = 1.75 accesses per pixel!

p.s.
output :
CUDA (align) : 0x11223344
CUDA (un-align) : 0x11223344
CPU (un-align) : 0xCCDD1122
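
The four-pixels-per-thread idea can be sketched on the CPU side. This is only an illustration under my own assumptions: the function names are hypothetical, the three words are taken to be little-endian with bytes in r, g, b order, and the integer luma weights 77/150/29 (sum 256) are a convenient approximation that I chose.

```cpp
#include <cassert>
#include <cstdint>

// Byte i of a 12-byte run stored as three little-endian 32-bit words.
static std::uint32_t ByteAt(const std::uint32_t words[3], int i)
{
    assert(i >= 0 && i < 12);
    return (words[i / 4] >> (8 * (i % 4))) & 0xffu;
}

// Converts 4 packed r8g8b8 pixels (3 aligned word reads) to 4 gray values.
void GrayFromThreeWords(const std::uint32_t words[3], std::uint8_t gray[4])
{
    for (int p = 0; p < 4; ++p)
    {
        const std::uint32_t r = ByteAt(words, 3 * p + 0);
        const std::uint32_t g = ByteAt(words, 3 * p + 1);
        const std::uint32_t b = ByteAt(words, 3 * p + 2);

        // Weights sum to 256, so the shift divides the weighted sum back down.
        gray[p] = static_cast<std::uint8_t>((77u * r + 150u * g + 29u * b) >> 8);
    }
}
```

In a kernel, each thread would do the three word reads itself; only aligned u32 addresses are touched, which sidesteps the behaviour shown by the test.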

Oct 29, 2009

CUDA Note [3] = "cudart.dll";

cudart.dll does not seem to be necessary for your client application if you develop CUDA with the driver API. But that API is not as friendly. (BTW, I have not given it a try yet.)
It does not seem necessary to separate CUDA code from the main code into a different binary, since cudart.dll depends on nothing special. Separating them only benefits coworkers who have not installed the CUDA SDK.
If you develop with the CUDA runtime, you have to ship cudart.dll, too.

So... I guess I made the wrong decision (separating the CUDA code into another binary) due to a misunderstanding.

CUDA Note [2] = "Driver Version";

cudaGetDeviceCount(&cDevice) may find no CUDA device if your CUDA runtime is newer than the driver. Currently you can download the v2.3 runtime from NVIDIA, but the driver for notebooks is still in beta. If you mix the runtime with another driver, all the samples crash because they can't find any CUDA device.

Oct 24, 2009

CUDA Note [1] = "design strategy";

There are so many "CLs" :

OpenCL
DX11 compute shader
CUDA
Stream SDK

The first problem for me is how to integrate them. I started this kind of programming with CUDA. But it relies on nVidia hardware, and the worst thing is that the binary depends on the CUDA runtime... that means I have to handle everything myself when there is no nVidia graphics card. My solution will probably be COM. Besides, I'll start to study DX11 once I have Win 7. I guess OpenCL will not be so good in the beginning.

Oct 16, 2009

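Childhood toys

While sorting through clutter I dug out my childhood toys. Judging from these things, I really didn't have much of a childhood!

Nine linked rings (九連環), bought at 昨日世界.

A yo-yo with no string

仙人擺渡 (a traditional puzzle) plus a few small things

Klotski (華容道), also bought at 昨日世界!

I also made some small toys myself, such as a 貴妃稱, but I threw them all away XD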

Oct 13, 2009

CUDA Note [0] = "How to integrate with VC ?";

1. How to build "*.cu" in VC :

Find “Cuda.Rules” in \\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\
Copy “Cuda.Rules” to \\MSVS8\VC\VCProjectDefaults\
Open the VC solution.
Right click your “project” and select “Custom Build Rules”.
Check “CUDA Build Rule v#.#.#”.
Click OK.
Right click the “*.cu” file in Solution Explorer.
Select “CUDA Build Rule v#.#.#” in the “Tool” option.
Then VC can build "*.cu" files in your project.

2. How to highlight *.cu syntax in VC :

Go to Tools, then Options, in VC.
Select "File Extension" under Text Editor.
Add "cu" in the "Extension" edit box and select an editor.
Click OK and reopen VC.
Now the highlighting rule you selected is applied.
I use VC++ syntax highlighting to edit *.cu.

Oct 10, 2009

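Seal carving: cat's paw

I never expected to carve a pictorial seal, let alone a cat's paw! The hardest part of the whole seal is probably the layout of the design. The cat's paw carries a lot of visual weight; breaking the paw print itself would ruin the point, and by extension breaking the border doesn't suit it either. In the end I only cut away the top-left corner. Whether the layout works is beyond what my skill can judge, but it should be pretty cute ("古錐"), right?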

Sep 30, 2009

malloc fail

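Yesterday I produced a bug. The main symptom was an ACCESS VIOLATION. That I can still write this kind of stupid code leaves me not knowing whether to laugh or cry.

MARK says XP is less strict about memory protection, so touching a location you shouldn't doesn't necessarily blow up right away. In my case, MALLOC suddenly started failing while the whole program was using only about 13 MB of memory, so I ruled out LOOP + LEAK. But since it was only an allocation failure, the program ran as if nothing was wrong, except that nothing showed up. So I turned on some debugging exceptions, and sure enough the program threw at the faulty spot. After a careful check I found that a BUFFER SIZE had been miscalculated.

Conclusion: MALLOC failed because the place where the HEAP keeps its bookkeeping had been overwritten (HEAP CORRUPTION).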

Sep 16, 2009

pack problem again


struct TGA_HEADER
{
    unsigned char   cbIDField;
    unsigned char   iColorMapType;
    unsigned char   iImageType;
    //--1 byte padding

    short           ofsColorMap;
    short           cColorMap;
    unsigned char   cColorMapBits;
    //--1 byte padding

    short           dX;
    short           dY;
    short           dWidth;
    short           dHeight;
    unsigned char   dColorDepth;
    unsigned char   descImage;
};

I tried to make a tga file yesterday and kept failing. When I dragged the image into the DirectX Texture Tool, it always told me something was wrong with the file. But the header should be 18 bytes, and the sum of all fields in the struct is 18 bytes, too. Finally, I started to check for a packing issue, as before, and then the problem emerged! You can find comments marking the 1-byte padding in the code. Fixing this needs only 2 extra lines:


#include 
struct TGA_HEADER
{....}; 
#include
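
Another way to get the same effect, shown here only as a sketch and not as the fix used above, is the #pragma pack directive that VC, GCC and Clang all accept. The struct name TgaHeaderPacked is mine, and the check assumes short is 2 bytes.

```cpp
#include <cstddef>

#pragma pack(push, 1)   // no padding between members from here on
struct TgaHeaderPacked
{
    unsigned char   cbIDField;
    unsigned char   iColorMapType;
    unsigned char   iImageType;

    short           ofsColorMap;
    short           cColorMap;
    unsigned char   cColorMapBits;

    short           dX;
    short           dY;
    short           dWidth;
    short           dHeight;
    unsigned char   dColorDepth;
    unsigned char   descImage;
};
#pragma pack(pop)       // restore the previous packing

// Fails at compile time if padding sneaks back in.
static_assert(sizeof(TgaHeaderPacked) == 18, "TGA header must be 18 bytes");
```

A compile-time size check like this would have caught the 20-byte header before any file was written.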

Sep 4, 2009

Rome

Pantheon

Sep 1, 2009

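A rough lit pre pass

I roughly understand how lit pre pass works now, so I happily threw in a bunch of super shapes to build a space city, and ended up with a ruin... I tried a few things and it mostly works. Point lights are still drawn as spheres, this time with geometry instancing; billboards should be able to replace them. Earlier I hadn't read wolf's blog carefully, so I encoded the normal as two angles. But if the normal is in view space, you can store x and y directly, because from the camera's point of view the z of every visible point's normal is negative!

Now there's another problem (MSAA), but I want to write a flashy little program!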
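
Storing only x and y works because a unit normal satisfies x² + y² + z² = 1, so z can be rebuilt once its sign is known. A minimal sketch of the decode step, with a function name of my own and under the assumption made in this post that z is never positive in view space:

```cpp
#include <algorithm>
#include <cmath>

// Rebuilds z of a unit view-space normal from x and y, assuming z <= 0.
float ReconstructNormalZ(float x, float y)
{
    // Clamp so rounding error can never make the argument of sqrt negative.
    const float zz = std::max(0.0f, 1.0f - x * x - y * y);
    return -std::sqrt(zz);
}
```

The same three lines translate directly to a pixel shader; the clamp guards against the kind of rounding trouble that out-of-range inputs cause in functions like sqrt and acos.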

Aug 29, 2009

regular expression for real

'(\\b0[xX][0-9a-fA-F]+\\b)|(\\b[0-9]*\\.[0-9]+f?\\b)|(\\b[0-9]+\\b)'

Add this to the syntax highlighter to highlight "numbers" in 3 formats :

0x204080FF
3.14159f
7
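
To try the pattern outside the highlighter, here is a small sketch using std::regex, whose default grammar is close to the JavaScript one the expression was written for. The function name FindNumbers is hypothetical, and the doubled backslashes become single ones inside a raw string literal.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every number literal found in one line of source text.
std::vector<std::string> FindNumbers(const std::string& line)
{
    static const std::regex kNumber(
        R"((\b0[xX][0-9a-fA-F]+\b)|(\b[0-9]*\.[0-9]+f?\b)|(\b[0-9]+\b))");

    std::vector<std::string> found;
    for (std::sregex_iterator it(line.begin(), line.end(), kNumber), end;
         it != end; ++it)
    {
        found.push_back(it->str());
    }
    return found;
}
```

The order of the alternatives matters: the hex and float forms come first so that a plain integer match cannot claim just the leading digits.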

Aug 28, 2009

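Next stop?

Next Monday (20090831) is the last day this business card is valid, and next Tuesday I report to my new company. Another brand-new start: a new job, new coworkers, and a new business card. I'm moving on not because of some grand goal; an opportunity simply showed up and quickened my pace.

Things I regret:

Never reached level 3!
Never wrote a joke on p4!
Never had a fight with the review team!
No 3D!
No gf!

Things I'm happy about:

Switched teams!
Good coworkers!
That's all!

Gains:

Worked at a big company!
Worked at a software company!
I understand 3D better now!
My coding skill is better now (for the same reason).

Things that annoyed me: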

.-.. . --. .- -.-. -.--
..-. .. -..- -.-. --- -- .--. --- -. . -. -
.-- . .- -.- ... -.- .. .-.. .-..

Time to eat. I'll add more when I think of it.

syntax highlighter for HLSL10

After adding a new brush JS file to SyntaxHighlighter :


Texture2D g_NSZTexture;
Texture2D g_LitTexture;

float4x4 g_mWorld;
float4x4 g_mWorldViewProjection;

VtxNormalPass VSNormalPass(
    float4 pos : POSITION,
    float3 nor : NORMAL,
    float2 tex : TEXCOORD0)
{
    VtxNormalPass output;

    output.spos = mul(pos, g_mWorldViewProjection);
    output.wpos = mul(pos, g_mWorld);
    output.wnor = mul(nor, g_mWorld);

    return output;
}

PixOutput PSNormalPass(VtxNormalPass input) 
{
    float3 nor = normalize(input.wnor);

    float phi = acos(nor.y);

    float len = sin(phi);

    float the = acos(clamp(nor.x / len, -1.0f, 1.0f));

    if (nor.z <= 0.0f)
        the = 6.283185f - the;

    float z = length((input.wpos - g_vEyePos).xyz);

    PixOutput output;

    output.color = float4(the, phi, z, 4.0);

    return output;
}

technique10 NormalPass
{
    pass P0
    {
        SetVertexShader(CompileShader(vs_4_0, VSNormalPass()));
        SetGeometryShader(NULL);
        SetPixelShader(CompileShader(ps_4_0, PSNormalPass()));

        SetRasterizerState(CullBack);
        SetDepthStencilState(EnableDepth, 1);
        SetBlendState(BSOverWrite, float4(0.0f, 0.0f, 0.0f, 0.0f), 0xFFFFFFFF);
    }
}

Aug 26, 2009

Trigonometric function explosion

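While writing HLSL over the last few days I ran into a strange problem: the texture that stores normals always had stray fragments on it. The cause is probably the chain of trigonometric operations. I don't know whether the trig functions or floating point is to blame, but the final "the" in this code became non-positive (less than or equal to 0) when it should have been close to π: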


float3 nor = normalize(input.wnor);
float phi = acos(nor.y);
float len = sin(phi);
float the = acos(nor.x / len);

Changing it to this finally fixed it:


float the = acos(clamp(nor.x / len, -1.0f, 1.0f));
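
The same hazard is easy to reproduce on the CPU. In C++, acos returns NaN as soon as rounding pushes its argument even slightly outside [-1, 1]; what the GPU returns in that case may differ, so treat this only as a sketch of why the clamp helps. SafeAcos is a name I made up.

```cpp
#include <algorithm>
#include <cmath>

// acos with its argument clamped to [-1, 1], mirroring the HLSL fix.
float SafeAcos(float x)
{
    return std::acos(std::min(1.0f, std::max(-1.0f, x)));
}
```

With the clamp, an argument that rounding has pushed just below -1 yields π instead of an invalid result.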

Aug 22, 2009

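An irresponsible cycling travelogue

The company cycling club patching a tire before Zhongshe Road (中社路).

Ryo + Ivan: I tagged along with coworkers from dynacolor to join the fun on the north coast.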

Aug 17, 2009

ui rendering optimization

Forget everything else: geometry instancing is your best friend! In the picture, I rendered 4800 2D rectangles at 1024x768 resolution. If each rectangle is rendered with its own draw call, I get only 78 frames per second on my G1S. But if all of them are rendered with geometry instancing (512 rectangles per draw call), the fps jumps to 662. (Of course, this is not that exciting, since you can do more complicated things with geometry instancing.)

Aug 14, 2009

Doodle

A doodle from an e-mail exchange with a coworker:

Jul 30, 2009

screen space motion blur

I don't know how motion blur is implemented in general. In the d3d9 sample, the documentation says that it renders each object many times before one present call, and the d3d10 sample implements it with a geometry shader. I'm lazy and wanted to try GLSL on my Mac, so I made a simple demo that does motion blur in screen space. The plain scene, with 16 ugly watermelons :

Blurred in screen space :

You need 2 new render targets; let's call them A and B. Render your scene to A, then alpha blend B onto A. After that, copy A to the back buffer and swap A and B. It's pretty simple, since you don't even need GLSL (if you can alpha blend in the fixed-function pipeline). The alpha value used in the screenshot is 0.9. But this method has trouble when moving objects overlap in screen space... I guess you can imagine it, so there is no screenshot here :)

Jul 26, 2009

Anything wrong in glShaderSourceARB ?

It took me many hours to find out what was wrong with glShaderSourceARB. When I wrote this :


char v[128] =
    "void main(void)\n"
    "{\n"
    "    gl_Position = ftransform();\n"
    "}\n";

glShaderSourceARB(shaderVtx, 1, (const GLcharARB**)&v, 0);

my application always crashed in "strlen". But if I changed the code to :


char v[128] =
    "void main(void)\n"
    "{\n"
    "    gl_Position = ftransform();\n"
    "}\n";

const GLcharARB* ppV[1] = {(GLcharARB*)v};

glShaderSourceARB(shaderVtx, 1, ppV, 0);

everything is fine. Have I forgotten how to write C? (Although I inserted this into Objective-C.) It looks like a compiler setting issue, since the first version works in the Xcode sample.

20090802
OK, nothing was wrong except my careless style. This is a very basic pointer usage problem. "v" in the code is the start address of a string, not a pointer variable that holds that address (although it can be used as a pointer). That means "&v" is not the address of a pointer stored in memory; it is the address of the string itself, so the function read the string's bytes as if they were a pointer. But why does the sample work? Because the sample passes "v" to another function, which creates a new local pointer to v, so that pointer does exist in memory and its address can be taken. Luckily, no one will read this article. XD
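
The point can be checked in a few lines without OpenGL. In this sketch FirstString is a hypothetical stand-in for any API that takes an array of strings the way glShaderSourceARB does; both function names are mine.

```cpp
#include <cassert>

// For an array, &v and v are the same address with different types, so
// casting &v to char** makes the callee read string bytes as a pointer.
bool ArrayAddressEqualsFirstElement()
{
    char v[128] = "void main(void) {}\n";
    return static_cast<const void*>(&v) == static_cast<const void*>(v);
}

// Stand-in for an API that receives an array of C strings.
const char* FirstString(const char* const* strings, int count)
{
    assert(count > 0);
    return strings[0];
}
```

Passing the address of a real pointer variable, for example const char* p = v; FirstString(&p, 1);, hands the callee an actual pointer to dereference, which is what the working version of the code does with ppV.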

Jul 21, 2009

Hello Watermelon !

Hello! Watermelon! You are so sweet this summer. Actually, it's the "Hello world" of OpenGL. It's my first OpenGL program (except for another one with OpenGL ES, which was pretty simple). I want to make something with OpenGL, GLSL and my Mac, and that's why I have started to swallow OpenGL.

Jul 17, 2009

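Before playing with WPF 3D

Maybe WPF never intended to support 3D seriously, but until these items are settled, WPF 3D holds little appeal for me:

Sometimes you don't need face culling, and skipping it performs better in those cases. WPF always does face culling; when you don't need it, the only workaround is two sets of triangles.
More and more effects need multiple passes, and WPF doesn't seem to care about that.
More and more effects need to switch render targets, but because WPF's rendering is fully encapsulated, there's no way for me to mess with it.
The geometry members are incomplete: only vertex, index, texture coordinate and normal. While writing San Angeles, I wanted to put colors on vertices and didn't know how.
Lights always affect the whole scene tree.
More to be added as I think of them.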

Jul 16, 2009

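Hello SyntaxHighlighter !


#include <stdio.h>

int main()
{
   printf("Hello SyntaxHighlighter !");
}

Finally got it working. If you're interested, the original project is here. Back at 彩富 I even wrote a dedicated conversion program; I never expected there would be a more elegant way! A few notes (refer to this page's source):

The JavaScript source can be placed in a gadget provided by blogspot (the bar at the very bottom of this page).

The CSS link has to go directly in the head.

On a blog you have to add dp.SyntaxHighlighter.BloggerMode(); before HighlightAll.

The current headache is that the scripts are hosted on google page, and google is about to retire google page. They say content will be migrated, but google site forbids uploading js files, so I don't know whether it will stop working then!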

Jul 13, 2009

San Angeles in WPF

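I've been looking at WPF 3D recently. Maybe I'm not familiar enough with the SDK, but WPF's 3D capabilities seem very limited to me. Above is a small program written with WPF + C#, imitating San Angeles; it still looks rather plain. What attracts me most about it is the super shape: all the models are computed with the superformula. Because of a WPF limitation, any light I add affects the whole scene, but I want the reflection to be darker, like this:

Since I was too lazy to think it through, I simply stacked two viewports, one drawing the reflection and one drawing the real scene. That makes two separate scenes that can have different lights, and the result ends up closer to the original program.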
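
For reference, the 2D superformula gives a radius for each angle, r(φ) = (|cos(mφ/4)/a|^n2 + |sin(mφ/4)/b|^n3)^(-1/n1). The sketch below is written in C++ and not in the C# of the actual program, the function name is mine, and it only covers the 2D curve, not how the 3D models were assembled from it.

```cpp
#include <cassert>
#include <cmath>

// Radius of the 2D superformula at angle phi (radians).
float SuperFormula(float phi, float m, float n1, float n2, float n3,
                   float a = 1.0f, float b = 1.0f)
{
    assert(a != 0.0f && b != 0.0f && n1 != 0.0f);

    const float t1 = std::pow(std::fabs(std::cos(m * phi / 4.0f) / a), n2);
    const float t2 = std::pow(std::fabs(std::sin(m * phi / 4.0f) / b), n3);

    return std::pow(t1 + t2, -1.0f / n1);
}
```

With m = 4 and n1 = n2 = n3 = 2 the two terms are cos² and sin², so the radius is 1 at every angle and the curve is a unit circle; other parameters pull it into stars, rounded squares and so on.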

Jun 30, 2009

projection in vertex shader

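Ordinary perspective projection:

Projection tampered with in the vertex shader:

I don't know why the lower picture is a bit distorted, but it seems the vertex shader can make the z value linear. If so, it should avoid z fighting; I'm not sure how much performance it costs.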

deferred lighting

In the beginning, I wanted to know how "light pre-pass" works, but I couldn't grasp the idea easily. Then I started to study deferred lighting. Finally, I made a sample after wasting the GPU's entire power. (Yes, I implemented it without any optimization!) Here is a screenshot: 128 point lights rotate around each ball. It looks like wave reflections could be implemented the same way.

Jun 29, 2009

UNORM v.s. UNORM_SRGB

DXGI_FORMAT_R8G8B8A8_UNORM_SRGB :

DXGI_FORMAT_R8G8B8A8_UNORM :

DXGI_FORMAT_R8G8B8A8_UNORM is the result I want! DXUT for dx 10 only picks DXGI_FORMAT_R8G8B8A8_UNORM_SRGB as the default back buffer format!

p.s. The texture came from here.

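A ninth-grade math textbook

In eighth grade I picked up a ninth-grade math textbook in our outdoor cleaning area. Textbooks are written to be easy to understand, so I flipped through it in no time, and I found the geometry problems fun. But since I have no exceptional talent, and my attention is easily diverted by other interesting things (like daydreaming), after the high school entrance exam I touched geometry for only a few minutes. In my freshman year the university's main library hadn't moved yet; between the stuffy air and my natural tendency to doze off, that brief encounter ended before I could immerse myself in the geometry books I had just found!

This story replayed in a daydream at McDonald's at noon. I remember one exam in ninth grade where the geometry problem was a bit unusual. After I wrote out my proof the whole answer field was filled, but when we swapped papers, my classmate said he couldn't follow it. I confidently asked him to check with the teacher whether it was right. It was, but the teacher said: "I don't know how you learned this!" The details of the incident don't matter much, but it made me start to notice the way I think about problems, and it has been the same pattern ever since.

If the problem is A and the answer is Z, and there may exist a best solution M, my train of thought often becomes ABCZ or ADEMZ. B or D are the first approaches that come to mind, approaches that are "logically" feasible, and I keep extending from there until a chain of logically feasible steps reaches Z. Because M is much more obscure than the other paths, I usually don't find the best solution, but I can usually find a workable one quickly. The downside is that once the problem is complex enough, this method easily runs into a dead end. Even when programming, the whole flow pours out the same way: I can usually write a working program quickly and then spend time optimizing it. And even if you find M right away, nobody can guarantee there isn't a better path between A and Z; maybe if you fold the paper in half, A and Z simply touch.

A ninth-grade math assignment had this problem: given an arc and a point A outside it, draw a straight line connecting A and the center of the arc. The problem is simple: draw two non-overlapping chords on the arc, and their perpendicular bisectors meet at the center. But that's not how I wrote it back then. The figure was positioned so conveniently that I only had to draw an arc centered at A that crossed the original arc at two points, which saved a lot of work... So did I already show signs of becoming a programmer back in junior high?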

Jun 22, 2009

when to SSE ?

Long, long ago, when I was playing with Jaina, I didn't know why my Jaina couldn't move as smoothly as Blizzard's. But there was a rumor about how to make Jaina move more like a real young girl, and SSE was the first solution.

I haven't found any document about the cost of SSE (ohh... I'm lazy, you know that...). But in my experience, there is a simple rule. The most common use of SSE is matrix multiplication, and there are many, many multiplications in skinned bone animation. But you'll find it costs more CPU power if you write SSE only to multiply "one" vector by "one" matrix.

The simple rule is :
if the number of vector multiplications is at least double the number of vector IO operations, SSE will give higher performance.

For example, the dot product of 2 vectors needs 3 IO operations (2 vectors in, one value out), but only 1 vector multiplication. Another example is a vector times a matrix: 6 IO operations (5 in, 1 out) with 4 vector multiplications... so we still can't get better performance.

But in the bone animation case, there are few bones and many, many points. That means many points are multiplied by the same matrix in each frame. If you have to multiply N points, you need :

4 vector reads from the matrix.
N vector reads from the points.
N vector writes to the points.
4 vector multiplications per point.

Ignoring the first item, that is 4N multiplications against 2N IO operations, which just meets my simple rule. So there is a chance to make my Jaina move more smoothly. (Just write a function that multiplies many vectors by one matrix.)

BTW, there are many zeros in a normal 3D matrix... but that's another story.
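
The "many vectors, one matrix" function can be sketched in plain C++ first. This is a scalar illustration under my own assumptions (the type and function names are hypothetical, and I use the row-vector convention v * M); it contains no SSE intrinsics, it only shows the access pattern that an SSE version would follow.

```cpp
#include <cstddef>

struct Vec4 { float x, y, z, w; };
struct Mat4 { Vec4 r[4]; };   // four rows, row-vector convention (v * M)

// Multiplies every point by the same matrix, in place.
void TransformPoints(const Mat4& m, Vec4* points, std::size_t count)
{
    // The 4 matrix rows are read once for the whole batch.
    const Vec4 r0 = m.r[0], r1 = m.r[1], r2 = m.r[2], r3 = m.r[3];

    for (std::size_t i = 0; i < count; ++i)
    {
        const Vec4 p = points[i];   // 1 vector read per point

        // result = p.x * r0 + p.y * r1 + p.z * r2 + p.w * r3,
        // i.e. 4 vector multiplications per point.
        Vec4 out;
        out.x = p.x * r0.x + p.y * r1.x + p.z * r2.x + p.w * r3.x;
        out.y = p.x * r0.y + p.y * r1.y + p.z * r2.y + p.w * r3.y;
        out.z = p.x * r0.z + p.y * r1.z + p.z * r2.z + p.w * r3.z;
        out.w = p.x * r0.w + p.y * r1.w + p.z * r2.w + p.w * r3.w;

        points[i] = out;            // 1 vector write per point
    }
}
```

In an SSE version each row would stay in a register for the whole loop, which is why the cost of reading the matrix can be ignored in the count above.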

Jun 21, 2009

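Maruko at my desk

Once the company handed out chalk and a cup and asked us to get creative and doodle on the cup! Creativity is something I don't have, so I picked up the chalk and drew Maruko on the wall next to me instead!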

Jun 20, 2009

WarHammer[3];

The tent in the middle has LOD... at this close a distance. Is it too boring of me to keep noticing these things while playing a game? XD

WarHammer[2];

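This world is full of mud, making me walk as if I were missing a leg! Actually it's not that dramatic, but wherever I go, it feels choppy! Now I've finally caught the culprit. Take a look:

Look up at the sky and swing the character's view around: it stutters!

Look down to pick up money... so smooth! How can that be? Did you see the red circle in the first picture? Now compare it with this one:

The forest is full of triangles! It looks like several dense layers pasted on top of each other! The bushes on the ground are the same:

It looks much richer than wow! Too bad that in wow's Stranglethorn Vale I was just as light on my feet and never felt I was too fat and needed to lose weight, even when the character I controlled was a big clumsy bull! The forest here looks richer by comparison, but that richness seems to be paid for in triangles. No wonder every step is a struggle!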

WarHammer[1];

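The most, most, most obvious problem with this game is that characters often get inexplicably stuck while moving! After two weeks of play, I often think I'm stuck after running and jumping around, and then have to run and jump around some more to get free! Legend has it that one of Warhammer's features is collision between players. Collision between characters shouldn't be a hard problem; it's probably just OOBB! But in practice it makes playing miserable: while casting spells on the move, I not only have to watch which direction the enemy is coming from, but also whether some small-bodied race is blocking the way ahead. Once blocked, you cannot move forward at all; holding forward doesn't make you slowly slide past. Monsters, on the other hand, have no collision. When they attack my character they often exploit this, passing through the character's body and attacking from behind!

In the picture above, I can't get through a gap between a brazier and a flag, because the bounding box catches on the brazier and I can't move forward at all.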

Jun 19, 2009

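What benevolent government means

"When you get to the bottom of it, so-called benevolent government is nothing more than creating a society in which honest people don't lose out."

Nanami Shiono (塩野七生), 《羅馬人的故事 IX》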

Jun 17, 2009

WarHammer[0];

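I've been playing Warhammer recently, but I probably won't last much longer. There are lots of bugs, and it doesn't run smoothly on the G1S either. If 3 GB of RAM plus an 8600GT can't run it smoothly, this game must have some problems. The setting has its charms, but from day one I've felt that in neither art nor programming can it compete with Blizzard. I'm tired today, so I'll mention just one odd rendering bug and leave the rest for when I have time.

The picture isn't very clear, but inside the red frame in the top-left corner there is another "box". Oh no, that's a misunderstanding! It's actually a lamp, and around it is a scene full of smoke (or blowing sand?). Such a jarring box shows up because of z check and z write. The lamp looks like a billboard with a texture (texture animation? maybe), which makes it a translucent object, and the whole sheet of smoke is a translucent object too. Both are drawn only after most of the scene has been drawn, because translucent rendering gives wrong results if it isn't sorted. My guess at how the problem happens: the draw order of the lamp glow and the smoke is not fixed; sometimes the lamp is drawn first, sometimes the smoke, so the problem isn't always there. Next, z write is on when the lamp glow is drawn, and z check is on when the smoke is drawn. So if the lamp is drawn first, its z write (with a z value that may even be wrong) makes the z check reject that patch when the smoke is drawn, and you end up with a box.

ZZzzZzZzzzz... almost asleep...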

why UnAdvise ?

If you are familiar with COM, you will often work with AdviseFooEventSink / UnAdviseFooEventSink. You have to call UnAdviseFooEventSink explicitly somewhere... and preferably not from within the event sink itself, especially not in the destructor of the event sink object. But WHY ?

I have been trying to demo something with DirectShow recently. While working with a filter, I made a mistake by accident today: I did something like UnAdviseFooEventSink in the destructor of the event sink object. Let's check what happens :

When the event sink object is created, its reference count is 1.
When Foo advises the sink object, the sink's reference count increases to 2.
When I no longer need the sink object and release it, the reference count decreases to 1.
Now the owner of the sink object is Foo, and I want the sink object to be unadvised when it is deleted.
But since Foo is still an owner of the sink object, the sink will never be deleted, and its destructor will never run, unless Foo does extra work.

Thanks to the debugging functions in the DirectShow base classes, I got some ugly asserts and set about fixing it.

BTW, the same issue can happen in Cocoa on the Mac, too. If 2 objects own each other (retain each other), they should be disconnected before the final outside reference is released.
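
The reference-count walk-through can be replayed with std::shared_ptr standing in for COM reference counting. This is only an analogy under that assumption; Sink, Source and the function name are hypothetical, not DirectShow types.

```cpp
#include <memory>
#include <utility>

struct Sink
{
    explicit Sink(bool* destroyedFlag) : destroyed(destroyedFlag) {}
    ~Sink() { *destroyed = true; }   // too late a place to unadvise
    bool* destroyed;
};

class Source   // plays the role of Foo
{
public:
    void Advise(std::shared_ptr<Sink> sink) { sink_ = std::move(sink); }
    void Unadvise() { sink_.reset(); }
private:
    std::shared_ptr<Sink> sink_;
};

// True if the sink outlives the caller's release and is destroyed only
// after the explicit Unadvise.
bool SinkDiesOnlyAfterUnadvise()
{
    bool destroyed = false;
    Source source;

    std::shared_ptr<Sink> mine = std::make_shared<Sink>(&destroyed); // count 1
    source.Advise(mine);                                             // count 2
    mine.reset();                                                    // count 1
    const bool aliveAfterRelease = !destroyed;

    source.Unadvise();                                               // count 0
    return aliveAfterRelease && destroyed;
}
```

The destructor runs only once the source lets go, so an unadvise placed inside it can never be what triggers that release.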

Jun 1, 2009

some methods to optimize gui rendering.

I thought about GUI rendering today and got some ideas. Here is a brief note (though it is probably useless, since UI rendering is an old topic).

Sometimes we have to render UI with a 3D API such as d3d. For example, it's hard to render UI over video without d3d on Windows. The generic way to render a GUI is to render many "rectangles": buttons, checkboxes, comboboxes, etc. Each rectangle needs one draw call. When there are too many controls (including text), draw primitive may be called too many times and start to affect performance. Here are some things we can try :

[1]. Sort the entire GUI rendering by texture. The first thing that forces us to render rectangles one by one is that the images may not all be on the same texture. Controls that share a texture can be drawn together as one "triangle list", which reduces the number of draw calls.

[2]. Separate the GUI into dynamic and static parts. All controls share the same "dynamic" vertex buffer. Dynamic controls are moved by changing their vertex positions; static controls have fixed positions. So the vertices in the buffer can be sorted by that state into 2 parts, too, which gives a chance to minimize vertex updates.

[3]. All controls share the same "dynamic" index buffer. Hidden controls can simply be skipped by leaving out their indices. Sorting the indices could improve performance, too.

[4]. If there are alpha-blended controls, separate them into another pass... or everything is lost.

[5]. So now you can :

get a single texture with the images of all controls.
update the positions of dynamic controls by updating the vertex buffer.
update the visibility of controls by updating the index buffer.
collect all characters in the same texture (or in fewer textures).
pray.

[6]. I am kidding... please let me know if you try it.
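
Point [1] can be sketched without any graphics API by counting how many draw calls a list of rectangles would need before and after sorting. Quad, CountDrawCalls and SortByTexture are hypothetical names, and the sketch ignores the alpha ordering issue from point [4].

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Quad { int textureId; };   // hypothetical: one GUI rectangle

// Number of draw calls if consecutive quads sharing a texture are merged.
std::size_t CountDrawCalls(const std::vector<Quad>& quads)
{
    std::size_t calls = 0;
    for (std::size_t i = 0; i < quads.size(); ++i)
        if (i == 0 || quads[i].textureId != quads[i - 1].textureId)
            ++calls;
    return calls;
}

// Sorting by texture groups the quads so each texture needs one draw call.
void SortByTexture(std::vector<Quad>& quads)
{
    std::stable_sort(quads.begin(), quads.end(),
        [](const Quad& a, const Quad& b) { return a.textureId < b.textureId; });
}
```

Five quads that alternate between two textures need four calls as submitted, and two once sorted. stable_sort keeps the relative order of quads within a texture, which matters if overlapping controls share one.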

May 28, 2009

Prince of Persia !

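How could I forget the old love just because I have a new one? G1S, I haven't forgotten you! Just look at what you did!

A while ago I was playing the new Prince of Persia. I haven't finished it, but I've already run into a ghostly incident:

Taking a closer look...

How on earth did the princess fly up there! The visuals of Prince of Persia still amaze me (I had only played The Sands of Time before). Not much has changed; if I had to name the biggest change, it's that... this installment is about a Princess of Persia who meets a very strong passerby. The princess is so powerful that the passerby merely has to finish the mission before he grows old and frail; even if you jump off a cliff a hundred times, the princess won't rescue you only ninety-nine times! The sad part is that as the story develops, the princess and the passerby start flirting... What is this! Do I deserve to be blinded by lovebirds even while playing video games! \_/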

stupid flying triangle

Finally, I rendered a stupid flying triangle in the iPhone simulator. Though I had never looked into OpenGL before, it is easier than d3d, so writing code to render a simple object is quick. But I am often confused by the gl functions. Take "glTranslatef": this C function doesn't tell me which matrix I am translating, just like many old C functions. Fortunately I don't have to deal with heavy matrix calculation at this stage.

I guess it's time to start making a simple game after learning how to render text, since basic touch handling only requires overriding some methods in UIView.

BTW, I tried programming on Symbian (with the SDK from Nokia) long, long ago. Maybe I was too weak, so I could not make the simulator work correctly. I bought a book on that OS, but I never touched it again after several "days". (OK, that happened in the 4th month of my programming life.)