Author: Matt Taylor
Date: 15:36:20 02/20/03
On February 20, 2003 at 06:57:17, Ed Schröder wrote:
>On February 20, 2003 at 04:26:12, Russell Reagan wrote:
>
>>This is just a guess, but maybe you could do some kind of trick with
>>redefinition? On your webpage, you describe it.
>>
>>"HINT: WB and BB are zeroed at the beginning of EVAL, this is a costly operation
>>in C, here is a trick to speed it up using redefinition:
>> unsigned char WB[64],BB[64];
>> long *PWB = (long *) WB; // redefine char (8-bit) to long (32-bit)
>> long *PBB = (long *) BB;
>>
>> PWB[0]=PWB[1]=PWB[2] ..... =PWB[15]=0; // 16 x 32-bit stores, clear WB
>> PBB[0]=PBB[1]=PBB[2] ..... =PBB[15]=0; // 16 x 32-bit stores, clear BB
>>
>> This is about 8-10 times faster than the usual:
>>
>> for (x=0; x<=63; x++) { WB[x]=0; BB[x]=0; }
>>Make sure that your compiler's alignment at least is set to 32 bit so that the
>>generated memory addresses of WB and BB are divisible by 4. In most compilers
>>the default setting is 32 or 64 bit which is okay."
>
>
>>Since your values are 8-bit chars, it is possible that there could be a trick to
>>speed up the search, by looking at 4 bytes at a time. Or maybe not :)
>
>Tried that of course, reading 4 bytes in EAX, then using AH and AL, then a BSWAP
>to get the next AH/AL, no speed improvement :(
>
>Ed
>
>>Russell
You can do the search with MMX, scanning 8 bytes in parallel, then tabulate the
results at the end.
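To make the "scan 8 bytes at a time, tabulate at the end" idea concrete, here is a rough C sketch. It is not MMX: it swaps in a plain 64-bit integer trick (SWAR) as a portable stand-in for a packed byte compare. The function name count_bytes_equal and the choice of counting matches against one target value are assumptions for illustration, since the thread does not say exactly what is being searched for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count how many of the n bytes in buf equal target,
   examining 8 bytes per loop iteration.
   Assumes n is a multiple of 8 (true for the 64-byte WB/BB arrays). */
static unsigned count_bytes_equal(const unsigned char *buf, size_t n,
                                  unsigned char target)
{
    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    const uint64_t pattern = ones * target;   /* target copied into all 8 bytes */
    unsigned total = 0;

    assert(n % 8 == 0);

    for (size_t i = 0; i < n; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, 8);               /* load without alignment/aliasing issues */

        uint64_t x = w ^ pattern;             /* matching bytes become 0x00 */

        /* Exact zero-byte test: leaves 0x80 in each byte of x that is zero
           and 0x00 elsewhere. No carries cross byte boundaries. */
        uint64_t t = (x & low7) + low7;
        t = ~(t | x | low7);

        /* Tabulate: move each flag to bit 0 of its byte and sum the bytes
           into the top byte with one multiply. */
        total += (unsigned)(((t >> 7) * ones) >> 56);
    }
    return total;
}
```

With MMX proper, the XOR-and-test step would be a single packed byte compare; the per-word tally at the end is the same idea either way.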
Also, to zero the arrays initially, this works best in asm:
pxor mm0, mm0
movq [wb], mm0
movq [bb], mm0
movq [wb+8], mm0
movq [bb+8], mm0
movq [wb+16], mm0
movq [bb+16], mm0
movq [wb+24], mm0
movq [bb+24], mm0
movq [wb+32], mm0
movq [bb+32], mm0
movq [wb+40], mm0
movq [bb+40], mm0
movq [wb+48], mm0
movq [bb+48], mm0
movq [wb+56], mm0
movq [bb+56], mm0
emms ; needed if any x87 floating-point code runs afterwards; otherwise you can omit it
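For comparison, the same clearing can be sketched in plain C without the pointer cast from the quoted hint. That cast assumes long is 32 bits and that WB/BB are suitably aligned, neither of which C guarantees. The function names clear_boards and clear_boards_wide are made up for this example, and whether either version matches the MMX sequence in speed depends on the compiler and would need measuring.

```c
#include <stdint.h>
#include <string.h>

unsigned char WB[64], BB[64];

/* Simplest form: two fixed-size memset calls. With optimization enabled,
   compilers typically expand these into a handful of wide stores. */
static void clear_boards(void)
{
    memset(WB, 0, sizeof WB);
    memset(BB, 0, sizeof BB);
}

/* Explicit version: eight 64-bit stores per array, mirroring the movq
   sequence. memcpy is used for the store so there are no alignment or
   aliasing assumptions. */
static void clear_boards_wide(void)
{
    const uint64_t zero = 0;
    for (size_t off = 0; off < sizeof WB; off += 8) {
        memcpy(WB + off, &zero, sizeof zero);
        memcpy(BB + off, &zero, sizeof zero);
    }
}
```

Neither version touches the MMX registers, so no emms is involved.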
-Matt
Last modified: Thu, 15 Apr 21 08:11:13 -0700