Subject: Re: SSE2 bit[64] * byte[64] dot product

Author: Anthony Cozzie

Date: 07:59:05 07/19/04

On July 18, 2004 at 15:33:33, Gerd Isenberg wrote:

>>I am guessing something like 50 cycles?  Really not that bad . . . probably
>>close to the speed of a scan over attack tables.
>14.45ns on a 2.2GHz Athlon64, ~32 cycles now.
>Some minor changes, byte vector values (weights) 0..63, therefore only one
>psadbw, no movd but two pextrw, final add with gp. Computed bit masks in two
>xmm-registers (0x02:0x01). Some better instruction scheduling.

If you would ship me the new code I would be much obliged.
I am concentrating on parallel code right now, but once that is done I am going
to do some serious work on my eval.  I want to prove Vincent wrong that a good
eval cannot be done with bitboards :)

32 cycles is _really_ good.  I think that on average rotated bitboard attack
generation is 20 cycles, so attack generation plus the dot product is about 50
cycles per piece for mobility, or roughly 500 cycles (~250 ns on my computer)
for all pieces, which is really not bad.  In fact, 32 cycles is not that much
slower than popcount!
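
For anyone following along, here is a plain C sketch of what the bit[64] *
byte[64] dot product computes: the sum of the weights of all set squares.  This
is only a scalar reference under my own assumptions, not Gerd's SSE2 routine;
the name dot_product_64 and its signature are made up for illustration.

```c
#include <stdint.h>

/* Scalar reference for the bit[64] * byte[64] dot product.
   Hypothetical helper, not the SSE2 code discussed in this thread.
   bb      : bitboard, bit i set means square i is attacked
   weights : one byte weight per square (0..63 in the version quoted above)
   returns : sum of weights[i] over all set bits i of bb */
static unsigned dot_product_64(uint64_t bb, const uint8_t weights[64])
{
    unsigned sum = 0;
    for (int i = 0; i < 64; i++) {
        /* (bb >> i) & 1 is 1 when square i is set, 0 otherwise */
        sum += (unsigned)((bb >> i) & 1) * weights[i];
    }
    return sum;
}
```

With weights limited to 0..63 the result is at most 64 * 63 = 4032, so it fits
easily in 16 bits.  A mobility term would then be something like
dot_product_64(attacks, weights) for each piece, where attacks comes from the
attack generator.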


