Computer Chess Club Archives




Subject: Re: SSE2 bit[64] * byte[64] dot product

Author: Gerd Isenberg

Date: 01:52:26 07/18/04


>I am guessing something like 50 cycles?  Really not that bad . . . probably
>close to the speed of a scan over attack tables.

Yes, fewer than 30 SSE2 instructions and almost no register stalls,
but still about 50 cycles ;-(

OK, DirectPath Double instructions have a latency of roughly 2 cycles,
or 4 cycles with a memory operand, and psadbw has 4.
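
For readers joining the thread here, the operation in the subject line can be sketched roughly as follows. This is only an illustration under stated assumptions, not the routine whose cycle count is discussed above: the names dot_scalar and dot_sse2 are hypothetical, the weights are assumed to be unsigned bytes, and bit i of the bitboard is assumed to select weights[i]. Each step expands 16 bits into byte masks, ANDs them with 16 weights, and lets psadbw (_mm_sad_epu8) do the horizontal add.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: add weights[i] for every set bit i of bb. */
static int dot_scalar(uint64_t bb, const uint8_t weights[64])
{
    int sum = 0;
    for (int i = 0; i < 64; ++i)
        if ((bb >> i) & 1)
            sum += weights[i];
    return sum;
}

/* SSE2 sketch (hypothetical helper): handles 16 bits and 16 weights per step. */
static int dot_sse2(uint64_t bb, const uint8_t weights[64])
{
    /* Byte j of bitsel holds 1 << (j & 7); _mm_set_epi8 lists byte 15 first. */
    const __m128i bitsel = _mm_set_epi8(
        (char)0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01,
        (char)0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01);
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;

    for (int k = 0; k < 4; ++k) {
        /* Two bytes of the bitboard: bits 16k..16k+7 and 16k+8..16k+15. */
        char lo = (char)(bb >> (16 * k));
        char hi = (char)(bb >> (16 * k + 8));

        /* Bytes 0..7 all hold lo, bytes 8..15 all hold hi. */
        __m128i v = _mm_unpacklo_epi64(_mm_set1_epi8(lo), _mm_set1_epi8(hi));

        /* 0xFF in byte j where bit 16k+j of bb is set, else 0x00. */
        __m128i mask = _mm_cmpeq_epi8(_mm_and_si128(v, bitsel), bitsel);

        /* Keep only the selected weights (unaligned load, to be safe). */
        __m128i w = _mm_loadu_si128((const __m128i *)(weights + 16 * k));
        __m128i sel = _mm_and_si128(mask, w);

        /* psadbw against zero: one byte sum per 64-bit half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(sel, zero));
    }

    /* Add the two halves; the total is at most 64 * 255, so int is enough. */
    return _mm_cvtsi128_si32(acc) + _mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
}
```

The loop is written for clarity rather than speed; an unrolled version with aligned loads and the masks kept in registers would be closer to what the cycle counts above refer to.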

I hope AMD's promise from the optimization guide comes true some day:

Chapter 9 Optimizing with SIMD Instructions


• Future processors with more or wider multipliers and adders will achieve
better throughput using SSE and SSE2 instructions. (Today’s processors implement
a 128-bit-wide SSE or SSE2 operation as two 64-bit operations that are
internally pipelined.)



Last modified: Thu, 07 Jul 11 08:48:38 -0700
