Computer Chess Club Archives


Subject: Re: SSE2 bit[64] * byte[64] dot product

Author: Gerd Isenberg

Date: 01:52:26 07/18/04


>I am guessing something like 50 cycles?  Really not that bad . . . probably
>close to the speed of a scan over attack tables.
>
>anthony

Yes, fewer than 30 SSE2 instructions and almost no register stalls,
but still about 50 cycles ;-(

OK, double direct path instructions have a latency of almost 2 cycles,
4 cycles with a memory operand, and psadbw has 4.
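
For readers without the earlier posts at hand, here is a minimal sketch in C intrinsics of what a bit[64] * byte[64] dot product can look like with plain SSE2. It is not the routine discussed in this thread, just one straightforward way to do it: the function name dot_product_sse2, the weights layout (weights[i] belongs to bit i) and the use of unaligned loads are all assumptions made for illustration. Each 16-bit slice of the bitboard is expanded to a 0x00/0xFF byte mask, ANDed with the weights, and summed horizontally with psadbw.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns the sum of weights[i] over all set bits i of bb. */
static unsigned dot_product_sse2(uint64_t bb, const unsigned char weights[64])
{
    /* Byte j of each 64-bit half selects bit j of the broadcast byte. */
    const __m128i bitsel = _mm_set_epi8(
        (char)0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01,
        (char)0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01);
    const __m128i zero = _mm_setzero_si128();
    __m128i sum = zero;

    for (int k = 0; k < 4; ++k) {
        /* Broadcast bitboard byte 2k to the low half, byte 2k+1 to the high half. */
        __m128i lo = _mm_set1_epi8((char)(bb >> (16 * k)));
        __m128i hi = _mm_set1_epi8((char)(bb >> (16 * k + 8)));
        __m128i bytes = _mm_unpacklo_epi64(lo, hi);

        /* 0xFF where the corresponding bit is set, 0x00 otherwise. */
        __m128i mask = _mm_cmpeq_epi8(_mm_and_si128(bytes, bitsel), bitsel);

        /* Keep only the selected weights; psadbw against zero sums each 8-byte half. */
        __m128i w = _mm_loadu_si128((const __m128i *)(weights + 16 * k));
        sum = _mm_add_epi64(sum, _mm_sad_epu8(_mm_and_si128(w, mask), zero));
    }

    /* Add the high 64-bit partial sum to the low one. */
    sum = _mm_add_epi64(sum, _mm_srli_si128(sum, 8));
    return (unsigned)_mm_cvtsi128_si32(sum);
}
```

The result cannot overflow: the largest possible sum is 64 * 255 = 16320. The sketch makes no claim about cycle counts; the dependent psadbw and add chain in the loop is exactly the kind of latency the figures above are about.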

I hope AMD's promise from the optimization guide comes true some day:

Chapter 9 Optimizing with SIMD Instructions

...

• Future processors with more or wider multipliers and adders will achieve
better throughput using SSE and SSE2 instructions. (Today’s processors implement
a 128-bit-wide SSE or SSE2 operation as two 64-bit operations that are
internally pipelined.)

Gerd


