Computer Chess Club Archives


Subject: Re: SSE2 Instructions and hash key calculations

Author: Gerd Isenberg

Date: 11:27:05 09/03/03



On September 02, 2003 at 16:52:24, Dezhi Zhao wrote:

>Hi!
>
>I know that some programmers here have played with MMX/SSE/SSE2 quite a lot. I
>am wondering if the new SSE registers and xor op can beat the regular registers
>in calculating the hash key, which uses 64-bit operations. Has anybody tried this?
>
>Regards,
>dzhao

With AMD64 I guess it's faster to use general-purpose registers for that purpose.
For Athlon (MMX) or P4 (MMX, SSE2) you may try it, but I don't believe it is
worth it.

The MMX pxor instruction has a latency of 2 cycles; xor has only one.
Unless you use MMX with a lot of independent instruction chains, that doesn't pay
off. You may also try doing all the other incrementally updated bitboard stuff
with MMX, like this (with properly 8-byte-aligned bitboards):

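movq  mm0, [esi].hashkey        ; load the current hash key
movq  mm1, [esi].fromtoBB       ; (1<<from) | (1<<to)
movq  mm2, [esi].occupied       ; load the occupancy bitboard
movq  mm3, [esi + 8*eax].pieces ; load the moving piece's bitboard

pxor  mm0, [_hashtbl + ebx...]  ; update the hash key from the table
pxor  mm2, mm1                  ; toggle from/to in occupied
pxor  mm3, mm1                  ; toggle from/to in the piece bitboard

movq  [esi].hashkey, mm0        ; store all three results back
movq  [esi].occupied, mm2
movq  [esi + 8*eax].pieces, mm3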

Note that there is no "pxor [memory], register" instruction!

With SSE2 you have 128-bit XMM registers. On Opteron the instruction latency is
2 cycles, but these are double direct path instructions: each 128-bit operation
is split into two 64-bit operations. Using only the low quadword of an
XMM register is a bit wasteful.
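
For illustration only, here is a rough sketch of what "using only the low
quadword" looks like with SSE2 intrinsics. The function name
xor64_low_quadword and the HAVE_SSE2 macro are made up for this example and do
not come from the post; the plain C fallback is there so the sketch also
compiles where SSE2 is not available.

```c
#include <assert.h>   /* assert(), handy for checking results */
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>          /* SSE2 intrinsics */
#define HAVE_SSE2 1             /* hypothetical macro, local to this sketch */
#endif

/* XOR two 64-bit values through the low quadword of XMM registers.
 * The upper 64 bits of each register are zeroed by the load and unused. */
static uint64_t xor64_low_quadword(uint64_t key, uint64_t value)
{
#ifdef HAVE_SSE2
    __m128i a = _mm_loadl_epi64((const __m128i *)&key);    /* low 64 bits */
    __m128i b = _mm_loadl_epi64((const __m128i *)&value);
    a = _mm_xor_si128(a, b);                               /* 128-bit pxor */
    _mm_storel_epi64((__m128i *)&key, a);                  /* store low 64 */
    return key;
#else
    return key ^ value;         /* plain C fallback without SSE2 */
#endif
}
```

Half of every 128-bit operation is thrown away here, which is the waste
described above. Compare it with the one-line fallback branch.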

Note that Windows for AMD64 doesn't save the MMX registers during a context
switch in 64-bit mode!

So with Opteron in mind I would clearly favor simple C code
and no SSE2 intrinsics.

Gerd
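
As a minimal sketch (not code from the post), the "simple C code" approach to
the same incremental update might look like the following. The Position
struct, the zobrist table layout, PIECE_KINDS, the splitmix64 table filler and
toggle_quiet_move are all assumptions made for this example; only the field
names hashkey, occupied and pieces mirror the assembly operands above, and the
real hash table indexing is elided in the original.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t BitBoard;

enum { PIECE_KINDS = 12, SQUARES = 64 };   /* assumed sizes */

/* Hypothetical position layout; field names follow the assembly operands. */
typedef struct {
    BitBoard hashkey;
    BitBoard occupied;
    BitBoard pieces[PIECE_KINDS];
} Position;

/* Hypothetical Zobrist table: one random 64-bit key per (piece, square). */
static uint64_t zobrist[PIECE_KINDS][SQUARES];

/* splitmix64 generator, used here only to fill the table. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static void init_zobrist(uint64_t seed)
{
    for (int p = 0; p < PIECE_KINDS; ++p)
        for (int s = 0; s < SQUARES; ++s)
            zobrist[p][s] = splitmix64(&seed);
}

/* Make or unmake a quiet (non-capturing) move. Every update is an XOR,
 * so calling this twice with the same arguments restores the position. */
static void toggle_quiet_move(Position *pos, int piece, int from, int to)
{
    assert(piece >= 0 && piece < PIECE_KINDS);
    assert(from >= 0 && from < SQUARES && to >= 0 && to < SQUARES);

    BitBoard fromto = ((BitBoard)1 << from) | ((BitBoard)1 << to);

    pos->hashkey       ^= zobrist[piece][from] ^ zobrist[piece][to];
    pos->occupied      ^= fromto;
    pos->pieces[piece] ^= fromto;
}
```

On a 64-bit target each of these XORs fits in one general-purpose register, so
a compiler has no need for MMX or SSE2 here. Call init_zobrist once at
startup; toggle_quiet_move then serves for both making and unmaking a move.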





Last modified: Thu, 15 Apr 21 08:11:13 -0700
