Author: Gerd Isenberg
Date: 11:27:05 09/03/03
On September 02, 2003 at 16:52:24, Dezhi Zhao wrote:

>Hi!
>
>I know that some programmers here have played with MMX/SSE/SSE2 quite a lot. I
>am wondering if the new SSE registers and xor op can beat the regular registers
>in calculating the hash key that are 64 bit operations. Have anybody tried this?
>
>Regards,
>dzhao

With AMD64 I guess it's faster to use the general-purpose registers for that purpose. For Athlon (MMX) or P4 (MMX, SSE2) you may try it, but I don't believe it is worth it. The MMX pxor instruction has a latency of 2 cycles; xor has only one. Unless you use MMX with a lot of independent instruction chains, it doesn't pay off.

You may try all the other incrementally updated bitboard stuff with MMX too, like this (with properly 8-byte aligned bitboards):

    movq  mm0, [esi].hashkey
    movq  mm1, [esi].fromtoBB          ; (1<<from) | (1<<to)
    movq  mm2, [esi].occupied
    movq  mm3, [esi + 8*eax].pieces
    pxor  mm0, [_hashtbl + ebx...]
    pxor  mm2, mm1
    pxor  mm3, mm1
    movq  [esi].hashkey, mm0
    movq  [esi].occupied, mm2
    movq  [esi + 8*eax].pieces, mm3

Note that there is no "pxor [memory], register" instruction!

With SSE2 you have 128-bit XMM registers. Instruction latency is 2 cycles (Opteron), but they are double direct path instructions, which require two 64-bit operations. Using only the low quadword of an XMM register is a bit wasteful.

Note that Windows for AMD64 doesn't save the MMX registers during a context switch in 64-bit mode! So with Opteron in mind I would clearly favor simple C code and no SSE2 intrinsics.

Gerd
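For comparison, here is a minimal sketch of what the "simple C code" variant of the MMX sequence above might look like. It is not taken from any real engine: the `Position` layout, the 12-entry `pieces` array, the `hashtbl[piece][square]` indexing and the splitmix64-based table fill are all assumptions made for illustration. The assembly elides the hash table index, so xoring one key for the from-square and one for the to-square is likewise an assumption.

```c
#include <stdint.h>

/* Hypothetical position layout. The field names follow the assembly
   above; the number of piece bitboards (12) is an assumption. */
typedef struct {
    uint64_t hashkey;
    uint64_t occupied;
    uint64_t pieces[12];
} Position;

/* Hypothetical Zobrist-style table: one 64-bit key per (piece, square). */
uint64_t hashtbl[12][64];

/* splitmix64 step, used here only to fill the table with
   pseudo-random keys; any decent 64-bit generator would do. */
uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

void init_hashtbl(uint64_t seed)
{
    for (int piece = 0; piece < 12; ++piece)
        for (int sq = 0; sq < 64; ++sq)
            hashtbl[piece][sq] = splitmix64(&seed);
}

/* Incremental update for a quiet (non-capturing) move, using plain
   64-bit xor. On a 64-bit target each xor is a single
   general-purpose register operation. */
void update_quiet_move(Position *p, int piece, int from, int to)
{
    uint64_t fromtoBB = (1ULL << from) | (1ULL << to);

    p->hashkey       ^= hashtbl[piece][from] ^ hashtbl[piece][to];
    p->occupied      ^= fromtoBB;
    p->pieces[piece] ^= fromtoBB;
}
```

Since xor is its own inverse, calling `update_quiet_move` again with `from` and `to` swapped restores the previous hash key and bitboards, so the same routine can serve for unmaking the move.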
Last modified: Thu, 15 Apr 21 08:11:13 -0700