Computer Chess Club Archives


Subject: Re: Are bitboards really faster on 64-bit hardware?

Author: Gerd Isenberg

Date: 23:21:26 11/06/03


On November 06, 2003 at 19:29:09, Russell Reagan wrote:

>On November 06, 2003 at 17:04:11, Gerd Isenberg wrote:
>
>>hmm, a few aspects: are the Athlons you compare really equal?
>>Let's see how other programs will perform. Sjeng may profit more from opteron
>>features independent from 64-bit stuff, e.g. bigger cache, more registers and
>>therefore less store/loads of locals and function parameters, faster memory and
>>smarter branch prediction. And 64-bit bsf is still expensive vector path
>>instruction with 9 cycles latency where all other pipes are blocked.
>
>Hi Gerd,
>
>Have you had a chance to test out your KoggeStone stuff on an Opteron yet?

Hi Russell,

no, not yet.

> One
>thing I still wonder about KoggeStone is if it will be faster than rotated
>bitboards on 64-bit hardware. You have said in the past that your KoggeStone
>stuff was faster than your rotated bitboards on 32-bit hardware, but you also
>used the 64-bit MMX registers.

Not exactly. A 1:1 replacement of the single-piece attack getters was slightly
slower. But introducing a combined getAllAttacks routine gave some
speedup, along with an easier, unconditional pinned-piece detection routine in
conjunction with my legal move generation.
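
For readers unfamiliar with the technique, here is a minimal sketch of a combined rook-direction attack getter built on Kogge-Stone occluded fills. It is not the code from the post: the names (fillNorth, rookAttacksAll and so on) and the square mapping (a1 = bit 0, h8 = bit 63) are assumptions made for illustration, and the fills are the commonly published parallel-prefix form.

```cpp
#include <cstdint>

typedef std::uint64_t Bitboard;

// Assumed square mapping: a1 = bit 0, h1 = bit 7, a8 = bit 56, h8 = bit 63.
const Bitboard NOT_A_FILE = 0xfefefefefefefefeULL;
const Bitboard NOT_H_FILE = 0x7f7f7f7f7f7f7f7fULL;

// Kogge-Stone occluded fill: sliders flood through empty squares in three
// doubling steps (1, 2, 4 squares), with no loops or branches.
Bitboard fillNorth(Bitboard sliders, Bitboard empty) {
    sliders |= empty & (sliders << 8);
    empty   &= empty << 8;
    sliders |= empty & (sliders << 16);
    empty   &= empty << 16;
    sliders |= empty & (sliders << 32);
    return sliders;
}

Bitboard fillSouth(Bitboard sliders, Bitboard empty) {
    sliders |= empty & (sliders >> 8);
    empty   &= empty >> 8;
    sliders |= empty & (sliders >> 16);
    empty   &= empty >> 16;
    sliders |= empty & (sliders >> 32);
    return sliders;
}

Bitboard fillEast(Bitboard sliders, Bitboard empty) {
    empty   &= NOT_A_FILE;              // stop wrap-around from h-file to a-file
    sliders |= empty & (sliders << 1);
    empty   &= empty << 1;
    sliders |= empty & (sliders << 2);
    empty   &= empty << 2;
    sliders |= empty & (sliders << 4);
    return sliders;
}

Bitboard fillWest(Bitboard sliders, Bitboard empty) {
    empty   &= NOT_H_FILE;              // stop wrap-around from a-file to h-file
    sliders |= empty & (sliders >> 1);
    empty   &= empty >> 1;
    sliders |= empty & (sliders >> 2);
    empty   &= empty >> 2;
    sliders |= empty & (sliders >> 4);
    return sliders;
}

// Attacks of ALL given rook-like sliders at once: shift each fill one more
// step so the first blocker in every direction is included.
Bitboard rookAttacksAll(Bitboard sliders, Bitboard empty) {
    return  (fillNorth(sliders, empty) << 8)
          | (fillSouth(sliders, empty) >> 8)
          | ((fillEast(sliders, empty) << 1) & NOT_A_FILE)
          | ((fillWest(sliders, empty) >> 1) & NOT_H_FILE);
}
```

The cost is the same whether sliders holds one rook or every rook and queen of one side, which is why a combined routine can pay off where a 1:1 replacement of per-piece getters does not.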

>Surely your KoggeStone stuff will be faster on an
>Opteron by some amount, but it seems like it won't get as much of an improvement
>since you are already using the 64-bit MMX registers while those who are using
>rotated bitboards are still using 32-bit registers on 32-bit machines.

I will change my bitboard infrastructure in some way.
Currently I do some things twice or more on the fly, e.g. filling with
single rooks and filling with all rooks and queens of one color.
There is something to improve.

On Opteron I will try SSE2 with its 16 128-bit xmm registers, processing two
bitboards, e.g. white and black, in one direction at once. With more registers
one may even do more independent fill directions in parallel.
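
A rough sketch of what "two bitboards in one direction at once" could look like with SSE2 intrinsics follows. The helper names (pack, unpack, fillNorth2) and the lane layout (white in the low 64 bits, black in the high 64 bits) are assumptions for illustration, not the code from the post. It needs an x86 compiler with SSE2 enabled.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Assumed lane layout: low 64 bits = white, high 64 bits = black.
inline __m128i pack(std::uint64_t white, std::uint64_t black) {
    return _mm_set_epi64x(static_cast<long long>(black),
                          static_cast<long long>(white));
}

inline void unpack(__m128i v, std::uint64_t& white, std::uint64_t& black) {
    std::uint64_t out[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
    white = out[0];
    black = out[1];
}

// Kogge-Stone occluded fill to the north on both 64-bit lanes at once.
// _mm_slli_epi64 shifts each lane independently, so no bits cross lanes.
inline __m128i fillNorth2(__m128i sliders, __m128i empty) {
    sliders = _mm_or_si128(sliders,
                           _mm_and_si128(empty, _mm_slli_epi64(sliders, 8)));
    empty   = _mm_and_si128(empty, _mm_slli_epi64(empty, 8));
    sliders = _mm_or_si128(sliders,
                           _mm_and_si128(empty, _mm_slli_epi64(sliders, 16)));
    empty   = _mm_and_si128(empty, _mm_slli_epi64(empty, 16));
    sliders = _mm_or_si128(sliders,
                           _mm_and_si128(empty, _mm_slli_epi64(sliders, 32)));
    return sliders;
}
```

Both lanes would normally receive the same empty-square set, e.g. pack(empty, empty), while the slider lanes differ by color.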

I have already introduced an intrinsic __m128i wrapper class here and played a bit
with it on a P4. It makes it easy to write KoggeStone and other
fill stuff in C syntax while using xmm registers or pairs of gp registers, only
by changing the type of a set of register variables, e.g. for one fill direction.
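
The idea of switching register types without touching the fill code can be illustrated with a small template. This is only a sketch under assumptions: BBPair is a hypothetical stand-in for "a pair of gp registers", and the actual wrapper class mentioned above is not shown in the post. A wrapper around __m128i would presumably overload the same three operators using SSE2 intrinsics.

```cpp
#include <cstdint>

// Hypothetical two-lane type: two 64-bit bitboards handled side by side.
struct BBPair {
    std::uint64_t lo;
    std::uint64_t hi;
};

inline BBPair operator|(BBPair a, BBPair b) {
    return BBPair{a.lo | b.lo, a.hi | b.hi};
}
inline BBPair operator&(BBPair a, BBPair b) {
    return BBPair{a.lo & b.lo, a.hi & b.hi};
}
inline BBPair operator<<(BBPair a, int n) {
    return BBPair{a.lo << n, a.hi << n};
}

// The fill is written once in plain C-like syntax; T decides whether it
// runs on a single 64-bit word, a pair of words, or a SIMD wrapper type.
template <typename T>
T fillNorth(T sliders, T empty) {
    sliders = sliders | (empty & (sliders << 8));
    empty   = empty & (empty << 8);
    sliders = sliders | (empty & (sliders << 16));
    empty   = empty & (empty << 16);
    sliders = sliders | (empty & (sliders << 32));
    return sliders;
}
```

Calling fillNorth with std::uint64_t arguments gives the scalar version, while passing BBPair values fills two boards in one call; only the variable types at the call site change.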

Cheers,
Gerd



Last modified: Thu, 15 Apr 21 08:11:13 -0700
