Computer Chess Club Archives


Subject: Re: Crafty profits little from Itanium and Opteron versus Commercials

Author: Gerd Isenberg

Date: 14:32:42 08/07/03


On August 07, 2003 at 16:16:25, Sune Fischer wrote:

>On August 07, 2003 at 15:40:33, Gerd Isenberg wrote:
>
>>On August 07, 2003 at 08:24:28, Sune Fischer wrote:
>>
>>>On August 07, 2003 at 08:15:08, Uri Blass wrote:
>>>
>>>>>Crafty is 64 bit prog, which means it's slow on 32 bit, even I have found that
>>>>>doing a lookup is faster than shifting, I simply never do 1<<sq, I use a table
>>>>>for that.
>>>>
>>>>I guess that it is only for 64 bits and if you have 32 bits number then it is
>>>>better to do 1<<i when 0<=i<32 and not to use arrays.
>>>>
>>>>Correct?
>>>
>>>If you can do the shift in 1 clock, then you can't go any faster, but 64 bit
>>>shifts are slow on old 32 bit chips so the table becomes faster.
>>>
>>>So for pure 64 bit you get fewer tables, faster and cleaner code.
>>>
>>>-S.
>>>
>>
>>Hi Sune,
>>
>>Exactly! On the other hand, I believe that there is no need to use 64 bits
>>everywhere if 32 bits are enough. Using the standard six 32-bit register set is
>>still fine on Opteron, and the opcodes are one byte shorter because no REX
>>prefix is needed.
>>
>>I don't know what sizeof(int) is in AMD64 compilers: still 4, or 8 by default?
>>But of course there are explicit 32- and 64-bit types, signed as well as
>>unsigned.
>>
>>I'm curious which is the fastest 64-bit scan on Opteron, especially if two
>>scans should be done simultaneously, e.g. to get a move's from/to index:
>>
>>1. Matt Taylor's 64-bit mul with de Bruijn sequence.
>>2. Folded 32-bit mul with Matt's super magic de Bruijn sequence.
>>3. bsf, still vector path and 9 cycles.
>>
>>But I have to wait some time until I can try it ;-(
>
>Well you're the expert, I just hope you post your findings here :)
>
>One thing I'm very interested in, is if floodfillers will be fast enough to
>replace rotated. It would be nice if getting the bit wasn't needed, also to do
>away with the incrementally updated occupied rotated boards.
>What a "pure" code that would be :)
>

Not quite sure, Sune.

So many promising options with Opteron, including rotated ;-)

What about this approach?

Kogge-Stone propagators in MMX, generators in sixteen 128-bit XMM registers, e.g.
simultaneously for default material:
black:white rook1,rook2,queen as rook but white:black king as rook meta slider,
black:white bishops,queen as bishop but white:black king as bishop meta slider.

Two opposite directions in parallel or interlaced. Pinned pieces or covered
checkers are found on the fly with some ands/ors. Very easy unconditional stuff,
5 up to 8 or more independent MMX/SSE2 instructions in a row. There is only a
const* source pointer (rsi) and a target structure/class pointer (rdi) for
intermediate attack results, for later use in eval and movegen/sorting/SEE, as
well as disjoint direction attacks and disjoint piece attacks.
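
As a rough illustration of the generator/propagator idea above, here is a minimal sketch in plain 64-bit C rather than MMX/SSE2. It assumes the a1 = bit 0, h8 = bit 63 square mapping; the names (fill_north, rook_attacks, pinned_from_north) are made up for this example and are not from any posted code. The generator set holds the sliders, the propagator set holds the empty squares, and three unconditional shift/and/or steps per direction cover the whole ray. The last function shows the "king as rook meta slider" trick for one direction pair.

```c
#include <stdint.h>

typedef uint64_t U64;

/* Assumed mapping: a1 = bit 0, h1 = bit 7, a8 = bit 56. */
static const U64 NOT_A_FILE = 0xfefefefefefefefeULL;
static const U64 NOT_H_FILE = 0x7f7f7f7f7f7f7f7fULL;

/* Kogge-Stone occluded fills: gen = sliders (generators),
   pro = empty squares (propagators). Result includes the sliders. */
static U64 fill_north(U64 gen, U64 pro) {
    gen |= pro & (gen << 8);
    pro &= pro << 8;
    gen |= pro & (gen << 16);
    pro &= pro << 16;
    gen |= pro & (gen << 32);
    return gen;
}

static U64 fill_south(U64 gen, U64 pro) {
    gen |= pro & (gen >> 8);
    pro &= pro >> 8;
    gen |= pro & (gen >> 16);
    pro &= pro >> 16;
    gen |= pro & (gen >> 32);
    return gen;
}

static U64 fill_east(U64 gen, U64 pro) {
    pro &= NOT_A_FILE;               /* stop wrap-around from h to a */
    gen |= pro & (gen << 1);
    pro &= pro << 1;
    gen |= pro & (gen << 2);
    pro &= pro << 2;
    gen |= pro & (gen << 4);
    return gen;
}

static U64 fill_west(U64 gen, U64 pro) {
    pro &= NOT_H_FILE;               /* stop wrap-around from a to h */
    gen |= pro & (gen >> 1);
    pro &= pro >> 1;
    gen |= pro & (gen >> 2);
    pro &= pro >> 2;
    gen |= pro & (gen >> 4);
    return gen;
}

/* Attacks of all rooks in the set at once; the first blocker
   on each ray is included, the rooks themselves are not. */
U64 rook_attacks(U64 rooks, U64 empty) {
    return (fill_north(rooks, empty) << 8)
         | (fill_south(rooks, empty) >> 8)
         | ((fill_east(rooks, empty) << 1) & NOT_A_FILE)
         | ((fill_west(rooks, empty) >> 1) & NOT_H_FILE);
}

/* King as rook meta slider: an own piece hit by the king's northward
   ray and by an enemy rook/queen southward ray is pinned on that file. */
U64 pinned_from_north(U64 king, U64 enemy_rq, U64 own, U64 empty) {
    U64 king_ray  = fill_north(king, empty) << 8;
    U64 enemy_ray = fill_south(enemy_rq, empty) >> 8;
    return king_ray & enemy_ray & own;
}
```

For example, rook_attacks(1ULL, ~1ULL) covers the a-file and first rank for a lone rook on a1. A SIMD version would run the same steps on several piece sets side by side, which is where the independent instructions in a row come from.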

Maybe some gp-register processing "en passant" and out of order, to keep the
pipes really busy: some easy pawn or knight stuff in C.

Of course there are several incarnations of this routine, e.g. a more expensive
but general one for all cases of unusual material with more than a queen per
side, more than two rooks, bishops or knights, or more than one bishop on same
colored squares, and an even cheaper one e.g. for pawn/knight endings, where pins
are not possible. One initial material-dependent switch is the one and only
condition here.
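
The single material-dependent switch could look roughly like the sketch below. Everything here is hypothetical and only for illustration: the SideMaterial struct, the GenVariant names and select_variant are assumptions, and a real program would more likely index a table by an incrementally updated material key.

```c
/* Hypothetical per-side piece counts (pawns and king omitted). */
typedef struct {
    int queens, rooks, knights;
    int light_bishops, dark_bishops;
} SideMaterial;

/* Which incarnation of the attack routine to run. */
typedef enum {
    GEN_NO_SLIDERS,  /* cheap: pawn/knight endings, no pins possible */
    GEN_DEFAULT,     /* usual material, fixed register layout        */
    GEN_GENERAL      /* expensive: promoted pieces etc.              */
} GenVariant;

static int has_sliders(const SideMaterial *m) {
    return m->queens + m->rooks + m->light_bishops + m->dark_bishops > 0;
}

static int is_default(const SideMaterial *m) {
    return m->queens <= 1 && m->rooks <= 2 && m->knights <= 2
        && m->light_bishops <= 1 && m->dark_bishops <= 1;
}

/* The one and only condition: choose the routine once per node. */
GenVariant select_variant(const SideMaterial *white, const SideMaterial *black) {
    if (!has_sliders(white) && !has_sliders(black))
        return GEN_NO_SLIDERS;
    if (is_default(white) && is_default(black))
        return GEN_DEFAULT;
    return GEN_GENERAL;
}
```

The returned value would then select one of the unconditional routines, for instance through a switch or a function pointer table.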

Due to the amount of information in the disjoint and aggregated output of these
routines, a legal move generator based approach may outperform rotated,
especially when double direct path SSE2 instructions become single in the future.
It is also convenient to issue a prefetch instruction just before this routine.

Even rotated I consider fast as hell on Opteron, with 32 KByte of lookup or less.

Gerd



>-S.
>
>>Cheers,
>>Gerd
>>
>>
>>
>>>>Uri



Last modified: Thu, 15 Apr 21 08:11:13 -0700
