Computer Chess Club Archives



Subject: Re: sliding attacks in three #define

Author: Gerd Isenberg

Date: 09:51:03 04/10/04



On April 10, 2004 at 09:43:09, Christophe Theron wrote:

>On April 10, 2004 at 04:42:06, Gerd Isenberg wrote:
>
<snip>
>>During the "evolution" of my program from rotated to fill based, things changed
>>a bit. I probably do a lot for "nothing" - but I do it only once, and I do it
>>unconditionally and in parallel with other tasks. Often, with today's
>>super-pipelined processors, performing two or four independent, unconditional
>>instruction chains doesn't matter so much as long as you have enough registers...
>
>
>
>I wouldn't be so sure...
>

I mean pipeline miracles like the following, which keep both float/mmx/xmm ALUs
(mul/add) and other resources perfectly busy:

-----------------------------------------------------------------------------
Software Optimization
Guide for AMD Athlon™ 64 and
AMD Opteron™ Processors

Chapter 9 Optimizing with SIMD Instructions page 227/228

....

Multiplying four complex single-precision numbers only takes 17 cycles as
opposed to 14 cycles to multiply one complex single-precision number. The
floating-point pipes are kept busy by feeding new instructions into the
floating-point pipeline each cycle. In the arrangement above, 24 floating-point
operations are performed in 17 cycles, achieving more than a 3.5x increase in
performance.
-----------------------------------------------------------------------------

Of course that is an extreme case, but I already have some experience with MMX
fill routines, doing up to four directions in parallel...
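
To make the "four directions in parallel" idea concrete, here is a minimal sketch in portable C, not MMX and not necessarily the routine referred to above. It uses Kogge-Stone style occluded fills for the four rook directions, written as four independent, branch-free dependency chains that an out-of-order CPU can overlap. The square mapping (a1 = bit 0, h8 = bit 63) and all names (`rook_attacks_fill`, `NOT_A_FILE`, `NOT_H_FILE`, `rook_fill_self_check`) are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U64;

/* Assumed mapping: a1 = bit 0, h1 = bit 7, a8 = bit 56, h8 = bit 63. */
static const U64 NOT_A_FILE = 0xfefefefefefefefeULL;
static const U64 NOT_H_FILE = 0x7f7f7f7f7f7f7f7fULL;

/* Rook-like attacks for ALL sliders in 'rooks' at once.
 * 'empty' must have a bit set for every empty square.
 * gn/gs/ge/gw are the generator sets, pn/ps/pe/pw the propagator sets
 * for north, south, east and west. The four chains never read each
 * other's results, so they can execute in parallel. */
U64 rook_attacks_fill(U64 rooks, U64 empty)
{
    U64 gn = rooks, gs = rooks, ge = rooks, gw = rooks;
    U64 pn = empty, ps = empty;
    U64 pe = empty & NOT_A_FILE;   /* stop east fills from wrapping h -> a */
    U64 pw = empty & NOT_H_FILE;   /* stop west fills from wrapping a -> h */

    /* Step 1: fill by one square. */
    gn |= pn & (gn << 8);   gs |= ps & (gs >> 8);
    ge |= pe & (ge << 1);   gw |= pw & (gw >> 1);
    pn &= pn << 8;          ps &= ps >> 8;
    pe &= pe << 1;          pw &= pw >> 1;

    /* Step 2: fill by two more squares. */
    gn |= pn & (gn << 16);  gs |= ps & (gs >> 16);
    ge |= pe & (ge << 2);   gw |= pw & (gw >> 2);
    pn &= pn << 16;         ps &= ps >> 16;
    pe &= pe << 2;          pw &= pw >> 2;

    /* Step 3: fill by four more squares. */
    gn |= pn & (gn << 32);  gs |= ps & (gs >> 32);
    ge |= pe & (ge << 4);   gw |= pw & (gw >> 4);

    /* One final shift turns each occluded fill into an attack set
     * that includes the first blocker in every direction. */
    return (gn << 8) | (gs >> 8)
         | ((ge << 1) & NOT_A_FILE)
         | ((gw >> 1) & NOT_H_FILE);
}

/* Usage example: a rook on a1, first alone, then with a blocker on a4. */
void rook_fill_self_check(void)
{
    U64 a1 = 1ULL, a4 = 1ULL << 24;
    assert(rook_attacks_fill(a1, ~a1) == 0x01010101010101feULL);
    assert(rook_attacks_fill(a1, ~(a1 | a4)) == 0x00000000010101feULL);
}
```

The work is done unconditionally even when no rook is present, which is the trade-off under discussion: no branches and no table lookups, at the price of a fixed number of shift/and/or operations per direction.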

>
>
>
>>One other example i have in mind with my future 64-bit approach is using pairs
>>of bitboards to generate white as well as black sliding attacks at once with
>>128-bit xmm registers.
>
>
>
>I think there is even less use for attack tables for both sides than for attack
>tables of the side to move.

OK, for pure pseudo-legal move generation, certainly. But what about "more
accurate" pruning/reduction/extension decisions, lazy eval decisions, eval
stuff, stalemate detection at the leaves, SEE-like move sorting, etc.?

I will give it a try: all attacks, all sliders including the king as metaqueen,
disjoint direction-wise for each sliding piece (piece kind), and combined
piece-kind- and direction-wise; pinned pieces, remove-checker, several taboo
bitboards, hanging, en prise, check targets, legal, direction-wise move
targets[ply] for movegen bookkeeping... in about 300-500 cycles (and probably
even faster with future CPUs, e.g. if the xmm ALUs become 128 bits wide, as
already mentioned in the AMD optimization guide).
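
The "pairs of bitboards" idea quoted earlier can be sketched without committing to any particular SIMD instruction set. The following is only an assumed illustration in portable C: a hypothetical `BBPair` struct holds the white sliders in lane 0 and the black sliders in lane 1, and the same north fill runs on both lanes. On a machine with 128-bit registers the two lanes could share one xmm register, but this sketch uses no intrinsics, and the names `BBPair` and `north_attacks_pair` are invented here.

```c
#include <stdint.h>

typedef uint64_t U64;

/* Hypothetical two-lane container: lane[0] = white, lane[1] = black. */
typedef struct { U64 lane[2]; } BBPair;

/* North sliding attacks for both sides at once.
 * Assumed mapping: a1 = bit 0, h8 = bit 63, so north is a left shift by 8.
 * 'empty' has a bit set for every empty square and is shared by both lanes. */
BBPair north_attacks_pair(BBPair sliders, U64 empty)
{
    BBPair out;
    for (int i = 0; i < 2; ++i) {      /* identical work per lane */
        U64 gen = sliders.lane[i];
        U64 pro = empty;
        gen |= pro & (gen << 8);
        pro &= pro << 8;
        gen |= pro & (gen << 16);
        pro &= pro << 16;
        gen |= pro & (gen << 32);
        out.lane[i] = gen << 8;        /* shift once more to include blockers */
    }
    return out;
}
```

For example, with a white rook on a1 and a black rook on d1 on an otherwise empty board, lane 0 comes back as the a-file from a2 to a8 and lane 1 as the d-file from d2 to d8.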

OK, for a lot of lazy eval cutoffs it's still too expensive.
On the other hand, it may help to avoid "wrong" cutoffs and to smell a forced
mate threat or other tactical stuff, not to mention hiding the latency of a
prefetched hash read.
What is more important?

Maybe I will learn that all this bitboard stuff is evil ;-)

Cheers,
Gerd

>
>
>
>
>    Christophe




Last modified: Thu, 15 Apr 21 08:11:13 -0700
