Author: Gerd Isenberg
Date: 09:51:03 04/10/04
On April 10, 2004 at 09:43:09, Christophe Theron wrote:

>On April 10, 2004 at 04:42:06, Gerd Isenberg wrote:
>
<snip>
>>During the "evolution" of my program from rotated to fill based, things changed
>>a bit. I probably do a lot for "nothing" - but i do it only once, and i do it
>>unconditionally and parallel with other tasks. Often with todays super pipelined
>>processors, performing two or four independent, unconditional instructions
>>chains doesn't matter so much as long as you have enough registers...
>
>
>
>I wouldn't be so sure...
>

I mean such pipeline miracles, using both float/mmx/xmm ALUs (mul/add) and other
resources perfectly:

-----------------------------------------------------------------------------
Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors
Chapter 9 Optimizing with SIMD Instructions page 227/228
....
Multiplying four complex single-precision numbers only takes 17 cycles as
opposed to 14 cycles to multiply one complex single-precision number. The
floating-point pipes are kept busy by feeding new instructions into the
floating-point pipeline each cycle. In the arrangement above, 24 floating-point
operations are performed in 17 cycles, achieving more than a 3.5x increase in
performance.
-----------------------------------------------------------------------------

Of course an extreme case, but I have already gained some experience with MMX
fill stuff, doing up to four directions in parallel...

>
>
>
>>One other example i have in mind with my future 64-bit approach is using pairs
>>of bitboards to generate white as well as black sliding attacks at once with
>>128-bit xmm registers.
>
>
>
>I think there is even less use for attack tables for both sides than for attack
>tables of the side to move.

OK, with pure pseudo-legal move generation, for sure. But for "more accurate"
pruning/reduction/extension decisions, lazy eval decisions, eval stuff, stalemate
detection at the leaves, SEE-like move sorting, etc.?
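
To make the "pairs of bitboards" idea concrete, here is a minimal sketch in plain C, not code from any actual program. It assumes a1 = bit 0 and h8 = bit 63, and uses the simple unrolled "dumb7fill" style of occluded fill. The names BBPair, fill_attacks and NOT_A_FILE are made up for illustration. The two 64-bit lanes stand in for the two halves of one 128-bit xmm register, so white and black sliders go through the same unconditional instruction chain:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair type: lane[0] = white sliders, lane[1] = black sliders.
   In a real SSE2 version both lanes would live in one xmm register. */
typedef struct { uint64_t lane[2]; } BBPair;

/* Every square except the a-file; stops east shifts wrapping to the next rank. */
#define NOT_A_FILE 0xfefefefefefefefeULL

/* Fill-based attacks in one "positive" direction (left shifts only):
   shift 8 = north (wrap = ~0), shift 1 = east (wrap = NOT_A_FILE).
   Branch-free apart from the fixed-count loops, which a compiler can unroll. */
static BBPair fill_attacks(BBPair gen, uint64_t empty, int shift, uint64_t wrap)
{
    assert(shift > 0 && shift < 64);
    BBPair flood = gen;
    empty &= wrap;                       /* never propagate across the board edge */
    for (int step = 0; step < 6; ++step) {
        for (int i = 0; i < 2; ++i) {
            gen.lane[i] = (gen.lane[i] << shift) & empty;  /* slide over empty squares */
            flood.lane[i] |= gen.lane[i];
        }
    }
    for (int i = 0; i < 2; ++i)          /* one last step also hits the first blocker */
        flood.lane[i] = (flood.lane[i] << shift) & wrap;
    return flood;
}
```

Calling it with shift 8 and wrap ~0 gives the north attacks of both sides in one pass, and shift 1 with NOT_A_FILE gives the east attacks. The negative directions would need right shifts and the mirrored wrap masks.
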
I will give it a try: all attacks, all sliders including the king as a metaqueen,
disjoint direction-wise for each sliding piece (piece kind), and combined
piece-kind- and direction-wise, pinned pieces, remove-checker, several taboo
bitboards, hanging, en prise, check targets, legal, direction-wise move
targets[ply] for movegen bookkeeping... in about 300-500 cycles (and probably
even faster with future CPUs, e.g. if xmm ALUs become 128 bits wide, as already
mentioned in the AMD optimization guide).

OK, for a lot of lazy eval cutoffs it's still too expensive. On the other hand,
it may help to avoid "wrong" cutoffs and to smell some forced mate threat or
other tactical stuff, not to mention hiding some prefetched hash read. What is
more important?

Maybe I will learn that all this bitboard stuff is evil ;-)

Cheers,
Gerd

>
>
>    Christophe
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.