Author: Christophe Theron
Date: 11:23:37 04/10/04
On April 10, 2004 at 12:51:03, Gerd Isenberg wrote:

>On April 10, 2004 at 09:43:09, Christophe Theron wrote:
>
>>On April 10, 2004 at 04:42:06, Gerd Isenberg wrote:
>>
><snip>
>>>During the "evolution" of my program from rotated to fill based, things changed
>>>a bit. I probably do a lot for "nothing" - but I do it only once, and I do it
>>>unconditionally and in parallel with other tasks. With today's super-pipelined
>>>processors, performing two or four independent, unconditional instruction
>>>chains often doesn't matter so much, as long as you have enough registers...
>>
>>
>>I wouldn't be so sure...
>>
>
>I mean such pipeline miracles, using both float/mmx/xmm ALUs (mul/add) and other
>resources perfectly:
>
>-----------------------------------------------------------------------------
>Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors
>
>Chapter 9, Optimizing with SIMD Instructions, pages 227/228
>
>....
>
>Multiplying four complex single-precision numbers only takes 17 cycles as
>opposed to 14 cycles to multiply one complex single-precision number. The
>floating-point pipes are kept busy by feeding new instructions into the
>floating-point pipeline each cycle. In the arrangement above, 24 floating-point
>operations are performed in 17 cycles, achieving more than a 3.5x increase in
>performance.
>-----------------------------------------------------------------------------
>
>Of course an extreme case, but I have already gained some experience with MMX
>fill stuff, doing up to four directions in parallel...
>
>
>>>One other example I have in mind with my future 64-bit approach is using pairs
>>>of bitboards to generate white as well as black sliding attacks at once with
>>>128-bit xmm registers.
>>
>>
>>I think there is even less use for attack tables for both sides than for attack
>>tables of the side to move.
>
>OK, with pure pseudo-legal move generation, for sure. But for "more accurate"
>pruning/reduction/extension decisions, lazy eval decisions, eval stuff, stalemate
>detection at the leaves, and SEE-like move sorting, etc.?
>
>I will give it a try: all attacks, all sliders including the king as a metaqueen,
>disjoint direction-wise for each sliding piece (piece kind) and combined
>piece-kind- and direction-wise, pinned pieces, remove-checker, several taboo
>bitboards, hanging, en prise, check targets, legal moves, direction-wise move
>targets[ply] for move-generation bookkeeping... in about 300-500 cycles (and
>probably even faster with future CPUs, e.g. if the xmm ALUs become 128 bits
>wide, as already mentioned in the AMD optimization guide).
>
>OK, for a lot of lazy eval cutoffs it is still too expensive.
>Otherwise it may help to avoid "wrong" cutoffs and to sniff out a forced mate
>threat or other tactical stuff, not to mention hiding a prefetched hash read.
>What is more important?
>
>Maybe I will learn that all this bitboard stuff is evil ;-)
>
>Cheers,
>Gerd


I would suggest first designing your selective algorithms and then optimizing
them. Using bitboards just because they can do great things is not, IMO, a good
enough reason.


    Christophe
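
For reference, a minimal sketch of the kind of fill Gerd describes: a
Kogge-Stone east fill computed for two bitboards at once in one 128-bit xmm
register (for example white sliders in the low quadword, black sliders in the
high one), written with SSE2 intrinsics in C. The names, the square mapping
(a1 = bit 0, so a left shift by one moves a piece one file east) and the
packing of the two colours are illustrative assumptions, not Gerd's actual
code; the other directions follow the same pattern with different shift counts
and wrap masks.

-----------------------------------------------------------------------------
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

typedef uint64_t U64;

/* Kogge-Stone "occluded fill" towards the east, computed for two bitboards
 * at once: white sliders in the low quadword, black sliders in the high one.
 * empty is the set of empty squares, duplicated into both halves.  The notA
 * mask stops bits from wrapping from the h-file onto the a-file above. */
static __m128i east_occluded2(__m128i sliders, __m128i empty)
{
    const __m128i notA = _mm_set1_epi64x((long long)0xFEFEFEFEFEFEFEFEULL);

    empty   = _mm_and_si128(empty, notA);
    sliders = _mm_or_si128(sliders, _mm_and_si128(empty, _mm_slli_epi64(sliders, 1)));
    empty   = _mm_and_si128(empty,  _mm_slli_epi64(empty, 1));
    sliders = _mm_or_si128(sliders, _mm_and_si128(empty, _mm_slli_epi64(sliders, 2)));
    empty   = _mm_and_si128(empty,  _mm_slli_epi64(empty, 2));
    sliders = _mm_or_si128(sliders, _mm_and_si128(empty, _mm_slli_epi64(sliders, 4)));
    return sliders;
}

/* East attacks for both colours: shift the fill one more step so the first
 * blocker (or the board edge) is included as an attacked square. */
static __m128i east_attacks2(__m128i sliders, __m128i empty)
{
    const __m128i notA = _mm_set1_epi64x((long long)0xFEFEFEFEFEFEFEFEULL);
    return _mm_and_si128(_mm_slli_epi64(east_occluded2(sliders, empty), 1), notA);
}
-----------------------------------------------------------------------------

Packing the pair is just _mm_set_epi64x(black_sliders, white_sliders); whether
computing both colours at once actually pays off is exactly the trade-off
discussed above.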