Author: Gerd Isenberg
Date: 00:00:01 08/08/03
On August 07, 2003 at 23:30:36, Vincent Diepeveen wrote:

>On August 07, 2003 at 17:32:42, Gerd Isenberg wrote:
>
>>On August 07, 2003 at 16:16:25, Sune Fischer wrote:
>>
>>>On August 07, 2003 at 15:40:33, Gerd Isenberg wrote:
>>>
>>>>On August 07, 2003 at 08:24:28, Sune Fischer wrote:
>>>>
>>>>>On August 07, 2003 at 08:15:08, Uri Blass wrote:
>>>>>
>>>>>>>Crafty is 64 bit prog, which means it's slow on 32 bit, even I have found that
>>>>>>>doing a lookup is faster than shifting, I simply never do 1<<sq, I use a table
>>>>>>>for that.
>>>>>>
>>>>>>I guess that it is only for 64 bits and if you have 32 bits number then it is
>>>>>>better to do 1<<i when 0<=i<32 and not to use arrays.
>>>>>>
>>>>>>Correct?
>>>>>
>>>>>If you can do the shift in 1 clock, then you can't go any faster, but 64 bit
>>>>>shifts are slow on old 32 bit chips so the table becomes faster.
>>>>>
>>>>>So for pure 64 bit you get fewer tables, faster and cleaner code.
>>>>>
>>>>>-S.
>>>>>
>>>>
>>>>Hi Sune,
>>>>
>>>>Exactly! On the other hand, i believe that there is no need to use 64 bits
>>>>everywhere, if 32 bits are enough. Using the standard six 32-bit register set is
>>>>still fine with Opteron and one byte shorter opcode due to missing REX prefix.
>>>>
>>>>I don't know sizeof(int) in AMD64 compilers, still 4, or 8 per default.
>>>>But of course there are explicit 32- or 64-bit types, signed as well as
>>>>unsigned.
>>>>
>>>>I'm strained about what is the fastest 64-bitscan on opteron, especially if two
>>>>scans should be done simultaneously e.g. to get a move from/to index:
>>>>
>>>>1. Matt Taylor's 64-bit mul with de Bruijn sequence.
>>>>2. Folded 32-bit mul with Matt's super magic de Bruijn sequence.
>>>>3. bsf, still vector path and 9 cycles.
>>>>
>>>>But i have to wait some time, until i can try it ;-(
>>>
>>>Well you're the expert, I just hope you post your findings here :)
>>>
>>>One thing I'm very interested in, is if floodfillers will be fast enough to
>>>replace rotated.
>>>It would be nice if getting the bit wasn't needed, also to do
>>>away with the incrementally updated occupied rotated boards.
>>>What a "pure" code that would be :)
>>>
>>
>>Not quite sure, Sune.
>>
>>So many promising options with opteron including rotated ;-)
>>
>>What about this approach?
>>
>>Kogge-Stone propagators in MMX, generators in sixteen 128-bit XMM, e.g.
>>simultaneously for default material:
>>black:white rook1,rook2,queen as rook but white:black king as rook meta slider,
>>black:white bishops,queen as bishop but white:black king as bishop meta slider.
>>
>>Two opposite directions parallel or interlaced. Pinned pieces or covered checker
>>on the fly with some and/ors. Very easy unconditional stuff, 5 up to 8 or more
>>independent MMX/SSE2-instructions in a row. There is only some const* source
>>pointer (rsi) and target-structure/class pointer (rdi) for intermediate attack
>>results for later eval and movegen/sorting SEE use. As well disjoint direction
>>attacks and disjoint piece attacks.
>>
>>May be "en passant" and out of order some gp-register processing to keep the
>>pipes really busy, some easy pawn or knight stuff in C.
>>
>>Of course there are several incarnations of this routine, e.g. a more expensive,
>>but general one for all cases of unusual material with more than a queen per
>>side, more than two rooks, bishops or knights, more than one bishop on same
>>colored squares, and an even cheaper one e.g. for pawn/knight endings, where pins
>>are not possible. One initial material dependent switch as the one and only
>>condition here.
>>
>>Due to the amount of information, disjoint and aggregated output of these
>>routines, a legal move generator based approach may outperform rotated,
>>especially when double direct path sse2 instructions become single in the future.
>>This routine is even a nice place to put a prefetch instruction before.
>>
>>Even rotated i consider fast as hell on opteron with 32KByte lookup or less.
>>
>>Gerd
>
>you can already calculate a minimum number of cycles you lose to get
>in a NORMAL register 1 move and then add the store penalty to it.
>

Do you mean mov mmx/xmm <-> reg64? Yes, movd is still vector path and should be
avoided.

>Now at a K7 2.127Ghz i'm going at 73MLN nodes a second generating speed
>after 1.e4,e5 2.d4,d5
>
>most of that is overhead due to general arrays that work both for black and
>white.
>
>So how many cycles is that realistically for DIEP at the opteron a node?
>
>How many for your kogge stone minimum.
>
>Which one will be faster?

We will see. Kogge Stone has some potential to do things massively parallel,
for several piece sets as well as for several directions. For pure movegen i
would of course use a completely different design. But as we all know, that's
not the main task.

>
>So why waste the effort to using bitboards?
>

Didn't we have this discussion before? It is well known that this "natural"
board representation doesn't fit your thinking patterns ;-)

Regards,
Gerd
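
For readers who have not met the term, here is a minimal sketch of the Kogge-Stone occluded fill discussed above, in plain C on 64-bit integers rather than MMX/SSE2. It assumes the common square mapping a1 = bit 0, h8 = bit 63. The function names are made up for illustration, and this is not the routine described in the post, only the basic generator/propagator step for two of the eight directions.

```c
#include <stdint.h>

/* Assumption: a1 = bit 0, b1 = bit 1, ..., h8 = bit 63. */
#define NOT_A_FILE 0xfefefefefefefefeULL

/* Fill north from all sliders in gen through empty squares in pro.
   Three doubling steps cover up to seven squares, with no branches. */
uint64_t north_occluded_fill(uint64_t gen, uint64_t pro)
{
    gen |= pro & (gen << 8);
    pro &= pro << 8;
    gen |= pro & (gen << 16);
    pro &= pro << 16;
    gen |= pro & (gen << 32);
    return gen;
}

/* Same idea eastwards; the propagator is masked so that nothing
   wraps from the h-file onto the a-file of the next rank. */
uint64_t east_occluded_fill(uint64_t gen, uint64_t pro)
{
    pro &= NOT_A_FILE;
    gen |= pro & (gen << 1);
    pro &= pro << 1;
    gen |= pro & (gen << 2);
    pro &= pro << 2;
    gen |= pro & (gen << 4);
    return gen;
}

/* Attack sets: one more step from the fill, so the first blocker
   (own or enemy piece) is included. */
uint64_t north_attacks(uint64_t sliders, uint64_t empty)
{
    return north_occluded_fill(sliders, empty) << 8;
}

uint64_t east_attacks(uint64_t sliders, uint64_t empty)
{
    return (east_occluded_fill(sliders, empty) << 1) & NOT_A_FILE;
}
```

With rooks on a1 and h1 and nothing else on the board, north_attacks(0x81, ~0x81ULL) yields both files in a single pass. That is the "several pieces at once" property; the price is that the result is a set of attacked squares per direction, not per piece.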
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.