Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Move generation question for the big boys

Author: Robert Hyatt

Date: 17:34:36 09/15/01

Go up one level in this thread


On September 15, 2001 at 14:30:40, Vincent Diepeveen wrote:

>On September 15, 2001 at 11:14:36, Sune Fischer wrote:
>
>>Hi
>>
>>When I discovered that Crafty had some neat assembler rutines, I decided to test
>>if they where faster than my humble rutines.
>>I have up until now used a "raytracing" or boardtracing rutine to find the legal
>>moves for the sliding pieces. And for the knights, pawns and king I simply check
>>if the board has an allied piece on that square.
>>It's a simple way of generating the legal moves.
>>I still generate the attack bitboards to get the legal king moves though.
>>
>>I then tried to find all the moves directly from the bitboards by using one of
>>Crafty's assember functions.
>>This part of my program is only a minor part, but none the less it runs a full 8
>>percent slower than my previous raytracing algorithm!!
>>
>>This is what I have now:
>>
>>void FindMovesQueen(..loads of pointers...)
>>{
>> int to_square;
>> BITBOARD bb=(~allied.occupied) & allied.attack[board.id[from_square]];
>>
>> while (bb)
>> {
>>   to_square=63-FirstOne(bb);    // get a bit
>
>the slow part is of course the 'firstone' function, though
>it appears there is a trick on the K7 to speed it up quite a
>bit in assembly. Nevertheless all the 64 bits overhead in total
>slow down the thing.


FirstOne() isn't particularly slow, using the bit-scan instructions that
are very fast on PII and beyond.  It certainly isn't much of a cost in the
profile runs I do.



>
>>   bb ^= mask[to_square];        // remove the bit
>>
>>   movelist[++counter].from=from_square;
>>   movelist[counter].to=to_square;
>>   movelist[counter].piece=QUEEN;
>>   movelist[counter].capturedpiece=enemy.piece[board.id[to_square]];
>
>this looks ugly. Using a pointer here would be way faster.
>
>> }
>> return;
>>}
>>
>>(movelist, counter, from_square etc. are passed in the argument)
>>
>>I know why it is slow too.
>>First I have to form the bb, that's 2 bitboard operations.
>>Then I need to run an algorithm, FirstOne from crafty, to find the first bit.
>>Then I need to mask out that found bit.
>>Both of these run several time pending on the while loop.
>>Next is 4 lines I always have, no matter how I do it, so we can safely ignore
>>those.
>>The while loop is a conditional much like the if's I use when raytracing, so
>>probably the if's and while almost cancel out.
>>
>>All in all I have added a lot of operations to my program, I am not surprized it
>>is a lot slower. Given that the entire program suffer a slowdown by about 8.1
>>percent, I estimate that this technique is more than twice as slow as the
>>raytracing.
>>The upside of things is that I reduce my code by about 1000 lines or so, but is
>>this worth 8 percent in speed?
>
>Why not speed it up a factor of 2 if not more (depending upon what
>processor we talk about) and just get rid of all code and rewrite it
>till you have about a line or 50 in total, including EP and castling?
>
>that's always faster than 1000 lines of C++ code.
>
>The problem of too big code is that when testing it a few million time
>a second it might seem fast, but things change completely when they are
>integral part of the program. After some years of programming your program
>won't fit inside L1 cache anymore and then the suffering starts!
>
>Note that the small trace cache size of the P4 gives thoughts on thinking
>too here. With DIEP where the ENGINE part takes many hundreds of kilobytes
>of codesize from the L1+L2 caches, i definitely always go for the smallest
>code. Where smallest code is defined as:
>  min( (whitecode + blackcode) , (general code) )
>
>Sometimes writing things general is taking up more code than tot total
>of writing it for black & white out (certain eval patterns for example).
>
>In case of move generation that isn't the case in DIEP, but the bitboard
>code for example already has been written down out for every piece seperate.
>
>Bitboards are a big disaster here IMHO.
>
>>I know the use of 64 bit processors will give a nice boost to this method and I
>>probably can't expect anything like that for my tracing function, but frankly
>>I'm not sure it would be enough to catch up.
>>
>>Am I mistaken, are the big boys doing something else?
>>Is there an even faster way?
>>I need to know :D :D
>>
>>Thanks,
>>Sune



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.