Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: High performance move generation

Author: Bas Hamstra

Date: 05:59:11 11/20/99

Go up one level in this thread


On November 20, 1999 at 04:55:06, Dan Newman wrote:

>On November 19, 1999 at 22:48:42, Ratko V Tomic wrote:
>
>>> When a piece moves, all squares that are now "pseudo-unattacked" by the
>>> move I undo the attacks, by XORing them.
>>
>>Does the bitmask refer to the actual attacks or to the potential uncovered
>>attacks? If it is potential attacks, they would still have to be decoupled into
>>real and potential before actual move list is produced. If they are real
>>attacks, then when a piece moves from a square, all sliding attackers to that
>>square would have to be identified (I guess from the bitmask via a LeastBit[]
>>array getting back the index of the least bit set) and their movement xored into
>>the right squares. I am not sure that this would perform much (if at all) better
>>than the regular adding/stepping generator. Namely setting a bit is more
>>expensive than writing a whole byte or a whole word to memory since setting a
>>bit involves reading a data word from memory (implicitly by the processor) then
>>writing the changed data to memory (even though the asm instruction mnemonics
>>may appear visually to only write to the memory). Writing a byte or a word
>>involves only writing, hence it uses the memory bandwidth (which is the main
>>bottleneck on the current micros due to CPU running several times faster than
>>the memory bus) much better.
>>
>>What I was thinking, regarding the incremental updates, would be a linked list
>>(probably a doubly linked list to be able to insert & delete quickly at random
>>point in the list), attached to each square containing a piece, and containing
>>linked all attackers to that square  (by linking either to the source square or
>>to the attacker in the piece list, and continuing through the piece list). The
>>attack list would be split into 2 sublists, as friendly attackers/defenders (of
>>the same color) and the hostile attackers (of the opposite color). Since a
>>square can be simultaneously attacked/defended from 8 sliding and 8 jumping
>>sources, one would need space for 16 lists going from each square. If one keeps
>>separate attack/defender sublists, each square would have 32 pointers (or 128
>>bytes for 4 byte pointers), so the whole board would take 8K. One would also
>>have a counter board which would give for each square count of pointers active
>>from that square at any given time (alternatively a 32-bit bitmask could be
>>added to a square to indicate which of the 32 link fields are active, in that
>>case each of the 16 pairs of pointers would corrspond to the predefined 8
>>sliding directions and 8 jumping outposts attacking this square).
>>
>>On the piece list (where the square pointers are pointing into) queen, knight
>>and king would have 8 link fields (each field with Next & Previous pointer)
>>since they can attack 8 pieces at once, thus they could be members of 8
>>different attack lists, while rook and bishop would have 4 link fields and pawns
>>only 2 link fields.
>>
>>Thus, looking for the attackers to a given piece would involve following the
>>list starting at the square of that piece, each pointer dereference brings in
>>the next attacker, without any scans of empty squares. When a piece moves, one
>>would start at the source square and follow the chain of attackers and update
>>them, as appropriately, to either attack whatever was on the same diagonal or
>>line or to take them out of the attack list if the're not sliding pieces or are
>>now hitting the edge of the board. A new list would be built at the destination
>>square, using a scan looking for attackers onto that previously empty square. If
>>the destination was occupied and a capture occured, the new list would inherit
>>the old links, with merely reversed head pointers for the attack & the defense
>>sublists. So this method has the best performance when pursuing the sequence of
>>captures, since the destination square lists are already there.
>>
>>I am curious if anyone had tried something along these lines, as sketched above,
>>and what kind of performance were they able to get out of it?
>
>I tried something similar to this (and I plan on trying it again sometime).
>It did give me the fastest capture generator that I've ever written,
>but I ended up with the slowest make/undo move functions (50% cpu time).
>There was an enormous amount of pointer copying there...  Every time a piece
>moves it ends up attacking different pieces, changing what other pieces
>attack, etc.  In the end it was pretty much break even.  I think I probably
>got about 120 knps on a P6/200.  My latest (bitboard) code goes at about 180
>knps.  (In both cases this is with little eval.)  OTOH, all those pointers
>could be used in calculating certain eval terms more easily I suppose...
>
>I've also tried the pseudo-attack table like Bas has described, or rather
>I started writing such a program but somehow got sidetracked...  (I think
>in fact I started that one after Bas described this idea here in CCC
>recently.  I tried this once before, but with actual attacks instead of
>pseudo-attacks--that was a nightmare; I got nowhere near finishing it.)  I
>suspect that it will also hit the make/undo code pretty heavily.  Updating
>all those 32-bit words for each square that is pseudo attacked is likely
>to be very expensive.  Moving the queen around is especially expensive...
>OTOH, in-check and attack detection are unbelievably cheap, and doing a
>SEE should be fairly cheap too.  (This discussion makes me want to try
>it again.)
>
>-Dan.

Hi Dan,

Let me give you an idea of my current speed: it is comparable with yours, the
same order of magnitude, I think (also Celeron 450). Indeed make/unmake slow
down considerably, other things speed up, like you mentioned. I don't think it
is much faster or slower on the whole than the more conventional approaches.
Another interesting figure: with only material eval my profiler says my make is
7% as wel as the unmake. NextCapture() is 10%. Blocks() is 3%.

The worst case is when a queen captures a queen, of course. But that doesn't
happen too often. When you just move a pawn it costs only a few ticks and you
have your (peudo) captures at hand. I don't think I have squeezed all speed out
of it, yet. For instance, in case of Q1xQ2 I remove attacks for the first queen,
than for the second, and insert new attacks for the first queen. Clearly that is
not optimal: better (I hope) would be to make a combined NOT mask for Q1 and Q2
and just NOT all 64 squares in one run.

Another thing to consider that I have never tried is to make a copy of the
attackboard so saving the undoing of attacks. No idea if that's faster.

The fastest purely material (fully SEE sorted) program I wrote yet was a
bitboard approach. But then when I added some eval and some InCheck checking and
attack detection the difference became quite small.

I am looking for ways to exploit my datastructures more. One example is: if a
pawn is at seventh I can easily determine wat the oppo has to sacrifice to
prevent promotion, by a SEE. Works quite well. Any ideas of where to use more
SEE's in eval succesfully? I am currently fiddling with counting pseudoattacks,
and assign a bonus for that (ignoring blocks) but without much success so far.

Regards,
Bas Hamstra.















This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.