Computer Chess Club Archives


Subject: Re: Why is Crafty so fast?

Author: Chris Whittington

Date: 04:45:32 11/28/97




On November 27, 1997 at 22:30:40, Robert Hyatt wrote:

>On November 27, 1997 at 10:30:29, Chris Whittington wrote:
>>
>>
>>So, hacking your numbers around, you appear to suggest:
>>
>>1. 32-bit P6 to 64 bit alpha architecture change gives Crafty about 80%
>>
>
>I'm not sure.  All I can say for sure is Bruce gets 1.75X on the 533
>machine, I get 3.1X on the 500 machine.  If we assume that 1.75 is pure
>machine cycle time, ignoring 32 vs 64 bits, then the rest is likely a
>result of simply using 64 bit words effectively.
>
>
>>2. With some care and optimisation, you reckon you could get this to,
>>say 125%
>
>I believe so, yes.  IE I know of a faster PopCnt() function for the
>alpha, for example.  And as this turns into a hardware operation (which
>is rumored for the 21264) there is a lot to gain, yet...

Except I was trying to get at the 64-bitmap vs offset performance. The
alpha happening to get a neat bit counter in the future isn't a function
of the 64-bitmap approach, but of alpha vs PC, which is another issue.
Presumably an offset program would get the same improvement on an alpha
(and CSTal is full of bit counting).

>
>
>>
>>But this doesn't imply 64 bitmap approach for chess is approx 100%
>>faster than 32 bit for chess, does it ? Because the 32 bit Crafty code
>>is an emulation of 64-bit bitmaps, not code that is written in the
>>'previous' style, of say having a byte or a word or whatever per board
>>square.
>
>
>I personally think that algorithm for algorithm, offset and bitmap are
>equal.

Interesting. This may be. But let's take a knowledge/speed extreme
example for mobility (I'll assume that the calculations aren't done
incrementally):

1. We go for speed. The bitmap approach is neat: you just do some bit
counts. Offset is slower; we need to trawl through all the squares,
looking for attacks.

2. We go for intensive knowledge. We say mobility is some function of
SAFE movement capability into each square. Now, bitmap or no bitmap, we
have to process each square for safety. Basically a swap-off or exchange
evaluation for each piece movement into each square. Isn't offset
faster, because we already have the attack/defence bits ranked on each
square, smallest attacker value first ?
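For illustration, the speed case in point 1 might be sketched like this (a hypothetical sketch in C; the square numbering 0 = a1 .. 63 = h8, the knight as the offset-side example, and all names are my assumptions, not Crafty's actual code):

```c
#include <stdint.h>

/* Bitmap style: mobility is a mask-off plus a bit count. */
static int pop_count(uint64_t b) {
    int n = 0;
    while (b) { b &= b - 1; n++; }     /* clear the lowest set bit */
    return n;
}

int mobility_bitmap(uint64_t attacks, uint64_t own_pieces) {
    return pop_count(attacks & ~own_pieces);
}

/* Offset style: trawl the target squares one by one.  A real version
   would also test occupancy and attacks on each target square. */
static const int knight_df[8] = { 1, 2, 2, 1, -1, -2, -2, -1 };
static const int knight_dr[8] = { 2, 1, -1, -2, -2, -1, 1, 2 };

int knight_mobility_offset(int file, int rank) {
    int i, count = 0;
    for (i = 0; i < 8; i++) {
        int f = file + knight_df[i], r = rank + knight_dr[i];
        if (f >= 0 && f < 8 && r >= 0 && r < 8)
            count++;                   /* on-board target square */
    }
    return count;
}
```

The bitmap version does constant work per piece; the offset version's loop is the "trawl" in question.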

Isn't it the case that bitmaps are neat if we just want to quickly
quantify attacks (basically just add them up) ? But as soon as we want
to make qualitative judgements about each attack and how the
attacks/defence interact with each other, then the offset data seems
more accessible ?


>I have to pump 2 32bit words thru the P6 for everything I do, but I get
>some efficiencies that make up for this.  But on the alpha, you are
>*forced* to pump 64bit quantities around, and if you are forced to do
>this, it obviously pays off to use the full 64bits if possible, to
>avoid pumping values around that have double the number of bits you
>really need.

Only if there is some advantage, and not just for the sake of it. You
would be unlikely to argue the case of 128 bits over 64 bits on this
basis, would you ? I thought the big deal of 64 bits was that it
equalled the 64 squares on the chess board.

>
>I have this vague feeling that bitmaps are going to be superior on
>64bit machines.  Based on the problem of pumping/using 64bits, vs
>pumping 64 but using only 32, which means 1/2 of the data throughput is
>wasted, totally.

As above, why not argue for 128 over 64 ? Again, 64 is the magic number,
no ?

>
>
>>
>>The next interesting question, and one that is again semi-impossible to
>>answer, is just how much better is it (as measured by nps) to be using
>>bitmaps than not. Looks like the upper bound on 'better' is something
>>less than your 125%. One wonders how much less .... ?
>
>The question is "how many instructions per node?" I suspect.  And based
>on my analysis, this answer is potentially lower for bitmaps.  IE, for
>a simple example, what is involved in your asking "is the pawn outside
>the square of the king?"  I do it in one memory load for a mask plus an
>AND operation and a test.  IE it takes so little time I can't measure
>it.
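The mask-plus-AND test Hyatt describes might be sketched as follows (hypothetical C; the mask-building function and all names are my assumptions, and the double pawn step from the second rank is ignored for brevity):

```c
#include <stdint.h>

static int max_i(int a, int b) { return a > b ? a : b; }
static int abs_i(int a) { return a < 0 ? -a : a; }

/* Precompute, for a white pawn on pawn_sq (0 = a1 .. 63 = h8), the mask
   of squares from which a king can still reach the promotion square in
   time (Chebyshev distance <= pawn moves left).  The double step from
   the second rank is ignored for brevity. */
uint64_t pawn_square_mask(int pawn_sq) {
    int file = pawn_sq % 8;
    int steps = 7 - pawn_sq / 8;           /* pawn moves to promote */
    uint64_t mask = 0;
    int s;
    for (s = 0; s < 64; s++) {
        int dr = 7 - s / 8;                /* ranks from promotion rank */
        int df = abs_i(file - s % 8);
        if (max_i(dr, df) <= steps)
            mask |= 1ULL << s;
    }
    return mask;
}

/* At search time the whole question is one load, one AND, one test. */
int king_in_square(uint64_t king_bb, uint64_t square_mask) {
    return (king_bb & square_mask) != 0;
}
```

All the work sits in the precomputed table; the per-node cost really is just the AND and the test.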

I think you can find many examples of quick-and-easiness. But my case is
that intensive knowledge processing is probably better with offsets.

And this argument then gets taken further. If you select or promote the
bitmap paradigm, you are already promoting and selecting a design.
Namely that of fast searcher with less qualitative knowledge ........

> Many things turn out to be as efficient, on a 64 bit machine.  On a 32
>bit machine, I take a hit because everything is done twice.

For Crafty and other 64 bitmap programs, yes, this is true, you take a
hit by trying to emulate 64 bitness on a 32 bit machine. But isn't it
ever so slightly rich to extol the virtues of 64 bit on this basis ?
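That emulation hit is concrete: on a 32-bit machine every 64-bit bitmap operation has to be done as two word operations. A hypothetical sketch (the struct and names are illustrative, not Crafty's actual macros):

```c
#include <stdint.h>

/* A 64-square board held as a pair of 32-bit words. */
typedef struct { uint32_t lo, hi; } bitmap32;

/* One logical AND costs two machine ANDs on a 32-bit machine... */
bitmap32 and32(bitmap32 a, bitmap32 b) {
    bitmap32 r;
    r.lo = a.lo & b.lo;
    r.hi = a.hi & b.hi;
    return r;
}

/* ...but is a single instruction on a 64-bit machine. */
uint64_t and64(uint64_t a, uint64_t b) {
    return a & b;
}
```

On a superscalar P6 the two halves can issue in parallel, which is exactly the mitigation Hyatt describes next.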

>  This is not bad on a superscalar machine like the P5/P6, because the
>inefficient double stuff simply serves to feed multiple pipes in
>parallel.  So I keep the pipes better fed than most programs, because
>of this.  But it isn't optimal.  On the 64 bit machines, I lose the
>duplicate operations, which gives me an almost free 2x performance
>boost.  But not using them isn't a kiss of death.  But it might end up
>being a disadvantage until someone finds a new approach that is
>non-bitmap based, but which needs to pump 64 bit values around.
>
>
>>
>>Also of consequence is that these figures are dependent on working to
>>the Crafty-design paradigm. Loosely, I seem to recollect that you (or
>>was it a Bruce description of Ferret?) create piece movement bitmaps,
>>and for sliding pieces, these bitmaps don't account for blocking pieces.
>>Like, for example, if you want to know if a piece is attacked, you pick
>>up the enemy movement bitmaps, test for a hit on the square in question,
>>and, if it's hit by a sliding piece, THEN test for a blocking piece. This
>>seems fine, good and fast, if your program is not needing to do many
>>piece attack queries; because if it is, there comes the point where it
>>is probably better to preprocess all of these beforehand, and then not
>>have to waste time over and over in doing block piece tests ........ ?
>
>first, the "blocking" test doesn't apply.  I take a diagonal's contents,
>index into an array, and get a bitmap showing what squares this "piece"
>attacks on this diagonal.  One memory reference and I am totally done.
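The occupancy lookup Hyatt describes can be sketched on a single 8-square line, i.e. one rank or one diagonal (a hypothetical reduction; Crafty's real tables and indexing differ):

```c
/* rank_attacks[sq][occ] gives the squares a slider on 'sq' attacks
   along an 8-square line, given the line's 8-bit occupancy 'occ'. */
static unsigned char rank_attacks[8][256];

static void init_rank_attacks(void) {
    int sq, occ, f;
    for (sq = 0; sq < 8; sq++)
        for (occ = 0; occ < 256; occ++) {
            unsigned char att = 0;
            for (f = sq + 1; f < 8; f++) {     /* slide one way */
                att |= 1 << f;
                if (occ & (1 << f)) break;     /* blocker: stop here */
            }
            for (f = sq - 1; f >= 0; f--) {    /* slide the other way */
                att |= 1 << f;
                if (occ & (1 << f)) break;
            }
            rank_attacks[sq][occ] = att;
        }
}

/* At search time the "blocking test" disappears: one load does it all. */
unsigned char slider_attacks_on_line(int sq, unsigned char occupancy) {
    static int ready = 0;
    if (!ready) { init_rank_attacks(); ready = 1; }
    return rank_attacks[sq][occupancy];
}
```

The blocking logic runs once, at initialisation; afterwards every query really is a single memory reference.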

When do you do this ? Beforehand for all square attacks ? Or as you need
to know ?

Again knowledge intensity would require it done beforehand. Fast and
easy would allow it to be done on the fly.

>If I am interested in "battery" conditions and so forth, it is not
>hard.  IE I haven't found anything I wanted to do

Well this is the key, no ? For the Crafty paradigm it works. For the
knowledge paradigm, I think it gives problems. Your way promotes a
certain sort of chess engine .....

> that wasn't just as easy (or even easier) than what I did in Cray
>Blitz, which was *not* bitmap based at all.  If I used each piece's
>attacks repeatedly, I'd likely avoid all the duplicate work, as a
>standard memory-reference optimization, because all of our machines are
>memory-starved and are getting worse every year...
>
>However, the bottom line is I don't think there are any things you can
>do that I can't, nor any things I can do that you can't.

It's software, so we all can do anything :)

But the question is how effectively, and does one way actually steer the
designer down specific pathways ?

>The ideas are somewhat complementary.  In fact, I have a non-bitmap
>board representation in crafty (board[64]) that I use for some specific
>things that are easier to do than trying to do them with bitmaps...
>
>>Or, what I'm trying to say is that the Crafty results probably don't
>>cover highly intensive evaluation functions ..... ?
>
>I can't answer.  IE I do mobility at zero cost.  I do the pawn square
>thing at zero cost.  It takes some thought to get into bitmaps, but I
>don't see anything I did before that I can't do now, nor do I see
>anything that I do now that I couldn't have done before.  If
>performance is the issue, I suspect bitmaps on 64 bit machines are
>going to be very hard to beat.  But if you are interested only in
>knowledge, I don't see why they wouldn't work just as well as the
>traditional offset board representation, and they still should give you
>a performance "kick"...

One concern (repeated here) is that the bitmap doesn't inherently store
the attacker significance. So if we want to find the smallest attacker
we have to hunt around the bits first, and so on for the next smallest,
etc.
For qualitative work on attack/defence of a square this is going to be a
drawback.
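That hunt might be sketched like this (hypothetical C; one bitboard per piece type is assumed, which is a common bitboard design rather than any specific program's code):

```c
#include <stdint.h>

enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, NPIECES };

/* attackers_of_sq: mask of one side's pieces attacking the target
   square; piece_bb[p]: that side's bitboard for piece type p.  Returns
   the type of the least valuable attacker (probing in value order), or
   -1 if the square is not attacked. */
int least_valuable_attacker(uint64_t attackers_of_sq,
                            const uint64_t piece_bb[NPIECES],
                            uint64_t *attacker_bit) {
    int p;
    for (p = PAWN; p < NPIECES; p++) {
        uint64_t subset = attackers_of_sq & piece_bb[p];
        if (subset) {
            *attacker_bit = subset & (~subset + 1ULL);  /* isolate one */
            return p;
        }
    }
    return -1;
}

/* Tiny demo: a rook (bit 0) and a pawn (bit 8) both attack the target;
   the pawn should be found first. */
int demo_smallest_attacker(void) {
    uint64_t bb[NPIECES] = {0};
    uint64_t bit;
    bb[ROOK] = 1ULL << 0;
    bb[PAWN] = 1ULL << 8;
    return least_valuable_attacker((1ULL << 0) | (1ULL << 8), bb, &bit);
}

int demo_no_attacker(void) {
    uint64_t bb[NPIECES] = {0};
    uint64_t bit;
    return least_valuable_attacker(0, bb, &bit);
}
```

Each probe is cheap, but the value ranking is recomputed on every query, whereas an offset attack list can keep its entries pre-sorted by attacker value.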

> But notice that it takes a lot of time to get "into" bitmaps and start
>thinking bitmaps.  And after that point is reached, they are no more
>obtuse or difficult to use than any other programming paradigm.  And
>using macros can produce some pretty readable code and hide all the
>ands/ors/etc...

Agreed. Switching paradigms gets to be difficult. Made worse by the
amount of time invested in the previous way of doing things; and by the
age of the person involved. If you're young and you haven't spent all
your time doing it one way, it's a lot easier to make the switch.

But I still make the point: the design compromise made to be bitmapped
or offset-based is going to have major consequences for the
knowledge/speed decision.

Chris Whittington



