Computer Chess Club Archives



Subject: Re: Why is Crafty so fast?

Author: Robert Hyatt

Date: 19:30:40 11/27/97



On November 27, 1997 at 10:30:29, Chris Whittington wrote:
>
>
>So, hacking your numbers around, you appear to suggest:
>
>1. 32-bit P6 to 64 bit alpha architecture change gives Crafty about 80%
>

I'm not sure.  All I can say for sure is that Bruce gets 1.75X on the
533 machine, while I get 3.1X on the 500 machine.  If we assume that
1.75 is pure machine cycle time, ignoring 32 vs 64 bits, then the rest
is likely a result of simply using 64-bit words effectively.


>2. With some care and optimisation, you reckon you could get this to,
>say 125%

I believe so, yes.  IE I know of a faster PopCnt() function for the
alpha, for example.  And once this turns into a hardware operation
(which is rumored for the 21264), there is still a lot to gain...
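
To make that concrete, here is a minimal software popcount in C, purely
as an illustration; this is not Crafty's actual PopCnt(), and a hardware
count-population instruction would replace it outright:

    typedef unsigned long long BITBOARD;

    /* Illustrative software population count: clears the lowest set
       bit on each pass, so the loop runs once per 1 bit in the word. */
    static int PopCnt(BITBOARD x) {
      int count = 0;
      while (x) {
        x &= x - 1;          /* clear the least-significant 1 bit */
        count++;
      }
      return count;
    }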


>
>But this doesn't imply the 64-bit bitmap approach for chess is approx
>100% faster than 32-bit for chess, does it?  Because the 32-bit Crafty
>code is an emulation of 64-bit bitmaps, not code that is written in the
>'previous' style of, say, having a byte or a word or whatever per board
>square.


I personally think that algorithm for algorithm, offset and bitmap are
equal.  I have to pump two 32-bit words thru the P6 for everything I do,
but I get some efficiencies that make up for this.  But on the alpha you
are *forced* to pump 64-bit quantities around, and if you are forced to
do this, it obviously pays off to use the full 64 bits if possible, to
avoid pumping values around that have double the number of bits you
really need.

I have this vague feeling that bitmaps are going to be superior on
64-bit machines, based on the problem of pumping/using 64 bits versus
pumping 64 but using only 32, which means half of the data throughput is
totally wasted.
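
As a concrete picture of that pumping cost, here is a hedged sketch in
C; the function and struct names are made up for illustration, not
taken from Crafty:

    typedef unsigned long long BITBOARD;

    /* On a 64-bit machine, one AND and one test answer the question
       "does this attack set touch this target set?" */
    int Attacked64(BITBOARD attacks, BITBOARD target) {
      return (attacks & target) != 0;
    }

    /* On a 32-bit machine the same bitmap must travel as two halves:
       twice the register traffic and twice the ALU work.  A superscalar
       P5/P6 can run the two halves in parallel pipes, which hides part
       of that cost. */
    struct SplitBB { unsigned int hi, lo; };

    int Attacked32(struct SplitBB attacks, struct SplitBB target) {
      return ((attacks.hi & target.hi) | (attacks.lo & target.lo)) != 0;
    }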


>
>The next interesting question, and one that is again semi-impossible to
>answer, is just how much better is it (as measured by nps) to be using
>bitmaps than not. Looks like the upper bound on 'better' is something
>less than your 125%. One wonders how much less .... ?

The question is "how many instructions per node?", I suspect.  And based
on my analysis, this answer is potentially lower for bitmaps.  IE, for a
simple example, what is involved in asking "is the pawn outside the
square of the king?"  I do it in one memory load for a mask, plus an AND
operation and a test.  IE it takes so little time I can't measure it.
Many things turn out to be just as efficient, on a 64-bit machine.  On a
32-bit machine, I take a hit because everything is done twice.  This is
not bad on a superscalar machine like the P5/P6, because the inefficient
double work simply serves to feed multiple pipes in parallel.  So I keep
the pipes better fed than most programs because of this.  But it isn't
optimal.  On the 64-bit machines, I lose the duplicate operations, which
gives me an almost free 2x performance boost.  Not using bitmaps isn't a
kiss of death, but it might end up being a disadvantage until someone
finds a new approach that is non-bitmap based but still needs to pump
64-bit values around.
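
Here is a sketch of that "square of the king" test with a hypothetical
precomputed mask table; the array name and layout are assumptions, not
Crafty's actual data structures, but the shape of the test is the same:
one load, one AND, one comparison.

    typedef unsigned long long BITBOARD;

    /* Hypothetical table, built once at startup: for each defending
       king square and side to move, the set of squares from which a
       passed pawn can still be caught by that king. */
    static BITBOARD king_square_mask[2][64];

    /* One memory load for the mask, one AND, one test: if the pawn's
       square bit falls outside the mask, the king cannot catch it. */
    int PawnOutsideSquare(BITBOARD pawn_bit, int king_sq, int side) {
      return (king_square_mask[side][king_sq] & pawn_bit) == 0;
    }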


>
>Also of consequence is that these figures are dependent on working to
>the Crafty-design paradigm. Loosely, I seem to recollect that you (or
>was it a Bruce description of Ferret?) create piece movement bitmaps,
>and for sliding pieces, these bitmaps don't account for blocking pieces.
>Like, for example, if you want to know if a piece is attacked, you pick
>up the enemy movement bitmaps, test for a hit on the square in question,
>and, if it's hit by a sliding piece, THEN test for a blocking piece. This
>seems fine, good and fast, if your program is not needing to do many
>piece attack queries; because if it is, there comes the point where it
>is probably better to preprocess all of these beforehand, and then not
>have to waste time over and over in doing block piece tests ........ ?

First, the "blocking" test doesn't apply.  I take a diagonal's contents,
index into an array, and get a bitmap showing what squares this "piece"
attacks on this diagonal.  One memory reference and I am totally done.
If I am interested in "battery" conditions and so forth, it is not hard.
IE I haven't found anything I wanted to do that wasn't just as easy as
(or even easier than) what I did in Cray Blitz, which was *not* bitmap
based at all.  If I used each piece's attacks repeatedly, I'd likely
avoid all the duplicate work, as a standard memory-reference
optimization, because all of our machines are memory-starved and are
getting worse every year...
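
The diagonal lookup boils down to one table index on the diagonal's
occupancy.  A sketch of the idea follows; the table name, layout, and
the packed-occupancy argument are assumptions for illustration, not
Crafty's actual code:

    typedef unsigned long long BITBOARD;

    /* Hypothetical attack table, built once at startup: for each square
       and each 8-bit occupancy pattern of that square's diagonal, the
       bitmap of squares attacked along the diagonal.  Blockers are
       folded into the table, so no separate blocking test is needed. */
    static BITBOARD diag_attacks[64][256];

    /* 'diag_contents' is the occupancy of the diagonal through 'sq',
       packed into 8 bits (e.g. pulled from a rotated occupied bitmap). */
    BITBOARD BishopDiagAttacks(int sq, int diag_contents) {
      return diag_attacks[sq][diag_contents & 0xFF];  /* one reference */
    }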

However, the bottom line is I don't think there are any things you can
do that I can't, nor any things I can do that you can't.  The ideas are
somewhat complementary.  In fact, I have a non-bitmap board
representation in Crafty (board[64]) that I use for some specific things
that are easier to do that way than with bitmaps...
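
A minimal sketch of such a hybrid position structure, with illustrative
field names rather than Crafty's: the bitmaps answer set questions,
while the flat board[64] answers "what piece sits on this square?" in a
single array access.

    typedef unsigned long long BITBOARD;

    /* Illustrative hybrid representation: both views are kept in sync
       by the make/unmake move code. */
    typedef struct {
      BITBOARD pieces[2][6];   /* [side][piece type] occupancy bitmaps */
      BITBOARD occupied[2];    /* union of each side's pieces          */
      signed char board[64];   /* piece code per square, 0 = empty     */
    } POSITION;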

>Or, what I'm trying to say is that the Crafty results probably don't
>cover highly intensive evaluation functions ..... ?

I can't answer.  IE I do mobility at zero cost.  I do the pawn-square
thing at zero cost.  It takes some thought to get into bitmaps, but I
don't see anything I did before that I can't do now, nor do I see
anything that I do now that I couldn't have done before.  If performance
is the issue, I suspect bitmaps on 64-bit machines are going to be very
hard to beat.  But if you are interested only in knowledge, I don't see
why they wouldn't work just as well as the traditional offset board
representation, and they still should give you a performance "kick"...
But notice that it takes a lot of time to get "into" bitmaps and start
thinking in bitmaps.  And after that point is reached, they are no more
obtuse or difficult to use than any other programming paradigm.  And
using macros can produce some pretty readable code and hide all the
ands/ors/etc...
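
Something along these lines, for example; the particular macro names are
illustrative rather than quoted from Crafty's source:

    typedef unsigned long long BITBOARD;

    /* Macros of the kind described above: they hide the raw and/or/
       shift operators behind readable names, so evaluation code reads
       like set operations instead of bit twiddling. */
    #define And(a, b)      ((a) & (b))
    #define Or(a, b)       ((a) | (b))
    #define Xor(a, b)      ((a) ^ (b))
    #define ShiftL(a, n)   ((a) << (n))
    #define ShiftR(a, n)   ((a) >> (n))
    #define SetMask(sq)    ((BITBOARD)1 << (sq))

    /* Example use: is any white pawn on square 'sq', given the
       white-pawn occupancy bitmap 'wpawns'? */
    #define WhitePawnOnSquare(wpawns, sq)  And((wpawns), SetMask(sq))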



