Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: What Would it take to Hardware-ize Crafty?

Author: Robert Hyatt

Date: 10:47:04 02/18/04

Go up one level in this thread


On February 17, 2004 at 16:56:46, Bob Durrett wrote:

>
>If one ignores the many anticipated objections to doing this research, then what
>would it take?
>
>I can see the following process:
>
>(1)  A list of blocks of code which could be "hardware-ized" is made and the
>list ordered in order of decreasing expected benefit.
>
>(2)  Then, the powers that be would decide which blocks to do that to.
>
>(3)  Then it would be done.  Testing and tuning would follow.
>
>(4)  Then a match would be set up with Kasparov.
>
>(5)  Etceteras.
>
>By "hardware-izing" I mean creating a chip which could input it's input data,
>process that data, and output that data all within the time of a few clock
>cycles of a modern microprocessor chip.
>
>QUESTIONS:
>
>Which block of Crafty code would be chosen first?  How much benefit to expect?
>
>Same questions for second block of code.
>
>Etceteras for the rest of the code until nothing is left.
>
>Bob D.


We talked about this a couple of years ago.  Fortunately, I have been around
long enough to understand the original Belle chess engine hardware design, and
how it was "cloned" into DT/DB (with a subtle change to make the chip smaller).

With that in mind, crafty really doesn't fit a hardware implementation.  IE much
of what I do would be _very_ difficult to do in hardware.  It is the right way
to do things when memory is handy, but not when trying to implement the thing
directly in hardware.  For example, SEE is not a nice hardware topic because it
is a variable-length process and hardware doesn't like that.  Hardware
implementations use the old Belle MVV/LVA approach to ordering captures instead,
which is ok, but not nearly as good as SEE.  I generate moves and stuff 'em into
an array.  Hardware doesn't fit that very well, and the belle approach is
generally used where you generate one move, and make it and go on, then later
you come back and generate another move, and make it and so forth.  Dark Thought
implemented the move generator this way, and I sometimes wish I had as it does
offer some efficiencies, but it also offers problems with the inability to use a
history heuristic, for example.

The things that would help Crafty would be special purpose hardware instructions
that do the following:

Make/Unmake moves.  This special purpose hardware would keep the bitboards
internally in dedicated special-purpose registers, and do the updating needed as
asked by the software when making/unmaking a move.  The software time for this
would then vanish.

generate captures/generate non-captures/ generate check-evasions.  It would be
nice to have an instruction that when executed, dumps a list of moves ready to
go, with no real overhead.  The move generator time would basically vanish.

A Hardware SEE would be nice, and is doable, but doing it in the middle of a
hardware search would be messy.  But a separate SEE instruction would simply
make the cost of that vanish as well.

The above add up to less than 50% of the total search time.  Making them
"vanish" would make crafty possibly 2x faster.  The last 50% is way tougher as
it is the evaluation code, and it consumes 50%+ of the total search time at
present.  It could obviously be moved into hardware, and much of it would fit
very well since it is bitmap pattern matching.  Some of it would be very
difficult (blocked pawn code since I look to see where all pawns can advance
safely to, and that is hard to do in a fixed number of clock cycles).  But if we
did this part in hardware, then we _might_ hit speeds 10x faster or close to it,
since all that is left is just the search overhead itself.

Ken's pre-1980 version of Belle did just this.  This is the version of Belle
that first played in 1978 and was phased out in 1980 when he did the "whole
deal" in hardware.  Doing this 1978 version produced a program that was 2x
faster than the closest competition, chess 4.7.  Chess 4.7 searched about 2600
nodes per second, 1978 belle hit 5000, but don't forget that the software part
was running on a _very_ slow minicomputer (a pdp 11/44).

Making Crafty go perhaps 10x faster would be nice.  It would take 2+ years to
get it all done and tested, probably.  By then hardware has caught up somewhat
and that 10x has dropped to 2x-3x.  That's the problem with hardware.  And it is
why I am not that excited about Hydra.

Some numbers:

in 1975 chess 4.0 showed up at the ACM event on a CDC 7600 (Cyber 176).  They
were searching 10x faster than anyone else.  IE they were doing 2600 NPS on that
box, while the rest of us were struggling to hit 100, untill you happened to
find an IBM 360/91 or something laying around where you might hit 250-300 nps.
10x proved to be a dominating advantage, as their early ACM history shows.

In 1978 Ken caught up and actually passed them by a factor of 2, but that factor
of 2 just made the games interesting, belle was not dominant.

in 1980 we had just moved to the cray and started at about 1K nodes per second,
but eventually passed chess 4.x's 2.6K.  But Belle showed up with a new hardware
design that was hitting 160K.  80x faster than chess 4.x and us.  And it was a
dominant box for the next three years.  Finally, by 1983 we were doing 20K on
the XMP-2 and that was enough to put us back in the game, as our search was more
efficient for lots of reasons such as no hash-move for ordering, no SEE in
Belle, no killer/history ordering.  Etc.  We steadily pulled ahead through 1986
where we were doing 200K+ and the next closest thing was Belle but it was a ply
behind us then with our better software search.

In 1987 along came chiptest/deep thought, and their speed improved by leaps and
bounds, quickly hitting 2-4M by the 1989 WCCC.  And again, with that speed
advantage, they dominated computer chess thru their last event in 1995, losing
only one game I recall, to Fritz in Hong Kong.

Hydra doesn't represent that remarkable a speed jump yet.  IE they claim 3M
nodes per second per board, but something like 60% search efficiency, so back
that off to 2M x 8 board which is 16M nodes per second.  I just ran on a machine
in CCT6 that was doing between 8 and 10M nodes per second, and I have tested on
an 8-way opteron that matches Hydra's speed pretty well, yet I don't have any of
the typical hardware shortcomings, such as their lack of hashing in the hardware
and so forth.

That is why I said that it doesn't impress me that much.  It's an interesting
project, and it produces some interesting speed, but it is not way out ahead of
what anyone could produce on a simple 8-way opteron, much less going to 16-way
and beyond.  IE had it come out 3-4 years ago, it might have been a dominating
force for a couple of years, but today it simply isn't.

Now if they put together a new version with faster FPGAs, or many more FPGAs,
then this might change.  But I am evaluating reality not vaporware.  Until I see
such a beast, I don't have to play it and won't worry about it.  Meanwhile, I
know what current 64 bit hardware can do for me, and the result is _not_
something to be taken lightly.  CCT6 showed that...



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.