Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: HW based Crafty (Boule's thesis)

Author: Keith Evans

Date: 01:48:01 04/08/02

Go up one level in this thread


On April 08, 2002 at 01:02:52, Robert Hyatt wrote:

>I have no idea either.  It seems high.  Cray Blitz executed roughly 7K
>instructions per node.  We had good hardware performance counters to get
>that number very exact.  I suspect that Crafty is in the same ballpark,
>roughly.  I just did a quick test and got 250K nps on a 750mhz PIII...

Here's a back of the envelope calculation that I would be interested in passing
by you...

Let's say for the sake of argument that Crafty's NPS will scale linearly with
clock speed, so on a 2 GHz x86 32-bit CPU it will reach 666 knps. I'll guess
that Crafty's branching factor is about 3, and we'll build an almost Crafty
clone that is not able to use all of the searching tricks and it's branching
factor is about 6.

So if we're going to use our hardware almost clone to search 3 plies deep it has
to get about 4.3Mnps to match the performance of the software Crafty. And if we
count on searching 4 plies then we'll need to roughly double that number.

It doesn't make an FPGA design sound all that promising when dealing with that
class of CPU. Much more interesting for something like a PDA add-on. (Given a
PDA with a large battery.)

Did I miss anything?

I have no idea how to factor in an improvement in evaluation. Based on your
numbers you are spending roughly 3000 cycles per node. If you were to increase
that number by say a factor of ten, then once the hardware breaks the half a
million to million nps mark then it's interesting - we might even let it go six
plies deep. So it seems that any efforts in this area will involve a lot of
experimentation with evaluation. Did Hsu exagerate his 40k number, and/or will
diminishing returns kick in?

I was told that Brutus only searched 3 plies deep in hardware, which makes sense
to me given the limited amount of evaluation that could be placed on a Virtex
405E. Depending on his chess program and CPU I would guess that it would be
break even at 3 plies hardware search at somewhere around 2-4Mnps.

>Hard to say.  Crafty generates moves in two passes... captures then
>non-captures.  They are not sorted in any way other than (in general) to
>generate moves that advance toward the opponent's side of the board first,
>before generating retreats...

Hmm... hardware could easily favor searching moves which tend to advance first
too. This might be a nice small experiment for somebody to perform. They would
have to change the arbiter - I don't think that you would want to make this
programmable.

Marc put in a mode in his movegen arbiter to generate either MVV/LVA or MVV/MVA
and claims to prefer MVV/MVA at times. (Arbiter needs this function anyway as a
function of STM.) Like I said it's programmable, so if it explodes...

>>
>>If movegen is 10% of the CPU and eval is 50%, then where's the other 40%? What's
>>the next largest consumer of cycles?
>
>InCheck(), Search(), Swap() and NextMove(), which are probably pretty
>even in cpu usage...

Interesting - I would have thought that InCheck() would have been part of making
moves. And NextMove() and Swap() sound a little suspicious too. I don't have the
Crafty source available at the moment.

Regards,
Keith



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.