Computer Chess Club Archives



Subject: Re: HW based Crafty (Boule's thesis)

Author: Keith Evans

Date: 10:03:26 04/01/02


On April 01, 2002 at 09:46:39, Robert Hyatt wrote:

>On April 01, 2002 at 04:20:57, Tom Kerrigan wrote:
>
>>On March 31, 2002 at 16:29:10, Keith Evans wrote:
>>
>>>generator for his program MBChess. His thesis is almost complete.  Marc got his
>>>program from about 2M nps to 10M nps using an FPGA for move ordering and movegen
>>>only."
>>>
>>>Does he really say this? It sounds too good to be true. Let's take a look at
>>>this from the perspective of the PCI bus. To be able to generate 10M moves per
>>>second on a 33 MHz PCI bus, Marc seems to be implying that he can complete a PCI
>>>read in 3.3 cycles and that the master will completely saturate the PCI bus with
>>
>>Right, and presumably much more data needs to be transferred than just reading
>>moves. (I'd like to read the paper but I'm on dial-up right now.) Also, this
>>assumes the logic can generate moves that fast. Seems like the critical paths
>>for move generation would be pretty darn long on an FPGA; I'd be surprised if it
>>could run at ~33MHz. (Not that it would necessarily have to, but I do think
>>clock speed is a likely bottleneck.) Of course, my biggest problem with the 2M
>>-> 10M NPS jump is that MBChess must be spending more than 80% of its CPU time
>>generating and ordering moves, which is way beyond realistic, IMO.

I'm assuming that he got it to run at 33 MHz. He thought about how the design
would map into an FPGA and was careful not to do anything that would cause
routing congestion. The interesting question to me is: given that it runs at the
33 MHz PCI clock rate and implements the most efficient PCI transfers, what
system-level performance can be expected from it, and how much acceleration
could it possibly provide for a top chess program running on a 1 GHz+ x86-class
processor? (Basically: what if a graduate student reimplemented ChipTest as a
full-custom 0.18 micron chip and made the changes/improvements proposed in the
thesis?)

My belief is that it wouldn't be able to get beyond 5-6 million moves a second
in a system, and that it would not accelerate Crafty. My understanding is that
after a move is generated and returned by the FPGA, Crafty would still need to
generate hashes, update material counts, handle repetition detection, handle the
50-move counter, update some stacks, etc., before a move from this FPGA would be
equivalent to the moves that are currently benchmarked in Crafty.

Also, Crafty would have to use the move ordering given by the FPGA and wouldn't
be able to use some of the tricks of the trade. Maybe I'm wrong and there's a
way to instruct the FPGA to make a particular move imposed by software, but this
is not mentioned in the thesis. If I'm right, then we would need to take the
increased tree size into account to compare performance - a raw NPS figure alone
would not be meaningful.

>>
>
>
>It depends on what he is going to do.  IE if he tries to do a full
>Belle-implementation, then the PCI bus speed is irrelevant.  Because all
>the move generation, and search stuff is done _inside_ the FPGA hardware,
>and all that needs to run over the PCI bus is the _result_ of the search.
>
>That was how both Belle and Deep Thought/Deep Blue worked...  DT/DB used the
>VME bus...  Belle used a very slow connection from a PDP 11 to the special-
>purpose chess hardware...
>
>A high bandwidth is not necessarily needed, depending on what is implemented.
>

If you read the thesis you'll see that it's just a move generator. As far as I
can tell, you need to perform a minimum of one PCI transfer per move. (A single
read transfer can both unmake the last move and return the next move.)

You can read his white paper at: www.macs.ece.mcgill.ca/~mboul

It prompted me to reread some articles on both Belle and Deep Thought. It seems
to me that if Ken Thompson and Joe H. Condon had been motivated enough, they
could have removed some inefficiencies from Belle, and it would have been
playing at a higher level than Deep Thought. Their evaluation was far more
extensive. Did they simply get tired of computer chess?

I just noticed a discrepancy between how Hsu and Thompson/Condon describe the
Belle move generator. Thompson/Condon claim that their disable stack is 1 bit
wide by 64 deep at each square, while Hsu claims that it is 2 bits wide by 64
deep. I would think that you would need the 2 bits to maintain both the victim
and the aggressor state. Of course this method of storing state is inefficient,
as Hsu found: you basically just need the last move to regenerate it. But if you
don't store the state at each square, then it has to be loaded somehow, which
would presumably take time, or a decoder and a lot of wires.

Here is what Hsu says in IEEE micro:

   "In the earlier Belle design, a 2-bit disable-stack masks off
    squares already searched. One disable-stack bit masks off
    attacker squares already searched for a given victim.
    The other bit masks off fully searched victims.

    ...

    In the Deep Thought and Deep Blue chess move generators, the
    last move searched from the position is used to compute the
    two-bit disable-mask."

Has anyone read any detailed description of Belle _beyond_ what's given
in:

J.H. Condon and K. Thompson. Belle chess hardware. In Advances in
Computer Chess 3, pages 45–54. Pergamon Press, Oxford, England, 1982.

J.H. Condon and K. Thompson. Belle. In Chess Skill in Man and Machine,
2nd ed., P.W. Frey, ed., Springer-Verlag, 1983, pp. 201–210.

I have those at home, but can't quote from them at the moment. Hsu seems to know
more about Belle than what's written in those documents. Perhaps this is based
on conversations that he had with Thompson/Condon.

Regards,
Keith


