Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: HW based Crafty (Boule's thesis)

Author: Robert Hyatt

Date: 20:11:02 04/08/02

Go up one level in this thread


On April 08, 2002 at 22:35:00, Keith Evans wrote:

>On April 08, 2002 at 12:11:53, Robert Hyatt wrote:
>
>>On April 08, 2002 at 04:48:01, Keith Evans wrote:
>>
>>>On April 08, 2002 at 01:02:52, Robert Hyatt wrote:
>>>
>>>>I have no idea either.  It seems high.  Cray Blitz executed roughly 7K
>>>>instructions per node.  We had good hardware performance counters to get
>>>>that number very exact.  I suspect that Crafty is in the same ballpark,
>>>>roughly.  I just did a quick test and got 250K nps on a 750mhz PIII...
>>>
>>>Here's a back of the envelope calculation that I would be interested in passing
>>>by you...
>>>
>>>Let's say for the sake of argument that Crafty's NPS will scale linearly with
>>>clock speed, so on a 2 GHz x86 32-bit CPU it will reach 666 knps. I'll guess
>>>that Crafty's branching factor is about 3, and we'll build an almost Crafty
>>>clone that is not able to use all of the searching tricks and it's branching
>>>factor is about 6.
>>>
>>>So if we're going to use our hardware almost clone to search 3 plies deep it has
>>>to get about 4.3Mnps to match the performance of the software Crafty. And if we
>>>count on searching 4 plies then we'll need to roughly double that number.
>>>
>>>It doesn't make an FPGA design sound all that promising when dealing with that
>>>class of CPU. Much more interesting for something like a PDA add-on. (Given a
>>>PDA with a large battery.)
>>>
>>>Did I miss anything?
>>
>>
>>I tend to agree...  doing a full "engine" in an FPGA would be a "task".  The
>>thing that seemed most interesting to me was to take the bitmap stuff and tuck
>>it all into an FPGA so that we get special-purpose "instructions" like
>>"makemove", "unmakemove", "generate moves" and perhaps even a Swap() function
>>and InCheck function.  That would certainly run the search speed up by a
>>factor of at least two, which would be very significant.  And it wouldn't
>>give up anything at all since the actual search, hashing, etc would be done
>>in software, but the bitmap/hash-signature update stuff would be done in
>>hardware.
>>
>
>I don't really have a good feel for what would be involved in this, but it seems
>to me that if you are considering the use of an FPGA then we need to take a
>radically different approach to achieve a performance gain over these 2 GHz CPUs
>- the routing delays in FPGAs are huge compared to 1/2 ns.

Yes... but don't forget the memory cost is not .5 ns but in the 100ns range,
so that is pretty large...


>
>I know that your bitmap approach uses a lot of lookup tables, but I'm not sure
>that those could be put on an FPGA or even that if that would have to be done on
>the sort of chip that you're talking about. If they would be required - then can
>you tell me how large they are?


For bishops (assuming the "compact_attacks" version) I need 2 x 296 x 8 bytes.
For normal attacks, it is 2 x 64 x 256 x 8 bytes...  I normally use the
compact version on intel machines which greatly reduces memory traffic and
cache thrashing.




>
>On a totally different thread I believe that the XCV1000E class parts contain
>enough RAM to put a 2kx64 bit hash table on them which would be single cycle
>r/w, and even dual port if there were a reason. This might help reduce the BF
>for shallow 3-4 ply searches.
>
>Xilinx even has some app notes on how to implement CAMs which might be useful
>for repetition detection. The basic building block is a 16x8 CAM which means you
>would need a fair number, but I think that it would fit.
>
>Regards,
>Keith



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.