Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: HW based Crafty (Boule's thesis)

Author: Keith Evans

Date: 20:58:40 04/08/02

Go up one level in this thread


On April 08, 2002 at 23:11:02, Robert Hyatt wrote:

>On April 08, 2002 at 22:35:00, Keith Evans wrote:
>
>>On April 08, 2002 at 12:11:53, Robert Hyatt wrote:
>>
>>>On April 08, 2002 at 04:48:01, Keith Evans wrote:
>>>
>>>I tend to agree...  doing a full "engine" in an FPGA would be a "task".  The
>>>thing that seemed most interesting to me was to take the bitmap stuff and tuck
>>>it all into an FPGA so that we get special-purpose "instructions" like
>>>"makemove", "unmakemove", "generate moves" and perhaps even a Swap() function
>>>and InCheck function.  That would certainly run the search speed up by a
>>>factor of at least two, which would be very significant.  And it wouldn't
>>>give up anything at all since the actual search, hashing, etc would be done
>>>in software, but the bitmap/hash-signature update stuff would be done in
>>>hardware.
>>>
>>
>>I don't really have a good feel for what would be involved in this, but it seems
>>to me that if you are considering the use of an FPGA then we need to take a
>>radically different approach to achieve a performance gain over these 2 GHz CPUs
>>- the routing delays in FPGAs are huge compared to 1/2 ns.
>
>Yes... but don't forget the memory cost is not .5 ns but in the 100ns range,
>so that is pretty large...

I'm not sure how much data would need to be transferred to a hypothetical
bitboard coprocessor. My concern would be that if you were to put this on a PCI
device then each access would take at least 120 ns if you're not bursting and
you're on a 32-bit 33 MHz PCI bus. This is a context switch for my brain and I'm
totally ignorant of how Crafty works here - I have no idea how the bus cycles
could/would look. At least "generate moves" should burst. I don't think that you
would want a descriptor based DMA engine - just let the CPU pull data.

This would be a lot easier project to take on than a full engine. The promise of
an engine that could deliver greater than a 2X increase is exciting, but perhaps
too difficult a place to begin. If something like your bitboard accelerator
makes sense on paper, then it might be a lot better way for Slater to come up to
speed, before deciding if he wants to pursue it to the extreme.

>>
>>I know that your bitmap approach uses a lot of lookup tables, but I'm not sure
>>that those could be put on an FPGA or even that if that would have to be done on
>>the sort of chip that you're talking about. If they would be required - then can
>>you tell me how large they are?
>
>
>For bishops (assuming the "compact_attacks" version) I need 2 x 296 x 8 bytes.
>For normal attacks, it is 2 x 64 x 256 x 8 bytes...  I normally use the
>compact version on intel machines which greatly reduces memory traffic and
>cache thrashing.
>

I went and fetched my Virtex datasheet. I like to use the Virtex 1000E part for
this kind of thing because I know that it's readily available, and there are
some interesting proto cards that use it. Depending upon how something like this
develops it might be more appropriate to choose an extended RAM part.

It has 393,216 bits of BlockRAM. This consists of 96 RAMs which are each
independently organized as one of {4096x1,2048x2,...,256x16} Assuming that the 8
bytes is referring to a 64-bit wide memory organization and that the 2 is
referring to two separate ROMs you could build two 512x64 RAMs using the 512x8
configuration which would eat up 16 Block Select RAMs. These RAMs are organized
into columns and this would eat up one column out of 6. It's a little wasteful
due to your 296 number and the lower limit on the RAM depth configuation.

So how many more of these tables are there...?

One nice thing about a bitboard accelerator is that it could probably be
pipelined and clocked reasonably fast. One problem is that the placement of the
RAMs is fixed - so routing delays may be large depending on how the tables are
used.

Regards,
Keith



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.