Computer Chess Club Archives

Subject: Re: HW based Crafty (Boule's thesis)

Author: Robert Hyatt

Date: 22:02:52 04/07/02

On April 07, 2002 at 12:50:36, Keith Evans wrote:

>On April 06, 2002 at 22:09:03, Robert Hyatt wrote:
>
>>On April 06, 2002 at 00:53:39, Tom Kerrigan wrote:
>>
>>>On April 05, 2002 at 14:46:59, Robert Hyatt wrote:
>>>>I think you pointed out the flaw yourself.  2000 instructions at 2ghz is not
>>>>_nearly_ enough to do a node.  And a 12mhz FPGA is a very slow FPGA.  100mhz
>>>>is more like it for SOTA...  I'll take on that 2ghz general-purpose CPU any
>>>>time you want...
>>>
>>>First, my own program would search more than 1M NPS on a 2GHz chip. Which means
>>>fewer than 2k cycles per node. Which means ~2k instructions per node, and
>>>possibly less. Which means that not only are 2k instructions "nearly" enough to
>>>do a node, they ARE enough to do a node.
>>
>>I believe I said a "real chess program"...  I don't know of any "real" engines
>>that search at only 2K instructions per node...  I'm also talking about _real_ nodes...
>>Just to be clear...
>
>Do you have any idea what the spread is? Or what this is for a recent Crafty on
>an x86? Hsu mentioned 40,000 instructions per node as typical for a high-end
>program in his IEEE Micro article, but didn't explain how he arrived at the
>number.

I have no idea either.  It seems high.  Cray Blitz executed roughly 7K
instructions per node; we had good hardware performance counters, so that
number was measured very precisely.  I suspect that Crafty is in the same
ballpark.  I just did a quick test and got 250K nps on a 750 MHz PIII...
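The per-node figures traded above come from simple division: clock rate over nodes per second gives the average number of cycles per node. The C sketch below is only a back-of-the-envelope illustration; the function names are hypothetical, not taken from Crafty or any other engine. Turning cycles into instructions needs an assumed instructions-per-cycle (IPC) value, which none of the posts state, so that parameter is an explicit assumption.

```c
/* Average clock cycles spent per node: clock rate / nodes per second.
 * This counts cycles, not instructions. */
static double cycles_per_node(double clock_hz, double nodes_per_sec)
{
    return clock_hz / nodes_per_sec;
}

/* Instructions per node under an ASSUMED average IPC.
 * The IPC value is hypothetical; real IPC varies by CPU and code. */
static double instructions_per_node(double clock_hz, double nodes_per_sec,
                                    double assumed_ipc)
{
    return cycles_per_node(clock_hz, nodes_per_sec) * assumed_ipc;
}
```

With the numbers quoted in the thread, cycles_per_node(2e9, 1e6) gives 2,000 cycles per node and cycles_per_node(750e6, 250e3) gives 3,000; those equal instruction counts only if an IPC of 1 is assumed.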




>
>I've seen messages about Crafty which state that Evaluate() takes roughly 50% of
>the CPU time. This means that if we were to duplicate Crafty's eval in hardware
>that could execute in zero time, we would at most double the performance of
>Crafty. Now I take it that Hsu's eval is more complex than Crafty's, so if we
>were to replace the Crafty eval with something more complex, then it starts to
>get interesting.


Both points seem (to me) to be valid...  but they are just opinions, of
course.  The figure of 50% of the time in eval is correct, based on recent
profile runs, so that part is not really opinion...
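The "at most double" argument is Amdahl's law. A minimal C sketch, assuming the fraction p of run time spent in Evaluate() is sped up by a factor s while everything else is unchanged; it ignores any cost of moving data between the CPU and an accelerator, and the function names are hypothetical.

```c
/* Amdahl's law: overall speedup when a fraction p of the run time
 * is accelerated by a factor s and the remaining (1 - p) is not. */
static double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

/* Limit as s grows without bound: the accelerated part takes zero time. */
static double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}
```

amdahl_limit(0.5) returns 2.0, the zero-time-eval ceiling, while an eval made 10 times faster, amdahl_speedup(0.5, 10.0), gives only about 1.82. The ceiling rises only if evaluation takes a larger share of the time, which is why a more expensive eval makes hardware more interesting.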


>
>I have another question about move generator though...
>
>A news message from Sep 7, 2000:
>
>">Ok, I tried 'perf' with crafty on my PIII 500 MHz. It says it can
>> generate/make/unmake about 2 Million moves/second.
>> From my FPGA I can do the same (except for castling and en passant captures,
>> which, although generated, need a little more CPU to actually make/unmake
>> since more than 2 squares are involved) at a rate of about 5 Million
>> moves/second. Besides generating the moves, the FPGA also in parallel keeps
>> track of material counts and generates 64-bit hashes for the current
>> position of all pieces and also of just the pawns.
>> Does the 'perf' test in crafty include some of the material counts or hashes
>> too?
>
>The Make/UnMake part of perf does _all_ of that.  Make/UnMake updates all
>the material counts, the hash keys, chess board, 50 move rule counter,
>etc." <end of quote>
>
>Is this really a fair comparison? The hardware move generators are also
>generating moves in MVV/LVA order, and even include subtle things like giving
>central squares priority for non-captures.

Hard to say.  Crafty generates moves in two passes... captures then
non-captures.  They are not sorted in any way other than (in general) to
generate moves that advance toward the opponent's side of the board first,
before generating retreats...
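For contrast with Crafty's unsorted two-pass generation, here is a minimal C sketch of the MVV/LVA rule the hardware generators apply: capture the most valuable victim first and, among equal victims, use the least valuable attacker first. The piece codes, values and struct layout are assumptions for illustration only, not Crafty's or Belle's data structures. Doing this ordering in software adds work per node, which may be one reason a raw moves-per-second comparison is hard to make fair.

```c
#include <stdlib.h>

/* Hypothetical piece codes for this sketch. */
enum piece { NONE = 0, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

/* Rough values used only for ordering (assumed, not any engine's values). */
static const int piece_value[] = { 0, 1, 3, 3, 5, 9, 100 };

struct move {
    int from, to;
    enum piece mover;  /* attacking piece */
    enum piece victim; /* NONE for a non-capture */
};

/* Larger key = search earlier. The victim dominates; the attacker
 * only breaks ties between captures of equal victims. */
static int mvv_lva_key(const struct move *m)
{
    return piece_value[m->victim] * 1000 - piece_value[m->mover];
}

/* qsort comparator: descending by MVV/LVA key. */
static int cmp_mvv_lva(const void *a, const void *b)
{
    int ka = mvv_lva_key((const struct move *)a);
    int kb = mvv_lva_key((const struct move *)b);
    return (kb > ka) - (kb < ka);
}

/* Sort a list of captures into MVV/LVA order in place. */
static void order_captures(struct move *moves, size_t n)
{
    qsort(moves, n, sizeof moves[0], cmp_mvv_lva);
}
```

Sorting the captures QxP, PxR and NxR with order_captures() yields PxR, NxR, QxP. A Belle-style hardware generator emits moves in roughly that order directly instead of sorting them afterward.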



>
>I'm interested in this because Slate mentioned that he wanted to build a Crafty
>accelerator. Extrapolating those numbers to a 2GHz x86 CPU I would expect a
>software make/unmake rate of 8M moves/s which is going to be hard to do in a
>Belle style hardware move generator given that it would be implemented in an
>FPGA. It seems to me that he would be better served by considering the eval
>function first - plus it would be a lot easier to integrate an evaluation
>accelerator than a complete Belle on a chip.
>
>If movegen is 10% of the CPU and eval is 50%, then where's the other 40%? What's
>the next largest consumer of cycles?

InCheck(), Search(), Swap() and NextMove(), which are probably pretty
even in cpu usage...



>
>I was originally looking into a hardware move generator because I was considering
>implementing something on a board where the CPU is a measly 30 MHz ARM7 with a
>16-bit external data-bus, no cache, limited amount of memory... The choice of
>platform was heavily influenced by the fact that I happen to have access to such
>a board which also happens to have 4 reasonably large Virtex FPGAs (with an
>SDRAM DIMM connected to one of them.) The value proposition is a little
>different there.
>
>Regards,
>Keith



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.