Author: Robert Hyatt
Date: 22:02:52 04/07/02
On April 07, 2002 at 12:50:36, Keith Evans wrote:

>On April 06, 2002 at 22:09:03, Robert Hyatt wrote:
>
>>On April 06, 2002 at 00:53:39, Tom Kerrigan wrote:
>>
>>>On April 05, 2002 at 14:46:59, Robert Hyatt wrote:
>>>>I think you pointed out the flaw yourself. 2000 instructions at 2ghz is not
>>>>_nearly_ enough to do a node. And a 12mhz FPGA is a very slow FPGA. 100mhz
>>>>is more like it for SOTA... I'll take on that 2ghz general-purpose CPU any
>>>>time you want...
>>>
>>>First, my own program would search more than 1M NPS on a 2GHz chip. Which means
>>>fewer than 2k cycles per node. Which means ~2k instructions per node, and
>>>possibly less. Which means that not only are 2k instructions "nearly" enough to
>>>do a node, they ARE enough to do a node.
>>
>>I believe I said a "real chess program"... I don't know of any "real" engines
>>that search 2K instructions per node... I'm also talking about _real_ nodes...
>>Just to be clear...
>
>Do you have any idea what the spread is? Or what this is for a recent Crafty on
>an x86? Hsu mentioned 40,000 instructions per node as typical for a high-end
>program in his IEEE Micro article, but didn't explain how he arrived at the
>number.

I have no idea either. It seems high. Cray Blitz executed roughly 7K instructions per node. We had good hardware performance counters to get that number very exact. I suspect that Crafty is in the same ballpark, roughly. I just did a quick test and got 250K nps on a 750mhz PIII...

>
>I've seen messages about Crafty which state that Evaluate() takes roughly 50% of
>the CPU time. Which means that if we were to duplicate Crafty's eval in hardware
>which could execute in zero time then we would at most double the performance of
>Crafty. Now I take it that Hsu's eval is more complex than Crafty's so if we
>were to replace the Crafty eval with something more complex then it starts to
>get interesting.

Both points seem (to me) to be valid... but they are just opinions of course.
The 50% of the time in eval is correct based on recent profile runs. So that is not really opinion...

>
>I have another question about move generator though...
>
>A news message from Sep 7, 2000:
>
>">Ok, I tried 'perf' with crafty on my PIII 500 MHz. It says it can
>> generate/make/unmake about 2 Million moves/second.
>> From my FPGA I can do the same (except for castling and en passant captures,
>> although generated, these need a little more CPU to actually make/unmake
>> since more than 2 squares are involved) at a rate of about 5 Million
>> moves/second. Besides generating the moves, the FPGA also in parallel keeps
>> track of material counts and generates 64-bit hashes for the current
>> position of all pieces and also of just the pawns.
>> Does the 'perf' test in crafty include some of the material counts or hashes
>> too?
>
>The Make/UnMake part of perf does _all_ of that. Make/UnMake updates all
>the material counts, the hash keys, chess board, 50 move rule counter,
>etc." <end of quote>
>
>Is this really a fair comparison? The hardware move generators are also
>generating moves in MVV/LVA order, and even include subtle things like giving
>central squares priority for non-captures.

Hard to say. Crafty generates moves in two passes... captures then non-captures. They are not sorted in any way other than (in general) to generate moves that advance toward the opponent's side of the board first, before generating retreats...

>
>I'm interested in this because Slate mentioned that he wanted to build a Crafty
>accelerator. Extrapolating those numbers to a 2GHz x86 CPU I would expect a
>software make/unmake rate of 8M moves/s which is going to be hard to do in a
>Belle style hardware move generator given that it would be implemented in an
>FPGA. It seems to me that he would be better served by considering the eval
>function first - plus it would be a lot easier to integrate an evaluation
>accelerator than a complete Belle on a chip.
>
>If movegen is 10% of the CPU and eval is 50%, then where's the other 40%? What's
>the next largest consumer of cycles?

InCheck(), Search(), Swap() and NextMove(), which are probably pretty even in cpu usage...

>
>I was originally looking into hardware move generator because I was considering
>implementing something on a board where the CPU is a measly 30 MHz Arm 7 with a
>16-bit external data-bus, no cache, limited amount of memory... The choice of
>platform was heavily influenced by the fact that I happen to have access to such
>a board which also happens to have 4 reasonably large Virtex FPGAs (with an
>SDRAM DIMM connected to one of them.) The value proposition is a little
>different there.
>
>Regards,
>Keith
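
For anyone who wants to check the arithmetic quoted in this thread, here is a rough Python sketch. It is not taken from Crafty or any other engine, and the function names (cycles_per_node, max_speedup, scale_rate) are made up for illustration. It assumes the make/unmake rate scales linearly with clock speed, which is the same assumption as the 8M moves/s extrapolation above, and it uses the usual Amdahl's law bound for the "at most double" claim. Note that it counts cycles per node, not instructions per node; converting between the two needs an instructions-per-cycle figure that nobody in the thread gives.

```python
import math


def cycles_per_node(clock_hz, nodes_per_second):
    """Average CPU cycles available per searched node."""
    return clock_hz / nodes_per_second


def max_speedup(fraction, factor=math.inf):
    """Amdahl's law: overall speedup when `fraction` of the run time
    is made `factor` times faster (factor=inf means it takes zero time)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def scale_rate(rate, from_hz, to_hz):
    """Extrapolate a rate to another clock, assuming linear scaling
    (an assumption; memory and cache effects are ignored)."""
    return rate * to_hz / from_hz


if __name__ == "__main__":
    # 1M NPS on a 2GHz chip
    print(cycles_per_node(2e9, 1e6))        # 2000.0
    # 250K nps on a 750mhz PIII
    print(cycles_per_node(750e6, 250e3))    # 3000.0
    # Eval at 50% of CPU time, replaced by zero-time hardware
    print(max_speedup(0.5))                 # 2.0
    # Movegen at 10% of CPU time, replaced by zero-time hardware
    print(round(max_speedup(0.10), 3))      # 1.111
    # 2 Million moves/second at 500 MHz, extrapolated to 2GHz
    print(scale_rate(2e6, 500e6, 2e9))      # 8000000.0
```

On these numbers a zero-time eval buys at most 2x, while a zero-time move generator at 10% of the CPU buys only about 1.11x, which is in line with the suggestion above to look at the eval function first.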
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.