Author: Robert Hyatt
Date: 20:11:02 04/08/02
Go up one level in this thread
On April 08, 2002 at 22:35:00, Keith Evans wrote: >On April 08, 2002 at 12:11:53, Robert Hyatt wrote: > >>On April 08, 2002 at 04:48:01, Keith Evans wrote: >> >>>On April 08, 2002 at 01:02:52, Robert Hyatt wrote: >>> >>>>I have no idea either. It seems high. Cray Blitz executed roughly 7K >>>>instructions per node. We had good hardware performance counters to get >>>>that number very exact. I suspect that Crafty is in the same ballpark, >>>>roughly. I just did a quick test and got 250K nps on a 750mhz PIII... >>> >>>Here's a back of the envelope calculation that I would be interested in passing >>>by you... >>> >>>Let's say for the sake of argument that Crafty's NPS will scale linearly with >>>clock speed, so on a 2 GHz x86 32-bit CPU it will reach 666 knps. I'll guess >>>that Crafty's branching factor is about 3, and we'll build an almost Crafty >>>clone that is not able to use all of the searching tricks and it's branching >>>factor is about 6. >>> >>>So if we're going to use our hardware almost clone to search 3 plies deep it has >>>to get about 4.3Mnps to match the performance of the software Crafty. And if we >>>count on searching 4 plies then we'll need to roughly double that number. >>> >>>It doesn't make an FPGA design sound all that promising when dealing with that >>>class of CPU. Much more interesting for something like a PDA add-on. (Given a >>>PDA with a large battery.) >>> >>>Did I miss anything? >> >> >>I tend to agree... doing a full "engine" in an FPGA would be a "task". The >>thing that seemed most interesting to me was to take the bitmap stuff and tuck >>it all into an FPGA so that we get special-purpose "instructions" like >>"makemove", "unmakemove", "generate moves" and perhaps even a Swap() function >>and InCheck function. That would certainly run the search speed up by a >>factor of at least two, which would be very significant. And it wouldn't >>give up anything at all since the actual search, hashing, etc would be done >>in software, but the bitmap/hash-signature update stuff would be done in >>hardware. >> > >I don't really have a good feel for what would be involved in this, but it seems >to me that if you are considering the use of an FPGA then we need to take a >radically different approach to achieve a performance gain over these 2 GHz CPUs >- the routing delays in FPGAs are huge compared to 1/2 ns. Yes... but don't forget the memory cost is not .5 ns but in the 100ns range, so that is pretty large... > >I know that your bitmap approach uses a lot of lookup tables, but I'm not sure >that those could be put on an FPGA or even that if that would have to be done on >the sort of chip that you're talking about. If they would be required - then can >you tell me how large they are? For bishops (assuming the "compact_attacks" version) I need 2 x 296 x 8 bytes. For normal attacks, it is 2 x 64 x 256 x 8 bytes... I normally use the compact version on intel machines which greatly reduces memory traffic and cache thrashing. > >On a totally different thread I believe that the XCV1000E class parts contain >enough RAM to put a 2kx64 bit hash table on them which would be single cycle >r/w, and even dual port if there were a reason. This might help reduce the BF >for shallow 3-4 ply searches. > >Xilinx even has some app notes on how to implement CAMs which might be useful >for repetition detection. The basic building block is a 16x8 CAM which means you >would need a fair number, but I think that it would fit. > >Regards, >Keith
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.