Author: Keith Evans
Date: 19:35:00 04/08/02
Go up one level in this thread
On April 08, 2002 at 12:11:53, Robert Hyatt wrote: >On April 08, 2002 at 04:48:01, Keith Evans wrote: > >>On April 08, 2002 at 01:02:52, Robert Hyatt wrote: >> >>>I have no idea either. It seems high. Cray Blitz executed roughly 7K >>>instructions per node. We had good hardware performance counters to get >>>that number very exact. I suspect that Crafty is in the same ballpark, >>>roughly. I just did a quick test and got 250K nps on a 750mhz PIII... >> >>Here's a back of the envelope calculation that I would be interested in passing >>by you... >> >>Let's say for the sake of argument that Crafty's NPS will scale linearly with >>clock speed, so on a 2 GHz x86 32-bit CPU it will reach 666 knps. I'll guess >>that Crafty's branching factor is about 3, and we'll build an almost Crafty >>clone that is not able to use all of the searching tricks and it's branching >>factor is about 6. >> >>So if we're going to use our hardware almost clone to search 3 plies deep it has >>to get about 4.3Mnps to match the performance of the software Crafty. And if we >>count on searching 4 plies then we'll need to roughly double that number. >> >>It doesn't make an FPGA design sound all that promising when dealing with that >>class of CPU. Much more interesting for something like a PDA add-on. (Given a >>PDA with a large battery.) >> >>Did I miss anything? > > >I tend to agree... doing a full "engine" in an FPGA would be a "task". The >thing that seemed most interesting to me was to take the bitmap stuff and tuck >it all into an FPGA so that we get special-purpose "instructions" like >"makemove", "unmakemove", "generate moves" and perhaps even a Swap() function >and InCheck function. That would certainly run the search speed up by a >factor of at least two, which would be very significant. And it wouldn't >give up anything at all since the actual search, hashing, etc would be done >in software, but the bitmap/hash-signature update stuff would be done in >hardware. > I don't really have a good feel for what would be involved in this, but it seems to me that if you are considering the use of an FPGA then we need to take a radically different approach to achieve a performance gain over these 2 GHz CPUs - the routing delays in FPGAs are huge compared to 1/2 ns. I know that your bitmap approach uses a lot of lookup tables, but I'm not sure that those could be put on an FPGA or even that if that would have to be done on the sort of chip that you're talking about. If they would be required - then can you tell me how large they are? On a totally different thread I believe that the XCV1000E class parts contain enough RAM to put a 2kx64 bit hash table on them which would be single cycle r/w, and even dual port if there were a reason. This might help reduce the BF for shallow 3-4 ply searches. Xilinx even has some app notes on how to implement CAMs which might be useful for repetition detection. The basic building block is a 16x8 CAM which means you would need a fair number, but I think that it would fit. Regards, Keith
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.