Author: Keith Evans
Date: 01:48:01 04/08/02
On April 08, 2002 at 01:02:52, Robert Hyatt wrote:

>I have no idea either. It seems high. Cray Blitz executed roughly 7K
>instructions per node. We had good hardware performance counters to get
>that number very exact. I suspect that Crafty is in the same ballpark,
>roughly. I just did a quick test and got 250K nps on a 750mhz PIII...

Here's a back-of-the-envelope calculation that I would be interested in passing by you...

Let's say for the sake of argument that Crafty's NPS scales linearly with clock speed, so on a 2 GHz 32-bit x86 CPU it will reach 666 knps. I'll guess that Crafty's branching factor is about 3, and we'll build an almost-Crafty clone in hardware that cannot use all of the searching tricks, so its branching factor is about 6.

So if we're going to use our hardware almost-clone to search 3 plies deep, it has to get about 4.3 Mnps to match the performance of the software Crafty. And if we count on searching 4 plies, then we'll need to roughly double that number.

That doesn't make an FPGA design sound all that promising when dealing with that class of CPU. It is much more interesting for something like a PDA add-on. (Given a PDA with a large battery.)

Did I miss anything? I have no idea how to factor in an improvement in evaluation. Based on your numbers you are spending roughly 3000 cycles per node. If you were to increase that number by, say, a factor of ten, then once the hardware breaks the half-million to one-million nps mark it gets interesting - we might even let it go six plies deep.

So it seems that any efforts in this area will involve a lot of experimentation with evaluation. Did Hsu exaggerate his 40k number, and/or will diminishing returns kick in?

I was told that Brutus only searched 3 plies deep in hardware, which makes sense to me given the limited amount of evaluation that could be placed on a Virtex 405E. Depending on his chess program and CPU, I would guess that a 3-ply hardware search would break even somewhere around 2-4 Mnps.
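For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate above. It assumes a uniform tree and counts every node from the root down (1 + b + b^2 + ...), which is one reading that reproduces the 4.3 Mnps figure; the function names are made up for illustration and have nothing to do with Crafty's source.

```python
def tree_nodes(branching_factor: int, depth: int) -> int:
    """Total nodes in a uniform tree of the given depth, root included.

    Assumption: every node has exactly `branching_factor` children.
    """
    return sum(branching_factor ** d for d in range(depth + 1))


def required_nps(software_nps: float, sw_bf: int, hw_bf: int, depth: int) -> float:
    """NPS the hardware needs to search its bushier tree in the same time
    the software needs for its narrower tree of equal depth."""
    return software_nps * tree_nodes(hw_bf, depth) / tree_nodes(sw_bf, depth)


# Measured: 250 knps on a 750 MHz PIII.
cycles_per_node = 750e6 / 250_000        # 3000 cycles per node

# Assumed linear scaling with clock speed to 2 GHz.
crafty_nps = 250_000 * 2000 / 750        # about 666 knps

for depth in (3, 4):
    need = required_nps(crafty_nps, sw_bf=3, hw_bf=6, depth=depth)
    print(f"{depth} plies in hardware: {need / 1e6:.1f} Mnps needed")
```

With branching factors of 3 and 6 the node ratio is 259/40 at 3 plies and 1555/121 at 4 plies, so the 4-ply requirement comes out at roughly twice the 3-ply one. Counting only leaf nodes instead would give a ratio of 8 at 3 plies and a somewhat higher requirement.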
>Hard to say. Crafty generates moves in two passes... captures then
>non-captures. They are not sorted in any way other than (in general) to
>generate moves that advance toward the opponent's side of the board first,
>before generating retreats...

Hmm... hardware could easily favor searching advancing moves first too. This might be a nice small experiment for somebody to perform. They would have to change the arbiter - I don't think that you would want to make this programmable.

Marc put a mode in his movegen arbiter to generate either MVV/LVA or MVV/MVA, and claims to prefer MVV/MVA at times. (The arbiter needs this function anyway, as a function of the side to move (STM).) Like I said, it's programmable, so if it explodes...

>>
>>If movegen is 10% of the CPU and eval is 50%, then where's the other 40%? What's
>>the next largest consumer of cycles?
>
>InCheck(), Search(), Swap() and NextMove(), which are probably pretty
>even in cpu usage...

Interesting - I would have thought that InCheck() would have been part of making moves. And NextMove() and Swap() sound a little suspicious too. I don't have the Crafty source available at the moment.

Regards,
Keith
Last modified: Thu, 15 Apr 21 08:11:13 -0700