Author: Robert Hyatt
Date: 09:11:53 04/08/02
Go up one level in this thread
On April 08, 2002 at 04:48:01, Keith Evans wrote: >On April 08, 2002 at 01:02:52, Robert Hyatt wrote: > >>I have no idea either. It seems high. Cray Blitz executed roughly 7K >>instructions per node. We had good hardware performance counters to get >>that number very exact. I suspect that Crafty is in the same ballpark, >>roughly. I just did a quick test and got 250K nps on a 750mhz PIII... > >Here's a back of the envelope calculation that I would be interested in passing >by you... > >Let's say for the sake of argument that Crafty's NPS will scale linearly with >clock speed, so on a 2 GHz x86 32-bit CPU it will reach 666 knps. I'll guess >that Crafty's branching factor is about 3, and we'll build an almost Crafty >clone that is not able to use all of the searching tricks and it's branching >factor is about 6. > >So if we're going to use our hardware almost clone to search 3 plies deep it has >to get about 4.3Mnps to match the performance of the software Crafty. And if we >count on searching 4 plies then we'll need to roughly double that number. > >It doesn't make an FPGA design sound all that promising when dealing with that >class of CPU. Much more interesting for something like a PDA add-on. (Given a >PDA with a large battery.) > >Did I miss anything? I tend to agree... doing a full "engine" in an FPGA would be a "task". The thing that seemed most interesting to me was to take the bitmap stuff and tuck it all into an FPGA so that we get special-purpose "instructions" like "makemove", "unmakemove", "generate moves" and perhaps even a Swap() function and InCheck function. That would certainly run the search speed up by a factor of at least two, which would be very significant. And it wouldn't give up anything at all since the actual search, hashing, etc would be done in software, but the bitmap/hash-signature update stuff would be done in hardware. > >I have no idea how to factor in an improvement in evaluation. Based on your >numbers you are spending roughly 3000 cycles per node. If you were to increase >that number by say a factor of ten, then once the hardware breaks the half a >million to million nps mark then it's interesting - we might even let it go six >plies deep. So it seems that any efforts in this area will involve a lot of >experimentation with evaluation. Did Hsu exagerate his 40k number, and/or will >diminishing returns kick in? > >I was told that Brutus only searched 3 plies deep in hardware, which makes sense >to me given the limited amount of evaluation that could be placed on a Virtex >405E. Depending on his chess program and CPU I would guess that it would be >break even at 3 plies hardware search at somewhere around 2-4Mnps. I agree. DB seemed to go 5-7 plies deep in hardware (although I saw some 4 and 8 ply searches (hardware) in some positions)... 3 plies is certainly significant, but "Nimzo" has never been a high-knowledge program, it has been a fast searcher. And hardware makes sense there. But with an evaluation- based program, where the evaluation changes all the time, doing it in hardware is not very efficient use of time... > >>Hard to say. Crafty generates moves in two passes... captures then >>non-captures. They are not sorted in any way other than (in general) to >>generate moves that advance toward the opponent's side of the board first, >>before generating retreats... > >Hmm... hardware could easily favor searching moves which tend to advance first >too. This might be a nice small experiment for somebody to perform. They would >have to change the arbiter - I don't think that you would want to make this >programmable. > >Marc put in a mode in his movegen arbiter to generate either MVV/LVA or MVV/MVA >and claims to prefer MVV/MVA at times. (Arbiter needs this function anyway as a >function of STM.) Like I said it's programmable, so if it explodes... > >>> >>>If movegen is 10% of the CPU and eval is 50%, then where's the other 40%? What's >>>the next largest consumer of cycles? >> >>InCheck(), Search(), Swap() and NextMove(), which are probably pretty >>even in cpu usage... > >Interesting - I would have thought that InCheck() would have been part of making >moves. And NextMove() and Swap() sound a little suspicious too. I don't have the >Crafty source available at the moment. > >Regards, >Keith NextMove() sorts captures and handles killers and history stuff. Swap() is a pretty compute-bound procedure that rattles off a series of captures on a single square to see who comes out ahead... Remember that I don't compute "who attacks this square" incrementally. So to ask that question requires a loop that generates all squares attacking this square (where the king stands) piece by piece...
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.