Author: Robert Hyatt
Date: 10:47:04 02/18/04
Go up one level in this thread
On February 17, 2004 at 16:56:46, Bob Durrett wrote: > >If one ignores the many anticipated objections to doing this research, then what >would it take? > >I can see the following process: > >(1) A list of blocks of code which could be "hardware-ized" is made and the >list ordered in order of decreasing expected benefit. > >(2) Then, the powers that be would decide which blocks to do that to. > >(3) Then it would be done. Testing and tuning would follow. > >(4) Then a match would be set up with Kasparov. > >(5) Etceteras. > >By "hardware-izing" I mean creating a chip which could input it's input data, >process that data, and output that data all within the time of a few clock >cycles of a modern microprocessor chip. > >QUESTIONS: > >Which block of Crafty code would be chosen first? How much benefit to expect? > >Same questions for second block of code. > >Etceteras for the rest of the code until nothing is left. > >Bob D. We talked about this a couple of years ago. Fortunately, I have been around long enough to understand the original Belle chess engine hardware design, and how it was "cloned" into DT/DB (with a subtle change to make the chip smaller). With that in mind, crafty really doesn't fit a hardware implementation. IE much of what I do would be _very_ difficult to do in hardware. It is the right way to do things when memory is handy, but not when trying to implement the thing directly in hardware. For example, SEE is not a nice hardware topic because it is a variable-length process and hardware doesn't like that. Hardware implementations use the old Belle MVV/LVA approach to ordering captures instead, which is ok, but not nearly as good as SEE. I generate moves and stuff 'em into an array. Hardware doesn't fit that very well, and the belle approach is generally used where you generate one move, and make it and go on, then later you come back and generate another move, and make it and so forth. Dark Thought implemented the move generator this way, and I sometimes wish I had as it does offer some efficiencies, but it also offers problems with the inability to use a history heuristic, for example. The things that would help Crafty would be special purpose hardware instructions that do the following: Make/Unmake moves. This special purpose hardware would keep the bitboards internally in dedicated special-purpose registers, and do the updating needed as asked by the software when making/unmaking a move. The software time for this would then vanish. generate captures/generate non-captures/ generate check-evasions. It would be nice to have an instruction that when executed, dumps a list of moves ready to go, with no real overhead. The move generator time would basically vanish. A Hardware SEE would be nice, and is doable, but doing it in the middle of a hardware search would be messy. But a separate SEE instruction would simply make the cost of that vanish as well. The above add up to less than 50% of the total search time. Making them "vanish" would make crafty possibly 2x faster. The last 50% is way tougher as it is the evaluation code, and it consumes 50%+ of the total search time at present. It could obviously be moved into hardware, and much of it would fit very well since it is bitmap pattern matching. Some of it would be very difficult (blocked pawn code since I look to see where all pawns can advance safely to, and that is hard to do in a fixed number of clock cycles). But if we did this part in hardware, then we _might_ hit speeds 10x faster or close to it, since all that is left is just the search overhead itself. Ken's pre-1980 version of Belle did just this. This is the version of Belle that first played in 1978 and was phased out in 1980 when he did the "whole deal" in hardware. Doing this 1978 version produced a program that was 2x faster than the closest competition, chess 4.7. Chess 4.7 searched about 2600 nodes per second, 1978 belle hit 5000, but don't forget that the software part was running on a _very_ slow minicomputer (a pdp 11/44). Making Crafty go perhaps 10x faster would be nice. It would take 2+ years to get it all done and tested, probably. By then hardware has caught up somewhat and that 10x has dropped to 2x-3x. That's the problem with hardware. And it is why I am not that excited about Hydra. Some numbers: in 1975 chess 4.0 showed up at the ACM event on a CDC 7600 (Cyber 176). They were searching 10x faster than anyone else. IE they were doing 2600 NPS on that box, while the rest of us were struggling to hit 100, untill you happened to find an IBM 360/91 or something laying around where you might hit 250-300 nps. 10x proved to be a dominating advantage, as their early ACM history shows. In 1978 Ken caught up and actually passed them by a factor of 2, but that factor of 2 just made the games interesting, belle was not dominant. in 1980 we had just moved to the cray and started at about 1K nodes per second, but eventually passed chess 4.x's 2.6K. But Belle showed up with a new hardware design that was hitting 160K. 80x faster than chess 4.x and us. And it was a dominant box for the next three years. Finally, by 1983 we were doing 20K on the XMP-2 and that was enough to put us back in the game, as our search was more efficient for lots of reasons such as no hash-move for ordering, no SEE in Belle, no killer/history ordering. Etc. We steadily pulled ahead through 1986 where we were doing 200K+ and the next closest thing was Belle but it was a ply behind us then with our better software search. In 1987 along came chiptest/deep thought, and their speed improved by leaps and bounds, quickly hitting 2-4M by the 1989 WCCC. And again, with that speed advantage, they dominated computer chess thru their last event in 1995, losing only one game I recall, to Fritz in Hong Kong. Hydra doesn't represent that remarkable a speed jump yet. IE they claim 3M nodes per second per board, but something like 60% search efficiency, so back that off to 2M x 8 board which is 16M nodes per second. I just ran on a machine in CCT6 that was doing between 8 and 10M nodes per second, and I have tested on an 8-way opteron that matches Hydra's speed pretty well, yet I don't have any of the typical hardware shortcomings, such as their lack of hashing in the hardware and so forth. That is why I said that it doesn't impress me that much. It's an interesting project, and it produces some interesting speed, but it is not way out ahead of what anyone could produce on a simple 8-way opteron, much less going to 16-way and beyond. IE had it come out 3-4 years ago, it might have been a dominating force for a couple of years, but today it simply isn't. Now if they put together a new version with faster FPGAs, or many more FPGAs, then this might change. But I am evaluating reality not vaporware. Until I see such a beast, I don't have to play it and won't worry about it. Meanwhile, I know what current 64 bit hardware can do for me, and the result is _not_ something to be taken lightly. CCT6 showed that...
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.