Author: Bob Durrett
Date: 13:08:32 02/18/04
Go up one level in this thread
On February 18, 2004 at 13:47:04, Robert Hyatt wrote: >On February 17, 2004 at 16:56:46, Bob Durrett wrote: > >> >>If one ignores the many anticipated objections to doing this research, then what >>would it take? >> >>I can see the following process: >> >>(1) A list of blocks of code which could be "hardware-ized" is made and the >>list ordered in order of decreasing expected benefit. >> >>(2) Then, the powers that be would decide which blocks to do that to. >> >>(3) Then it would be done. Testing and tuning would follow. >> >>(4) Then a match would be set up with Kasparov. >> >>(5) Etceteras. >> >>By "hardware-izing" I mean creating a chip which could input it's input data, >>process that data, and output that data all within the time of a few clock >>cycles of a modern microprocessor chip. >> >>QUESTIONS: >> >>Which block of Crafty code would be chosen first? How much benefit to expect? >> >>Same questions for second block of code. >> >>Etceteras for the rest of the code until nothing is left. >> >>Bob D. > > >We talked about this a couple of years ago. Fortunately, I have been around >long enough to understand the original Belle chess engine hardware design, and >how it was "cloned" into DT/DB (with a subtle change to make the chip smaller). > >With that in mind, crafty really doesn't fit a hardware implementation. IE much >of what I do would be _very_ difficult to do in hardware. It is the right way >to do things when memory is handy, but not when trying to implement the thing >directly in hardware. For example, SEE is not a nice hardware topic because it >is a variable-length process and hardware doesn't like that. Hardware >implementations use the old Belle MVV/LVA approach to ordering captures instead, >which is ok, but not nearly as good as SEE. I generate moves and stuff 'em into >an array. Hardware doesn't fit that very well, and the belle approach is >generally used where you generate one move, and make it and go on, then later >you come back and generate another move, and make it and so forth. Dark Thought >implemented the move generator this way, and I sometimes wish I had as it does >offer some efficiencies, but it also offers problems with the inability to use a >history heuristic, for example. > >The things that would help Crafty would be special purpose hardware instructions >that do the following: > >Make/Unmake moves. This special purpose hardware would keep the bitboards >internally in dedicated special-purpose registers, and do the updating needed as >asked by the software when making/unmaking a move. The software time for this >would then vanish. > >generate captures/generate non-captures/ generate check-evasions. It would be >nice to have an instruction that when executed, dumps a list of moves ready to >go, with no real overhead. The move generator time would basically vanish. > >A Hardware SEE would be nice, and is doable, but doing it in the middle of a >hardware search would be messy. But a separate SEE instruction would simply >make the cost of that vanish as well. > >The above add up to less than 50% of the total search time. Making them >"vanish" would make crafty possibly 2x faster. The last 50% is way tougher as >it is the evaluation code, and it consumes 50%+ of the total search time at >present. It could obviously be moved into hardware, and much of it would fit >very well since it is bitmap pattern matching. Some of it would be very >difficult (blocked pawn code since I look to see where all pawns can advance >safely to, and that is hard to do in a fixed number of clock cycles). But if we >did this part in hardware, then we _might_ hit speeds 10x faster or close to it, >since all that is left is just the search overhead itself. > >Ken's pre-1980 version of Belle did just this. This is the version of Belle >that first played in 1978 and was phased out in 1980 when he did the "whole >deal" in hardware. Doing this 1978 version produced a program that was 2x >faster than the closest competition, chess 4.7. Chess 4.7 searched about 2600 >nodes per second, 1978 belle hit 5000, but don't forget that the software part >was running on a _very_ slow minicomputer (a pdp 11/44). > >Making Crafty go perhaps 10x faster would be nice. It would take 2+ years to >get it all done and tested, probably. By then hardware has caught up somewhat >and that 10x has dropped to 2x-3x. That's the problem with hardware. And it is >why I am not that excited about Hydra. > >Some numbers: > >in 1975 chess 4.0 showed up at the ACM event on a CDC 7600 (Cyber 176). They >were searching 10x faster than anyone else. IE they were doing 2600 NPS on that >box, while the rest of us were struggling to hit 100, untill you happened to >find an IBM 360/91 or something laying around where you might hit 250-300 nps. >10x proved to be a dominating advantage, as their early ACM history shows. > >In 1978 Ken caught up and actually passed them by a factor of 2, but that factor >of 2 just made the games interesting, belle was not dominant. > >in 1980 we had just moved to the cray and started at about 1K nodes per second, >but eventually passed chess 4.x's 2.6K. But Belle showed up with a new hardware >design that was hitting 160K. 80x faster than chess 4.x and us. And it was a >dominant box for the next three years. Finally, by 1983 we were doing 20K on >the XMP-2 and that was enough to put us back in the game, as our search was more >efficient for lots of reasons such as no hash-move for ordering, no SEE in >Belle, no killer/history ordering. Etc. We steadily pulled ahead through 1986 >where we were doing 200K+ and the next closest thing was Belle but it was a ply >behind us then with our better software search. > >In 1987 along came chiptest/deep thought, and their speed improved by leaps and >bounds, quickly hitting 2-4M by the 1989 WCCC. And again, with that speed >advantage, they dominated computer chess thru their last event in 1995, losing >only one game I recall, to Fritz in Hong Kong. > >Hydra doesn't represent that remarkable a speed jump yet. IE they claim 3M >nodes per second per board, but something like 60% search efficiency, so back >that off to 2M x 8 board which is 16M nodes per second. I just ran on a machine >in CCT6 that was doing between 8 and 10M nodes per second, and I have tested on >an 8-way opteron that matches Hydra's speed pretty well, yet I don't have any of >the typical hardware shortcomings, such as their lack of hashing in the hardware >and so forth. > >That is why I said that it doesn't impress me that much. It's an interesting >project, and it produces some interesting speed, but it is not way out ahead of >what anyone could produce on a simple 8-way opteron, much less going to 16-way >and beyond. IE had it come out 3-4 years ago, it might have been a dominating >force for a couple of years, but today it simply isn't. > >Now if they put together a new version with faster FPGAs, or many more FPGAs, >then this might change. But I am evaluating reality not vaporware. Until I see >such a beast, I don't have to play it and won't worry about it. Meanwhile, I >know what current 64 bit hardware can do for me, and the result is _not_ >something to be taken lightly. CCT6 showed that... A most interesting set of comments and review of past experience. Very informative. Two questions stand out: (1) What's the problem with hash [and maybe also the other things referred to], and (2) Why is it assumed that the hardware would not advance too? It is true that an individual piece of hardware may not change, but new versions might get better. Consider the case of video and audio cards in PCs. They are getting better by leaps and bounds! Bob D.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.