Author: Robert Hyatt
Date: 11:51:28 10/18/02
On October 18, 2002 at 14:29:07, Vincent Diepeveen wrote:

>On October 17, 2002 at 19:25:11, Robert Hyatt wrote:
>
>Bob, without me wanting to say who is right here, Hsu or you ==> your statements contradict Hsu's statement.
>
>---
>CrazyBird(DM) kibitzes: the 1996 & 1997 versions of Deep Blue are different mainly in the amount of chess knowledge.
>aics%
>EeEk(DM) kibitzes: what was the difference?
>aics%
>CrazyBird(DM) kibitzes: we went to Benjamin's excellent chess school. :)
>aics%

What did I say that contradicts that? Nothing I can think of... If you mean the re-design, that is a pure fact mentioned many times. They had the original Deep Thought stuff, then a re-design for Deep Blue 1, and then another complete redesign for Deep Blue 2. That's in his book in great detail...

>---
>We both know that in theory *everything* that can be done in software can be done in hardware too. However, there are so many practical issues that you simply cannot implement things 100% the same way in hardware. Especially the low level at which Hsu was programming meant it was very hard to make the chip. He did a great achievement by producing the chip.
>
>Being in hardware has just one advantage and, in 1997 at least, three big disadvantages:
>
> - it's very expensive (FPGAs are very cheap now)

The original Deep Thought chips cost less than $5,000 _total_ for all 16. The original Deep Blue 1 chips were also _not_ expensive. Project MOSIS is there for just this kind of stuff... I don't remember the details about DB2. I do remember IBM didn't make the chips themselves...

> - the processor is clocked *way* lower than software processors are (in 1997 the 300MHz PII was there, versus 20MHz Deep Blue processors; like a factor of 15)

So? The idea in hardware is to do _more_ in a clock cycle. The clock frequency is not an issue; clocks are used to synchronize at various points and let things settle before they get latched. In theory you could build a chip that searches 1M nodes in one clock cycle. It would be _much_ harder to do so, however... and there would be no point, since nobody cares about the clock frequency, only how fast it searches chess trees...

> - it's very hard to make a hardware chip

Apparently not to Hsu. He did a complete chip design, and got it back and ready to play in less than one year total, more like nine months if I remember his book correctly...

>The only advantage is that things can get done in parallel. That means if everything is sequential, you start out 15 times slower than software (15 times in 1997; now it's way, way more than that: the technology to produce processors 15 times slower than the 2.8GHz P4s which are the latest now, so 200MHz processors, is still not exactly cheap).
>
>And Hsu had just 20MHz, and later managed 'even' 24MHz. So every clock you waste on sequentially trying the hashtable and other search enhancements slows down the CPU bigtime.

Not at all. The hash probe was done in parallel with everything else. It just always "failed", since there was no memory present...

>If you implement:
> nullmove
> hashtables
> killermoves
> SEE (qsearch)
> countermove
> butterfly boards
> history heuristics
>
>though I do not believe the last three are smart move-ordering enhancements to make, if you implement them you are like 30 clocks slower than without them.

That is simply an uninformed statement. The logic will certainly be far more complex if those things are done. But not necessarily _slower_. Parallelism is the name of the game in ASIC design...
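For reference, here is roughly what two of those heuristics look like in software. This is a minimal, illustrative C sketch with made-up names and sizes, not code from Deep Blue or any real program:

    /* Killer-move and history-heuristic updates as they appear in a
       typical software searcher.  Purely illustrative. */

    #define MAX_PLY 64

    static int killer[MAX_PLY][2];   /* two killer moves per ply       */
    static int history[64][64];      /* from-square x to-square counts */

    /* called when a quiet move causes a beta cutoff */
    void update_ordering(int move, int from, int to, int ply, int depth) {
        if (move != killer[ply][0]) {        /* keep two distinct killers */
            killer[ply][1] = killer[ply][0];
            killer[ply][0] = move;
        }
        history[from][to] += depth * depth;  /* deeper cutoffs weigh more */
    }

The move list is then tried in the order: hash move, captures (ordered by SEE), killers, and the remaining quiet moves sorted by history[from][to]. In an ASIC those comparisons can be done by dedicated logic in parallel with everything else, which is the point: more complex does not have to mean slower.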
>If you first need 10 clocks on average per node (which is very little for 0.60 micron), then going to 40 clocks means a slowdown of a factor of 3.

That would be a factor of four. 40 / 10 == 4.

>That's clearly visible.

But you can wave your hands all you want. That doesn't mean 4x slower is a forced condition...

>I do not know the latency of SRAM. Sources who create processors for a living inform me that Deep Blue would have needed a few megabytes of expensive SRAM (very expensive in 1997; EDO RAM was the standard back then) to avoid losing too much speed communicating with it. EDO RAM is no option for something that is capable of searching 2-2.5 million nodes a second. Doing over 2 million probes a second at random locations in EDO RAM is not something I can recommend :)

Do the math. EDO RAM has a 100ns cycle time. The Deep Blue chess processors had a 50ns cycle time. Overlap the memory read with two early cycles and it is free...
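The software analogue of that overlap is worth spelling out: start the memory fetch early and hide its latency behind work that does not need the result yet. A minimal sketch, assuming GCC's __builtin_prefetch intrinsic; the table size, entry layout, and helper routines are invented for the example:

    /* Overlapping the hash-table read with other per-node work.
       Only __builtin_prefetch is real (a GCC intrinsic); the rest
       is made up for the sketch. */

    #define TABLE_BITS 20

    typedef struct {
        unsigned long long key;   /* position signature   */
        int score;                /* stored search result */
    } HashEntry;

    static HashEntry table[1 << TABLE_BITS];

    extern void generate_moves(void);   /* placeholder hooks */
    extern int full_search(void);

    int search_node(unsigned long long hashkey) {
        HashEntry *h = &table[hashkey & ((1 << TABLE_BITS) - 1)];
        __builtin_prefetch(h);   /* start the memory read now...          */
        generate_moves();        /* ...and hide the ~100ns latency behind */
                                 /* work that does not need the entry yet */
        if (h->key == hashkey)   /* by now the line is (likely) cached    */
            return h->score;
        return full_search();
    }

In the hardware case the equivalent is presenting the RAM address a couple of chip cycles before the entry is actually needed, which is the overlap described above.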
>Now that still isn't as efficient as software, because the probes then get done to RAM local to the processor, which isn't iterating itself, so there is a huge overhead anyway compared to software. Only if you have some big, fast, global parallel RAM from which each hardware CPU can independently fetch a cache line do you get close to the efficiency of software!

The RAM design of the new DB chips supported a 16-way shared RAM between the processors on a single SP node. There is not much way to do a shared hash table across 30 different nodes; 480-port memory would be impossibly complex and expensive.

>I didn't count them in the 40 clocks, because 40 clocks a node would already slow the thing down by a factor of 3. Just the sequential trying of the different heuristics and search enhancements simply means you lose extra processor clocks, as it cannot be done in parallel.

Doesn't matter. See above. Two chess-chip clock cycles would be all that is needed to read from plain old DRAM. Using SRAM would cut it to under one cycle.

>Apart from that, if the design goal is as many nodes a second as possible, which was a good goal before 1995, then obviously you don't care about efficiency either!

That is another false statement. Their "design goal" was _only_ to beat Kasparov. NPS or depth was _not_ the driving factor...
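Since much of the quoted exchange below is about what hardware hash tables would have bought, here is what a typical software transposition-table entry looks like for concreteness. The layout is illustrative only, not the format Belle or Hsu's chips used:

    /* A typical transposition-table entry in a software chess program.
       Field names and widths are illustrative. */

    enum bound_type { EXACT, LOWER, UPPER };

    typedef struct {
        unsigned long long key;   /* 64-bit Zobrist signature             */
        short score;              /* score the stored search returned     */
        unsigned char depth;      /* draft the stored score is good for   */
        unsigned char bound;      /* EXACT, LOWER, or UPPER               */
        unsigned short best;      /* best move found, reused for ordering */
    } TTEntry;

A hit with sufficient draft can cut the node off entirely; even a shallow hit supplies a best move for ordering. That is the efficiency at stake: without hashing, the search redoes transposed positions from scratch.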
>>On October 17, 2002 at 12:41:59, Vincent Diepeveen wrote:
>>
>>>On October 16, 2002 at 11:03:33, emerson tan wrote:
>>>
>>>Nodes a second is not important. I hope you realize that if you create a special program to go as fast as possible, getting around 40 million nodes a second is easily possible on a dual K7.
>>>
>>>Do not ask how it plays, though, or how efficiently it searches.
>>>
>>>Important factors are:
>>> - He needs a new, very good book. He will not even get 10th at the world championship when his book is from 1997, and I do not know a single GM in the world who could do the job for him. You need very special guys in this world to do a book job. They are unique people, usually with many talents. Just hiring a GM is not going to be a success in advance. If you look at how long it took for Alterman to contribute something to the Junior team, you will start crying directly.
>>> - The evaluation needs to get improved bigtime.
>>> - To get a billion-nodes-a-second chip he needs around 100 million dollars. Of course, with more CPUs doing around 40 million nodes a second at say 500MHz, he could do it with just 10 million dollars. But if you can afford 10 million dollars for 40M-nps chips, you can afford a big parallel machine too. Note that for a single-CPU chip doing about 4 million nodes a second, all he needs is a cheap 3,000 dollar FPGA thing. If you calculate well, you will see that Deep Blue did not get so many nodes a second per chip: it had 480 chips, and Deep Blue searched around 126 million nodes a second on average against Kasparov. So that's 265K nodes a second at each chip.
>>>
>>>So a single chip getting 4 million nodes a second is very efficient compared to that.
>>>
>>> - He needs more like a trillion nodes a second to compensate for the inefficiency in hardware. No killer moves, no hashtables, etcetera.
>>
>>You keep saying that without knowing what you are talking about. Read his book. You will find out that the chess processors _did_ have hash table support. He just didn't have time to design and build the memory for them. Belle was the "pattern" for Deep Thought. It was essentially "Belle on a chip". Belle _did_ have hash tables in the hardware search...
>>
>>Given another year (a re-match in 1998) and they would have been hashing in the hardware.
>>
>>Killer moves are not a _huge_ loss. They are a loss, but not a factor of two or anything close to that... I can run the test and post the numbers if you want...
>>
>>>Of course the argument that it is possible to make hashtables in hardware is not relevant, as there is a price to that which is simply too big to pay.
>>
>>Based on what? Memory is not particularly complex. It certainly is not expensive...
>>
>>>Even for IBM it was too expensive to pay for hashtables in hardware: despite Hsu having created possibilities for it, the RAM wasn't put on the chips and wasn't connected to the CPUs. Something that improves the chips does of course get used when it works somehow. Could price have been the only reason? Don't you think that too? If not, what could be the reason not to use hashtables, knowing they improve efficiency?
>>
>>Lack of time. Hsu completely re-designed the chess chips, got them built, tested them, worked around some hardware bugs, suffered through some fab problems that produced bad chips, and so forth. All in one year. He got the final chips weeks before the Kasparov match.
>>
>>It was an issue of time. Memory would have cost _far_ less than the chips (chess chips).
>>
>>>The important thing to remember is that if I want to drive to Paris with two cars and I just send the cars off in all directions without looking at a map or road sign (representing the inefficiency), then the chance is they land everywhere except on the highway to Paris.
>>>
>>>Even a trillion nodes a second isn't going to work if it is using inefficient forms of search.
>>>
>>>It is not very nice of Hsu to focus upon how many nodes a second he plans to get. For IBM that was important in 1997 to make marketing with. It is not a fair comparison.
>>
>>The match was _not_ about NPS. It was purely about beating Kasparov. If they could have done it with 10 nodes per second, they would have. I don't know where you get this NPS fixation of yours, but it is wrong. Just ask Hsu...
>>
>>>If I go play at the world championship 2003 with like 500 processors, I do not talk about "this program uses up to a terabyte of bandwidth a second (1,000,000 MB/s) to outpower the other programs, whereas the poor PC programs only have up to 0.0006 terabytes of bandwidth a second (600 MB/s)".
>>
>>First, you had better beat them... That's not going to be easy. NUMA has plenty of problems to overcome...
>>
>>>That is not a fair comparison. Do you see why it is not a fair comparison?
>>>
>>>He should say what search depth he plans to reach using such chips.
>>
>>Depth is _also_ unimportant. Otherwise they could have just done like Junior does and reported some "new" ply definition of their choosing, and nobody could refute them at all.
>>
>>This was about beating Kasparov. Not about NPS. Not about depth. Not about _anything_ but beating Kasparov...
>>
>>Had you talked to them after they went to work for IBM, you would know this. Those of us who did, do...
>>
>>>However, he says: "search depth is not so relevant". If it is not so relevant, then why talk about nodes a second at all, if the usual goal of more NPS (getting a bigger search depth) is not considered important?
>>
>>They haven't been talking about NPS except in a very vague way. You have made it an issue, not them. They can't really tell you _exactly_ how fast they are going, since they don't count nodes...
>>
>>>>EeEk(*DM) kibitzes: kib question from Frantic: According to what was published, DB was evaluating 200 million positions per second (vs 2.5 to 5 million for the 8-way Simmons server running Deep Fritz). How fast would Deep Blue be today if the project had continued?
>>>>CrazyBird(DM) kibitzes: it contains a few references at the end of the book for the more technically inclined.
>>>>CrazyBird(DM) kibitzes: if we redo the chip in say, 0.13 micron, and with an improved architecture, it should be possible to do one billion nodes/sec on a single chip.
>>>>CrazyBird(DM) kibitzes: so a trillion nodes/sec machine is actually possible today.
>>>>
>>>>If the cost is not that high, maybe Hsu should make a ChessMachine-like card that can be plugged into computers (assuming he has no legal obligation to IBM). The desktop PC is a long way from hitting 1 billion nodes/sec. I think most professional chess players and serious chess hobbyists would buy one. He could easily get 1 million orders. 1 billion nodes/sec, mmm... :)
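For scale on that last kibitz: a trillion nodes/sec at one billion nodes/sec per chip means on the order of 1,000 chips. Deep Blue 2's 480 chips at 126,000,000 / 480 = 262,500 nodes/sec each would make each such chip close to 4,000 times faster than a 1997 one.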