Author: Robert Hyatt
Date: 10:29:25 10/21/02
On October 18, 2002 at 14:58:39, Vincent Diepeveen wrote:

>On October 18, 2002 at 14:51:28, Robert Hyatt wrote:
>
>guys at universities are bad at counting money.
>the deep thought project was a project paid for
>by the us government ==> university,
>if i remember well.
>
>when ibm attached its name to it, it was very expensive!
>
>of course 0.60 micron is not so impressive by now.
>
>DeepThroats within IBM say around $30000 was paid by IBM
>for each cpu of DBII.

If you are talking about the SP2, that is about right for the cost per node. If
you are talking about the chess chips themselves, that is _way_ over what they
paid.

>
>480 x 30k = 14.4 million dollars paid by IBM.
>

It didn't cost anywhere near that. I am not sure if Hsu's book specifically gives
a dollar amount, I will have to look. But it certainly wasn't $30K. It seems to me
that somewhere along the way IBM claimed to have spent about $10,000,000.00 on the
entire project, which includes the salaries of the people involved, including
paying the GM advisors, Hsu/Campbell/Hoane and so forth.

>Note also a bunch of test cpu's might have been pressed,
>all of them 30k a piece.

Didn't happen...

>
>>On October 18, 2002 at 14:29:07, Vincent Diepeveen wrote:
>>
>>>On October 17, 2002 at 19:25:11, Robert Hyatt wrote:
>>>
>>>Bob, without me wanting to say who is right here,
>>>hsu or you ==> your statements contradict Hsu's statement.
>>> ---
>>> CrazyBird(DM) kibitzes: the 1996 & 1997 versions of Deep Blue are different
>>>mainly in the amount of chess knowledge.
>>> aics%
>>> EeEk(DM) kibitzes: what was the difference?
>>> aics%
>>> CrazyBird(DM) kibitzes: we went to Benjamin's excellent chess school.:)
>>> aics%
>>
>>What did I say that contradicts that? Nothing I can think of...
>>
>>If you mean the re-design, that is a pure fact mentioned many times. They had
>>the original deep thought stuff, then a re-design for deep blue 1, and then
>>another complete redesign for deep blue 2. That's in his book in great detail..
>>
>>> ---
>>>We both know that in theory *everything* that can be done in software
>>>can be done in hardware too. However there are so many
>>>practical issues that you simply can't implement things
>>>100% the same way in hardware. Especially the low level at which
>>>Hsu was programming meant it was very hard to make the chip.
>>>Producing the chip at all was a great achievement.
>>>
>>>Being in hardware has just one advantage and 3 big disadvantages.
>>>In 1997, that is, there were 3 disadvantages.
>>>
>>> - it's very expensive (fpga very cheap now)
>>
>>The original deep thought chips cost less than $5,000 _total_ for all 16.
>>
>>The original deep blue 1 chips were also _not_ expensive. Project MOSIS is
>>there just for this kind of stuff...
>>
>>I don't remember the details about DB2. I do remember IBM didn't make the
>>chips themselves...
>>
>>> - the processor is clocked *way* lower than software processors
>>> are clocked at (in 1997 the 300Mhz PII was there, versus 20Mhz
>>> deep blue processors; like a factor of 15).
>>
>>So? The idea in hardware is to do _more_ in a clock cycle. The clock frequency
>>is not an issue; clocks are used to synchronize at various points and let
>>things settle before they get latched. In theory you could build a chip that
>>searches 1M nodes in one clock cycle.
>>
>>It would be _much_ harder to do so, however... and there would be no point,
>>since nobody cares about the clock frequency, only how fast it searches chess
>>trees...
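To make that concrete, here is a minimal C sketch of the arithmetic: effective
speed is clock rate divided by clocks spent per node, so a slow clock can still
mean a fast searcher. Both clocks-per-node figures below are illustrative
assumptions, not measured numbers.

    #include <stdio.h>

    int main(void) {
        double chip_hz       = 24e6;   /* Deep Blue 2 chess chip, ~24 MHz   */
        double chip_clk_node = 10.0;   /* assumed: ~10 clocks per node      */
        double pii_hz        = 300e6;  /* 1997-era Pentium II, 300 MHz      */
        double pii_clk_node  = 3000.0; /* assumed: thousands of cycles per
                                          node for a software searcher      */

        /* nodes/sec = clock rate / clocks per node */
        printf("chess chip: %.2f M nodes/sec\n", chip_hz / chip_clk_node / 1e6);
        printf("PII 300   : %.2f M nodes/sec\n", pii_hz / pii_clk_node / 1e6);
        return 0;
    }

Under those assumptions the 15x clock deficit disappears entirely, because the
chip spends 300x fewer clocks per node.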
>>
>>> - it's very hard to make a hardware chip
>>
>>Apparently not to Hsu. He did a complete chip design, and got it back and
>>ready to play in less than one year total, more like nine months if I remember
>>his book correctly...
>>
>>>
>>>The only advantage is that things can get done in parallel.
>>>That means if everything is sequential, you start out 15 times
>>>slower than software (in 1997 15 times; now it's way,
>>>way more than that; the technology to produce processors 15 times
>>>slower than the 2.8Ghz P4s which are the latest now,
>>>so 200Mhz processors, is still not exactly cheap).
>>>
>>>And Hsu had just 20Mhz, later managed 'even' 24Mhz. So
>>>every clock you waste on some sequential trying
>>>of the hashtable and other search enhancements slows down
>>>the cpu bigtime.
>>
>>Not at all. The hash probe was done in parallel with everything else. It just
>>always "failed" since there was no memory present...
>>
>>>
>>>If you implement:
>>> nullmove
>>> hashtables
>>> killermoves
>>> SEE (qsearch)
>>> countermove
>>> butterfly boards
>>> history heuristics
>>>
>>>though i do not believe the last 3 are smart move ordering enhancements
>>>to make, if you implement them you are like 30 clocks slower than
>>>without them.
>>
>>That is simply an uninformed statement. The logic will certainly be far more
>>complex if those things are done. But not necessarily _slower_. Parallelism is
>>the name of the game in ASIC design..
>>
>>>
>>>If you first need 10 clocks on average per node (which is very little for
>>>0.60 micron), then going to 40 clocks means a slowdown
>>>of a factor of 3.
>>
>>That would be a factor of four.
>>
>>40 / 10 == 4
>>
>>>
>>>That's clearly visible.
>>
>>You can wave your hands all you want. That doesn't mean 4x slower is
>>a forced condition...
>>
>>>
>>>I do not know the LATENCY of SRAM. Sources who design
>>>processors for a living inform me that Deep Blue would have needed
>>>a few megabytes of expensive SRAM (very expensive in 1997; EDO ram
>>>was the standard back then) to avoid losing too much speed communicating
>>>with it. EDO ram is no option for something that is capable of
>>>searching 2-2.5 MLN nodes a second. Doing over 2 million
>>>probes a second at random locations in EDO ram is not something
>>>i can recommend :)
>>
>>Do the math. EDO ram has a 100ns cycle time. Deep Blue chess processors had a
>>50ns cycle time. Overlap the memory read with two early cycles and it is
>>free...
>>
>>>
>>>Now that still isn't as efficient as software, because the probes
>>>then go to ram local to the processor, which isn't itself iterating,
>>>so there is huge overhead anyway when compared to
>>>software. Only if you have some
>>>global big fast parallel ram where each hardware cpu can independently
>>>fetch a cache line, only then do you get close to the efficiency
>>>of software!
>>
>>The RAM design of the new DB chips supported a 16-way shared RAM between
>>the processors on a single SP node. There is not much way to do a shared hash
>>table across 30 different nodes. 480-port memory would be impossibly complex
>>and expensive.
>>
>>>
>>>I didn't count them in the 40 clocks, because 40 clocks per node
>>>already would slow the thing down 3 times. Just the sequential trying
>>>of the different heuristics and search enhancements simply means you
>>>lose extra processor clocks, as it cannot get done in parallel.
>>>
>>
>>Doesn't matter. See above.
>>Two chess chip clock cycles would be all that is needed to
>>read from plain old DRAM. Using SRAM would cut it to under 1 cycle.
>>
>>>Apart from that, if the design goal is as many nodes a second as possible,
>>>which was a good goal before 1995, then obviously you don't care about
>>>efficiency either!
>>
>>That is another false statement. Their "design goal" was _only_ to beat
>>Kasparov. NPS or depth was _not_ the driving factor...
>>
>>>
>>>>On October 17, 2002 at 12:41:59, Vincent Diepeveen wrote:
>>>>
>>>>>On October 16, 2002 at 11:03:33, emerson tan wrote:
>>>>>
>>>>>Nodes a second is not important. I hope you realize that
>>>>>if you create a special program to go as fast as possible,
>>>>>getting around 40 million nodes a second is easily
>>>>>possible on a dual K7.
>>>>>
>>>>>Do not ask how it plays, though, or how efficiently it searches.
>>>>>
>>>>>Important factors are
>>>>> - he needs a new, very good book. He will not even get
>>>>> 10th at the world championship when his book is from 1997,
>>>>> and i do not know a single GM in the world who could do the
>>>>> job for him. You need very special guys in this world to do
>>>>> a book job. They are unique people, usually with many talents.
>>>>> Just hiring a GM is not going to be a success in advance.
>>>>> If you look at how long it took for Alterman to contribute something
>>>>> to the junior team, then you will start crying directly.
>>>>> - the evaluation needs to get improved bigtime
>>>>> - To get a billion nodes a second chip he needs around 100 million
>>>>> dollars. Of course, with more cpu's doing around 40 MLN nodes a second
>>>>> at say 500Mhz, he could do it with just 10 million dollars.
>>>>> But if you can afford 10 million dollars for 40MLN nps chips,
>>>>> you can afford a big parallel machine too. Note that for a single
>>>>> cpu chip doing about 4 million nodes a second, all he needs is
>>>>> a cheap 3000 dollar FPGA thing. If you calculate well, then
>>>>> you will see that deep blue got not so many nodes a second per
>>>>> chip. it had 480 chips, and deep blue searched around 126 million
>>>>> nodes a second on average against kasparov. So that's 265k nodes
>>>>> a second per chip.
>>>>>
>>>>> So a single chip getting 4 million nodes a second is very efficient
>>>>> compared to that.
>>>>>
>>>>> - He needs more like a trillion nodes a second to compensate for
>>>>> the inefficiency in hardware. No killermoves. No hashtables, etcetera.
>>>>
>>>>You keep saying that without knowing what you are talking about. Read his
>>>>book. You will find out that the chess processors _did_ have hash table
>>>>support. He just didn't have time to design and build the memory for them.
>>>>Belle was the "pattern" for deep thought. It was essentially "belle on a
>>>>chip". Belle _did_ have hash tables in the hardware search...
>>>>
>>>>Given another year (a re-match in 1998) and they would have been hashing in
>>>>the hardware.
>>>>
>>>>Killermoves is not a _huge_ loss. It is a loss, but not a factor of two or
>>>>anything close to that... I can run the test and post the numbers if you
>>>>want...
>>>>
>>>>> Of course the argument that it is possible to make hashtables in
>>>>> hardware is not relevant, as there is a price to that which is simply
>>>>> too big to pay.
>>>>
>>>>Based on what? Memory is not particularly complex. It certainly is not
>>>>expensive...
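For reference, here is a minimal C sketch of the kind of hash (transposition)
table being argued about, to show how little logic a probe actually needs. The
entry layout, table size, and names are illustrative assumptions, not Belle's
or Deep Blue's actual design.

    #include <stdint.h>
    #include <stddef.h>

    #define TT_BITS 20                    /* 2^20 entries, ~16 MB here */
    #define TT_SIZE (1u << TT_BITS)

    typedef struct {
        uint64_t key;                     /* full Zobrist key, for verification */
        int16_t  score;
        uint8_t  depth;
        uint8_t  flags;                   /* exact / lower / upper bound */
        uint32_t move;
    } TTEntry;

    static TTEntry tt[TT_SIZE];

    /* Probe: index by the low bits of the key, verify with the full key.
       Returning NULL here is exactly the probe that always "failed" on the
       1997 chips, which had the logic but no memory attached. */
    TTEntry *tt_probe(uint64_t key) {
        TTEntry *e = &tt[key & (TT_SIZE - 1)];
        return e->key == key ? e : NULL;
    }

    /* Store, using an always-replace scheme for brevity. */
    void tt_store(uint64_t key, int16_t score, uint8_t depth,
                  uint8_t flags, uint32_t move) {
        TTEntry *e = &tt[key & (TT_SIZE - 1)];
        e->key = key; e->score = score; e->depth = depth;
        e->flags = flags; e->move = move;
    }

The point of the index-then-verify layout is that a probe is one RAM read plus
one comparison, which is exactly the kind of operation that can overlap with
other work in hardware.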
>>>>
>>>>>
>>>>> Even for IBM it was too expensive to pay for
>>>>> hashtables in hardware: despite Hsu having created the possibility
>>>>> for it, the RAM wasn't put on the chips and wasn't connected to the
>>>>> cpu's. Something that improves the chips of course gets used when
>>>>> it works somehow. Could price have been the only reason? Don't you
>>>>> think that too? If not, what could be the reason not to use hashtables,
>>>>> knowing they improve efficiency?
>>>>
>>>>Lack of time. Hsu completely re-designed the chess chips, got them built,
>>>>tested them, worked around some hardware bugs, suffered through some fab
>>>>problems that produced bad chips, and so forth. All in one year. He got the
>>>>final chips weeks before the Kasparov match.
>>>>
>>>>It was an issue of time. Memory would have cost _far_ less than the chips
>>>>(chess chips).
>>>>
>>>>>
>>>>> the important thing to remember is that if i want to drive to
>>>>> Paris with 2 cars and i just send the cars in all directions without
>>>>> looking at a map or road signs (representing the inefficiency), then
>>>>> chances are they land everywhere except on the highway to Paris.
>>>>>
>>>>> Even a trillion nodes a second isn't going to work if it is using
>>>>> inefficient forms of search.
>>>>>
>>>>> It is not very nice of Hsu to focus upon how many nodes a second
>>>>> he plans to get. For IBM that was important in 1997 to make marketing
>>>>> with. It is not a fair comparison.
>>>>
>>>>The match was _not_ about NPS. It was purely about beating Kasparov. If
>>>>they could have done it with 10 nodes per second, they would have. I don't
>>>>know where you get this NPS fixation you have, but it is wrong. Just ask
>>>>Hsu...
>>>>
>>>>>
>>>>> If i go play at the world champs 2003 with like 500 processors, i
>>>>> do not talk about "this program uses up to a terabyte of bandwidth
>>>>> a second (1000000 MB/s) to outpower the other programs, whereas
>>>>> the poor PC programs only have up to 0.000600 terabytes of bandwidth
>>>>> a second (600MB/s)".
>>>>
>>>>First, you had better beat them... That's not going to be easy. NUMA has
>>>>plenty of problems to overcome...
>>>>
>>>>>
>>>>> That is not a fair comparison. Do you see why it is not a fair
>>>>> comparison?
>>>>>
>>>>> He should say what search depth he plans to reach using such
>>>>> chips.
>>>>
>>>>Depth is _also_ unimportant. Otherwise they could have just done like
>>>>Junior does and report some "new" ply definition of their choosing, and
>>>>nobody could refute them at all.
>>>>
>>>>This was about beating Kasparov. Not about NPS. Not about depth. Not about
>>>>_anything_ but beating Kasparov...
>>>>
>>>>Had you talked to them after they went to work for IBM, you would know
>>>>this. Those of us that did, do...
>>>>
>>>>>
>>>>> However he quotes: "search depth is not so relevant". If it is not
>>>>> so relevant, then why talk about nodes a second anyway, if
>>>>> the usual goal of more nps (getting a bigger search depth) is
>>>>> not considered important?
>>>>
>>>>They haven't been talking about NPS except in a very vague way. You have
>>>>made it an issue, not them. They can't really tell you _exactly_ how fast
>>>>they are going since they don't count nodes..
>>>>
>>>>>
>>>>>>EeEk(* DM) kibitzes: kib question from Frantic: According to what was
>>>>>>published DB was evaluating 200 million positions per second (vs 2.5
>>>>>>to 5 million for the 8-way Simmons server running Deep Fritz).
How >>>>>>fast would be Beep Blue today if the project continued? >>>>>>CrazyBird(DM) kibitzes: it contains a few reference at the end of the >>>>>>book for the more technically inclined. >>>>>>CrazyBird(DM) kibitzes: if we redo the chip in say, 0.13 micron, and >>>>>>with a improved architecture, it should be possible to do one billion >>>>>>nodes/sec on a single chip. >>>>>>CrazyBird(DM) kibitzes: so a trillion nodes/sec machine is actually >>>>>>possible today. >>>>>> >>>>>>If the cost is not that high maybe Hsu should make ala chessmachine that can be >>>>>>plug into computers (assuming that he has no legal obligation from ibm) The >>>>>>desktop pc is a long way from hiting 1billion nodes/sec. I think most of the >>>>>>professional chessplayers and serious chess hobbyist will buy. He can easily get >>>>>>1 million orders. 1 billion nodes/sec, mmm....:)