Author: Vincent Diepeveen
Date: 11:58:39 10/18/02

On October 18, 2002 at 14:51:28, Robert Hyatt wrote:

Guys at universities are bad at counting money. The Deep Thought project was a
project paid for by the US government ==> university, if I remember well. When
IBM attached its name to it, it became very expensive! Of course 0.60 micron is
not so impressive by now. Deep Throats within IBM say around $30,000 was paid
by IBM for each CPU of DB II. 480 x 30k = 14.4 million dollars paid by IBM.
Note also that a bunch of test CPUs might have been pressed, all of them at
30k apiece.

>On October 18, 2002 at 14:29:07, Vincent Diepeveen wrote:
>
>>On October 17, 2002 at 19:25:11, Robert Hyatt wrote:
>>
>>Bob, without me wanting to say who is right here:
>>Hsu or you ==> your statements contradict Hsu's statement.
>>
>> ---
>> CrazyBird(DM) kibitzes: the 1996 & 1997 versions of Deep Blue are different
>> mainly in the amount of chess knowledge.
>> aics%
>> EeEk(DM) kibitzes: what was the difference?
>> aics%
>> CrazyBird(DM) kibitzes: we went to Benjamin's excellent chess school. :)
>> aics%
>
>What did I say that contradicts that? Nothing I can think of...
>
>If you mean the re-design, that is a pure fact mentioned many times. They had
>the original Deep Thought stuff, then a re-design for Deep Blue 1, and then
>another complete redesign for Deep Blue 2. That's in his book in great
>detail...
>
>> ---
>>We both know that in theory *everything* that can be done in software can
>>also be done in hardware. However, there are so many practical issues that
>>you simply cannot implement things 100% the same way in hardware. Especially
>>the low level at which Hsu was programming meant it was very hard to make
>>the chip. He achieved something great by producing the chip.
>>
>>Being in hardware has just one advantage and three big disadvantages.
>>In 1997, that is, there were three disadvantages:
>>
>> - it's very expensive (FPGAs are very cheap now)
>
>The original Deep Thought chips cost less than $5,000 _total_ for all 16.
>
>The original Deep Blue 1 chips were also _not_ expensive. Project MOSIS is
>there just for this kind of stuff...
>
>I don't remember the details about DB2. I do remember IBM didn't make the
>chips themselves...
>
>> - the processor is clocked *way* lower than software processors are
>> (in 1997 the 300MHz PII was there, versus 20MHz Deep Blue processors;
>> roughly a factor of 15)
>
>So? The idea in hardware is to do _more_ in a clock cycle. The clock frequency
>is not an issue; clocks are used to synchronize at various points and let
>things settle before they get latched. In theory you could build a chip that
>searches 1M nodes in one clock cycle.
>
>It would be _much_ harder to do so, however... and there would be no point,
>since nobody cares about the clock frequency, only how fast it searches chess
>trees...
>
>> - it's very hard to make a hardware chip
>
>Apparently not for Hsu. He did a complete chip design and got it back and
>ready to play in less than one year total, more like nine months if I
>remember his book correctly...
>
>>The only advantage is that things can get done in parallel. That means that
>>if everything is sequential, you start out 15 times slower than software
>>(15 times in 1997; now it's way more than that: the technology to produce
>>processors 15 times slower than the 2.8GHz P4s which are the latest now,
>>i.e. 200MHz processors, is still not exactly cheap).
>>
>>And Hsu had just 20MHz, and later managed 'even' 24MHz. So every clock you
>>waste on some sequential trying of the hashtable and other search
>>enhancements slows the CPU down big time.
>
>Not at all. The hash probe was done in parallel with everything else. It just
>always "failed" since there was no memory present...
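
As an aside, the overlap trick has a rough software analogue: start the table
read early and do unrelated work while the memory access is in flight, so the
probe costs almost nothing by the time you look at the entry. Below is a
minimal C sketch of that idea only, not of Deep Blue's design; the table
layout and the names HashEntry, prefetch_hash and probe_hash are made up for
illustration, and __builtin_prefetch is a GCC/Clang builtin.

/* Minimal sketch: hide hash-probe latency behind other work.
   Hypothetical layout and names; not Deep Blue's actual design. */
#include <stdint.h>

#define TABLE_BITS 20                        /* 1M entries, power of two */
#define TABLE_MASK ((1u << TABLE_BITS) - 1)

typedef struct {
    uint64_t key;      /* full zobrist key, used to verify the hit */
    int16_t  score;
    int8_t   depth;
    int8_t   flags;
} HashEntry;

static HashEntry hash_table[1u << TABLE_BITS];

/* Issue the memory read early; the cache line travels while the caller
   does other work, so the later probe is (nearly) free. */
static inline void prefetch_hash(uint64_t zobrist)
{
    __builtin_prefetch(&hash_table[zobrist & TABLE_MASK]);
}

/* Returns 1 and fills *score on a usable hit, 0 on a "failed" probe. */
static int probe_hash(uint64_t zobrist, int depth, int *score)
{
    const HashEntry *e = &hash_table[zobrist & TABLE_MASK];
    if (e->key == zobrist && e->depth >= depth) {
        *score = e->score;
        return 1;
    }
    return 0;
}

In a real search you would call prefetch_hash() as soon as the key of the new
position is known (right after making the move), generate moves, and only then
call probe_hash(); most of the memory latency then hides behind useful work,
much like overlapping the hardware probe with the early cycles of a node.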
>>If you implement:
>> nullmove
>> hashtables
>> killermoves
>> SEE (qsearch)
>> countermove
>> butterfly boards
>> history heuristics
>>
>>(though I do not believe the last three are smart move-ordering enhancements
>>to make), then you are something like 30 clocks slower than without them.
>
>That is simply an uninformed statement. The logic will certainly be far more
>complex if those things are done, but not necessarily _slower_. Parallelism is
>the name of the game in ASIC design...
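
For reference, the heuristics under dispute are, in software, little more than
small table lookups. Here is a minimal C sketch of killers, a butterfly
(history) board and countermoves; the 16-bit move encoding and all names are
made up for illustration:

/* Minimal sketch of killer/history/countermove ordering as table lookups. */
#include <stdint.h>

#define MAX_PLY 64

typedef uint16_t Move;            /* from-square in bits 0-5, to-square in 6-11 */
#define FROM(m) ((m) & 63)
#define TO(m)   (((m) >> 6) & 63)

static Move killers[MAX_PLY][2];  /* two killer slots per ply                 */
static int  history[64][64];      /* butterfly board indexed [from][to]       */
static Move countermove[64][64];  /* best reply indexed by the previous move  */

/* Score a quiet move for ordering; higher scores are searched first. */
static int order_score(Move m, Move prev, int ply)
{
    int score = history[FROM(m)][TO(m)];               /* history heuristic */
    if (m == killers[ply][0] || m == killers[ply][1])
        score += 10000;                                /* killer bonus      */
    if (m == countermove[FROM(prev)][TO(prev)])
        score += 5000;                                 /* countermove bonus */
    return score;
}

/* Bookkeeping after a quiet move fails high at this ply. */
static void update_ordering(Move m, Move prev, int ply, int depth)
{
    history[FROM(m)][TO(m)] += depth * depth;          /* butterfly update  */
    if (killers[ply][0] != m) {
        killers[ply][1] = killers[ply][0];
        killers[ply][0] = m;
    }
    countermove[FROM(prev)][TO(prev)] = m;
}

The three lookups in order_score() have no data dependency on each other,
which is exactly the parallelism point: an ASIC could evaluate them in the
same clock, while sequential software pays for them one after another.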
>>If you first need 10 clocks on average per node (which is very little for
>>0.60 micron), then going to 40 clocks means a slowdown of a factor of 3.
>
>That would be a factor of four.
>
>40 / 10 == 4
>
>>That's clearly visible.
>
>You can wave your hands all you want. That doesn't mean that 4x slower is a
>forced condition...
>
>>I do not know the latency of SRAM. Sources who create processors for a
>>living inform me that Deep Blue would have needed a few megabytes of
>>expensive SRAM (very expensive in 1997; EDO RAM was the standard back then)
>>to avoid losing too much speed communicating with it. EDO RAM is no option
>>for something that is capable of searching at 2-2.5 million nodes a second.
>>Doing over 2 million probes a second at random locations in EDO RAM is not
>>something I can recommend :)
>
>Do the math. EDO RAM has a 100ns cycle time. Deep Blue chess processors had a
>50ns cycle time. Overlap the memory read with two early cycles and it is
>free...
>
>>Now that still isn't as efficient as software, because the probes then get
>>done to RAM local to the processor, which isn't iterating itself, so it
>>needs a huge overhead anyway compared to software. Only if you have some
>>big, fast, global parallel RAM from which each hardware CPU can
>>independently fetch a cache line do you get close to the efficiency of
>>software!
>
>The RAM design of the new DB chips supported a 16-way shared RAM between the
>processors on a single SP node. There is not much way to do a shared hash
>table across 30 different nodes. 480-port memory would be impossibly complex
>and expensive.
>
>>I didn't count them in the 40 clocks, because 40 clocks a node already would
>>slow the thing down 3 times. Just the sequential trying of the different
>>heuristics and search enhancements means you simply lose extra processor
>>clocks, as it cannot get done in parallel.
>
>Doesn't matter. See above. Two chess chip clock cycles would be all that is
>needed to read from plain old DRAM. Using SRAM would cut it to under one
>cycle.
>
>>Apart from that, if the design goal is as many nodes a second as possible,
>>which was a good goal before 1995, then obviously you don't care about
>>efficiency either!
>
>That is another false statement. Their "design goal" was _only_ to beat
>Kasparov. NPS or depth was _not_ the driving factor...
>
>>>On October 17, 2002 at 12:41:59, Vincent Diepeveen wrote:
>>>
>>>>On October 16, 2002 at 11:03:33, emerson tan wrote:
>>>>
>>>>Nodes a second is not important. I hope you realize that if you create a
>>>>special program to go as fast as possible, getting around 40 million nodes
>>>>a second is easily possible on a dual K7.
>>>>
>>>>Do not ask how it plays, though, or how efficiently it searches.
>>>>
>>>>Important factors are:
>>>> - He needs a new, very good book. He will not even get 10th at the world
>>>> championship when his book is from 1997, and I do not know a single GM
>>>> in the world who could do the job for him. You need very special guys in
>>>> this world to do a book job. They are unique people, usually with many
>>>> talents. Just hiring a GM is not going to be a success in advance. If
>>>> you look at what time it took for Alterman to contribute something to
>>>> the Junior team, then you will start crying directly.
>>>> - The evaluation needs to get improved big time.
>>>> - To get a billion-nodes-a-second chip he needs around 100 million
>>>> dollars. Of course, with more CPUs doing around 40 million nodes a
>>>> second at say 500MHz, he could do it with just 10 million dollars. But
>>>> if you can afford 10 million dollars for 40M nps chips, you can afford a
>>>> big parallel machine too. Note that for a single-CPU chip doing about 4
>>>> million nodes a second, all he needs is a cheap 3000-dollar FPGA thing.
>>>> If you calculate well, you will see that Deep Blue did not get so many
>>>> nodes a second per chip: it had 480 chips, and Deep Blue searched around
>>>> 126 million nodes a second on average against Kasparov. That is about
>>>> 262k nodes a second at each chip (a quick check of this arithmetic
>>>> follows after the list).
>>>>
>>>> So a single chip getting 4 million nodes a second is very efficient
>>>> compared to that.
>>>>
>>>> - He needs more like a trillion nodes a second to compensate for the
>>>> inefficiency in hardware. No killer moves, no hashtables, etcetera.
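
A quick check of the per-chip arithmetic in the list above, using only the
numbers already quoted there:

/* Back-of-envelope check: average match NPS divided over 480 chips. */
#include <stdio.h>

int main(void)
{
    const double match_nps = 126e6; /* average NPS quoted for the 1997 match */
    const int    chips     = 480;   /* number of chess chips quoted above    */

    printf("per chip: %.0f nodes/sec\n", match_nps / chips); /* 262500 */
    return 0;
}

That works out to roughly 262,500 nodes a second per chip, which is why a
single chip doing 4 million nodes a second compares favourably.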
>>>You keep saying that without knowing what you are talking about. Read his
>>>book. You will find out that the chess processors _did_ have hash table
>>>support. He just didn't have time to design and build the memory for them.
>>>Belle was the "pattern" for Deep Thought. It was essentially "Belle on a
>>>chip". Belle _did_ have hash tables in the hardware search...
>>>
>>>Given another year (a re-match in 1998) and they would have been hashing in
>>>the hardware.
>>>
>>>Killer moves are not a _huge_ loss. They are a loss, but not a factor of
>>>two or anything close to that... I can run the test and post the numbers
>>>if you want...
>>>
>>>> Of course the argument that it is possible to make hashtables in
>>>> hardware is not relevant, as there is a price to that which is simply
>>>> too big to pay.
>>>
>>>Based on what? Memory is not particularly complex. It certainly is not
>>>expensive...
>>>
>>>> Even for IBM it was too expensive to pay for hashtables in hardware:
>>>> despite Hsu having created possibilities for it, the RAM wasn't put on
>>>> the chips and wasn't connected to the CPUs. Something that improves the
>>>> chips of course gets used when it works somehow. Could price have been
>>>> the only reason? Don't you think that too? If not, what could be the
>>>> reason not to use hashtables, knowing they improve efficiency?
>>>
>>>Lack of time. Hsu completely re-designed the chess chips, got them built,
>>>tested them, worked around some hardware bugs, suffered thru some fab
>>>problems that produced bad chips, and so forth. All in one year. He got
>>>the final chips weeks before the Kasparov match.
>>>
>>>It was an issue of time. Memory would have cost _far_ less than the chips
>>>(chess chips).
>>>
>>>> The important thing to remember is that if I want to drive to Paris with
>>>> 2 cars and I just send cars off in all directions without looking at a
>>>> map or road signs (representing the inefficiency), then the chance is
>>>> they land everywhere except on the highway to Paris.
>>>>
>>>> Even a trillion nodes a second isn't going to work if it is using
>>>> inefficient forms of search.
>>>>
>>>> It is not very nice of Hsu to focus upon how many nodes a second he
>>>> plans to get. For IBM that was important in 1997 to make marketing with.
>>>> It is not a fair comparison.
>>>
>>>The match was _not_ about NPS. It was purely about beating Kasparov. If
>>>they could have done it with 10 nodes per second, they would have. I don't
>>>know where you get this NPS fixation you have, but it is wrong. Just ask
>>>Hsu...
>>>
>>>> If I go play at the world champs 2003 with like 500 processors, I do not
>>>> talk about "this program uses up to a terabyte of bandwidth a second
>>>> (1000000 MB/s) to outpower the other programs, whereas the poor PC
>>>> programs only have up to 0.000600 terabyte of bandwidth a second
>>>> (600MB/s)".
>>>
>>>First, you had better beat them... That's not going to be easy. NUMA has
>>>plenty of problems to overcome...
>>>
>>>> That is not a fair comparison. Do you see why it is not a fair
>>>> comparison?
>>>>
>>>> He should say what search depth he plans to reach using such chips.
>>>
>>>Depth is _also_ unimportant. Elsewise they could have just done like Junior
>>>does and report some "new" ply definition of their choosing, and nobody
>>>could refute them at all.
>>>
>>>This was about beating Kasparov. Not about NPS. Not about depth. Not about
>>>_anything_ but beating Kasparov...
>>>
>>>Had you talked to them after they went to work for IBM you would know
>>>this. Those of us that did, do...
>>>
>>>> However, he quotes: "search depth is not so relevant". If it is not so
>>>> relevant, then why talk about nodes a second at all, if the usual goal
>>>> of more NPS (getting a bigger search depth) is not considered important?
>>>
>>>They haven't been talking about NPS except in a very vague way. You have
>>>made it an issue, not them. They can't really tell you _exactly_ how fast
>>>they are going since they don't count nodes...
>>>
>>>>>EeEk(* DM) kibitzes: kib question from Frantic: According to what was
>>>>>published, DB was evaluating 200 million positions per second (vs 2.5 to
>>>>>5 million for the 8-way Simmons server running Deep Fritz). How fast
>>>>>would Deep Blue be today if the project had continued?
>>>>>CrazyBird(DM) kibitzes: it contains a few references at the end of the
>>>>>book for the more technically inclined.
>>>>>CrazyBird(DM) kibitzes: if we redo the chip in, say, 0.13 micron, and
>>>>>with an improved architecture, it should be possible to do one billion
>>>>>nodes/sec on a single chip.
>>>>>CrazyBird(DM) kibitzes: so a trillion nodes/sec machine is actually
>>>>>possible today.
>>>>>
>>>>>If the cost is not that high, maybe Hsu should make, a la the
>>>>>ChessMachine, a card that can be plugged into computers (assuming that
>>>>>he has no legal obligation to IBM). The desktop PC is a long way from
>>>>>hitting 1 billion nodes/sec. I think most of the professional chess
>>>>>players and serious chess hobbyists will buy one. He can easily get 1
>>>>>million orders. 1 billion nodes/sec, mmm... :)