Computer Chess Club Archives


Subject: Re: rebuilding deep blue - it was too inefficient!!

Author: Vincent Diepeveen

Date: 11:58:39 10/18/02


On October 18, 2002 at 14:51:28, Robert Hyatt wrote:

Guys at universities are bad at counting money.
The Deep Thought project was a project paid for
by the US government ==> university,
if I remember correctly.

When IBM attached its name to it, it became very expensive!

Of course, 0.60 micron is not so impressive by now.

Deep Throats within IBM say around $30,000 was paid by IBM
for each CPU of DB II.

480 x 30k = 14.4 million dollars paid by IBM.

Note also that a bunch of test CPUs might have been fabbed,
all of them at 30k a piece.

>On October 18, 2002 at 14:29:07, Vincent Diepeveen wrote:
>
>>On October 17, 2002 at 19:25:11, Robert Hyatt wrote:
>>
>>Bob, without me wanting to say who is right here,
>>Hsu or you: your statements contradict Hsu's statement.
>>  ---
>>  CrazyBird(DM) kibitzes: the 1996 & 1997 versions of Deep Blue are different
>>mainly in the amount of chess knowledge.
>>  aics%
>>  EeEk(DM) kibitzes: what was the difference?
>>  aics%
>>  CrazyBird(DM) kibitzes: we went to Benjamin's excellent chess school.:)
>>  aics%
>
>
>What did I say that contradicts that?  Nothing I can think of...
>
>If you mean the re-design, that is a pure fact mentioned many times.  They had
>the
>original Deep Thought stuff, then a re-design for Deep Blue 1, and then another
>complete
>redesign for Deep Blue 2.  That's in his book in great detail...
>
>
>
>
>>  ---
>>We both know that in theory *everything* that can be done in software
>>can also be done in hardware. However, there are so many
>>practical issues that in hardware you simply cannot implement
>>things 100% the same way. Especially the low level at which
>>Hsu was working meant it was very hard to make the chip. He
>>achieved a great deal by producing the chip.
>>
>>Being in hardware has just one advantage and three big disadvantages.
>>In 1997, that is, there were three disadvantages.
>>
>>  - it's very expensive (FPGAs are very cheap now)
>
>The original Deep Thought chips cost less than $5,000 _total_ for all 16.
>
>The original Deep Blue 1 chips were also _not_ expensive.  Project MOSIS
>exists
>for exactly this kind of thing...
>
>I don't remember the details about DB2.  I do remember IBM didn't make the chips
>themselves...
>
>
>>  - the processor is clocked *way* lower than general-purpose processors
>>    (in 1997 there was the 300MHz PII, versus the 20MHz
>>    Deep Blue processors; roughly a factor of 15).
>
>So?  The idea in hardware is to do _more_ in a clock cycle.  The clock frequency
>is not
>an issue, clocks are used to synchronize at various points and let things settle
>before they
>get latched.  In theory you could build a chip that searches 1M nodes in one
>clock cycle.
>
>It would be _much_ harder to do so, however...  and there would be no point
>since nobody
>cares about the clock frequency, only how fast it searches chess trees...
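To put numbers on that point: what counts is nodes per second, i.e. clock
frequency times nodes per cycle. A quick sketch in C (the clocks-per-node
figures are illustrative assumptions, not measured values):

    #include <stdio.h>

    int main(void) {
        /* Illustrative assumptions: a hardware searcher finishing a
           node in ~10 clocks vs. a software engine needing ~1500. */
        double hw_hz  = 24e6;    /* Deep Blue II chess chip: 24 MHz */
        double hw_cpn = 10.0;    /* clocks per node (assumed)       */
        double sw_hz  = 300e6;   /* 1997 Pentium II: 300 MHz        */
        double sw_cpn = 1500.0;  /* clocks per node (assumed)       */

        printf("hardware: %.1f M nodes/sec\n", hw_hz / hw_cpn / 1e6);
        printf("software: %.1f M nodes/sec\n", sw_hz / sw_cpn / 1e6);
        return 0;
    }

With these numbers the chip with the 12.5 times slower clock still searches
about 12 times faster, because it does more per clock.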
>
>
>>  - it's very hard to design and build a custom chip
>
>Apparently not to Hsu.  He did a complete chip design, and got it back and ready
>to play
>in less than one year total, more like nine months if I remember his book
>correctly...
>
>
>
>
>>
>>The only advantage is that things can get done in parallel.
>>That means that if everything is sequential, you start out 15 times
>>slower than software (a factor of 15 in 1997; now it's way,
>>way more than that; even the technology to produce processors
>>15 times slower than the 2.8GHz P4s, which are the latest now,
>>i.e. 200MHz processors, still isn't exactly cheap).
>>
>>And Hsu had just 20MHz, later managing 'even' 24MHz. So
>>every clock you waste on sequentially trying
>>the hash table and other search enhancements slows down
>>the CPU big time.
>
>
>Not at all.  The hash probe was done in parallel with everything else.  It just
>always "failed" since there was no memory present...
>
>
>
>
>>
>>If you implement:
>>  nullmove
>>  hashtables
>>  killermoves
>>  SEE (qsearch)
>>  countermove
>>  butterfly boards
>>  history heuristics
>>
>>
>>Though I do not believe the last three are smart move-ordering
>>enhancements to make, if you implement them you are something like
>>30 clocks slower than without them.
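For readers who do not know these heuristics, here is roughly what killer
moves and the history heuristic look like in a software engine. A minimal
sketch in C; the table sizes and score weights are arbitrary assumptions:

    /* Killer-move and history-heuristic move ordering, sketched.
       Sizes and weights are arbitrary illustrative choices. */
    #define MAX_PLY 64

    typedef struct { int from, to; } Move;

    static Move killer[MAX_PLY][2];   /* two killer slots per ply */
    static int  history[64][64];      /* from-square x to-square  */

    /* Call this when a quiet move causes a beta cutoff. */
    void ordering_update(Move m, int ply, int depth) {
        if (killer[ply][0].from != m.from || killer[ply][0].to != m.to) {
            killer[ply][1] = killer[ply][0];    /* demote old killer */
            killer[ply][0] = m;
        }
        history[m.from][m.to] += depth * depth; /* deeper = stronger */
    }

    /* Higher score => search the move earlier. */
    int ordering_score(Move m, int ply) {
        if ((killer[ply][0].from == m.from && killer[ply][0].to == m.to) ||
            (killer[ply][1].from == m.from && killer[ply][1].to == m.to))
            return 1000000;            /* killers right after captures */
        return history[m.from][m.to];  /* otherwise order by history   */
    }

In hardware each of these tables is extra on-chip state plus extra logic in
the move-selection path; whether that costs extra clocks depends on whether
the lookups can be overlapped, which is exactly the point in dispute here.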
>
>
>That is simply an uninformed statement.  The logic will certainly be far more
>complex
>if those things are done.  But not necessarily _slower_.  Parallelism is the
>name of the game
>in ASIC design..
>
>
>>
>>If you first need on average 10 clocks a node (which is very little
>>for 0.60 micron), then going to 40 clocks means a slowdown
>>of a factor of 3.
>
>That would be a factor of four.
>
>40 / 10 == 4
>
>>
>>That's clearly visible.
>
>But you can wave your hands all you want.  Doesn't mean that 4x slower is
>a forced condition...
>
>
>
>
>
>>
>>I do not know the latency of SRAM. Sources who design
>>processors for a living inform me that Deep Blue would have needed
>>a few megabytes of expensive SRAM (very expensive in 1997; EDO RAM
>>was the standard back then) to avoid losing too much speed
>>communicating with it. EDO RAM is no option for something that is
>>capable of searching 2-2.5 million nodes a second. Doing over 2 million
>>probes a second at random locations in EDO RAM is not something
>>I can recommend :)
>
>Do the math.  EDO ram has 100ns cycle time.  Deep Blue chess processors had a
>50ns cycle time.  Overlap the memory read with two early cycles and it is
>free...
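That arithmetic is easy to check; using the probe rate mentioned above:

    #include <stdio.h>

    int main(void) {
        /* Figures from this thread: ~2.5M probes/sec, 100ns EDO RAM
           access time, 50ns chess-chip clock cycle. */
        double probes_per_sec = 2.5e6;
        double budget_ns      = 1e9 / probes_per_sec;  /* per probe */
        double edo_ns         = 100.0;
        double cycle_ns       = 50.0;

        printf("budget per probe: %.0f ns\n", budget_ns);  /* 400 ns */
        printf("EDO access: %.0f chip cycles\n",
               edo_ns / cycle_ns);                         /* 2 cycles */
        return 0;
    }

A 100ns access is only two 50ns chip cycles, so a probe started two cycles
early is fully hidden; that is the "overlap" argument.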
>
>
>
>
>
>>
>>Now that still isn't as efficient as software, because the probes
>>then go to RAM that is local to one processor, which doesn't help
>>the others, so there is huge overhead anyway compared to
>>software. Only if you have some
>>big, fast, global parallel RAM from which each hardware CPU can
>>independently fetch a cache line do you get close to the efficiency
>>of software!
>
>The RAM design of the new DB chips supported a 16-way shared RAM between
>the processors on a single SP node.  Not much way to do a shared hash table with
>30 different nodes.  480-port memory would be impossibly complex and expensive.
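For readers unfamiliar with what is being shared here: a hash (transposition)
table is just a large array indexed by bits of the position's Zobrist key. A
minimal single-port sketch in C (the entry layout and table size are my
assumptions, not DB's actual design):

    #include <stdint.h>
    #include <stddef.h>

    /* Transposition-table sketch: 16-byte entries, always-replace. */
    typedef struct {
        uint64_t key;        /* Zobrist key of the stored position */
        int32_t  score;
        int16_t  depth;
        uint16_t best_move;
    } TTEntry;

    #define TT_BITS 20       /* 2^20 entries, ~16 MB (assumed size) */
    static TTEntry table[1u << TT_BITS];

    /* Returns the entry if this exact position is stored, else NULL. */
    TTEntry *tt_probe(uint64_t key) {
        TTEntry *e = &table[key & ((1u << TT_BITS) - 1)];
        return (e->key == key) ? e : NULL;
    }

    void tt_store(uint64_t key, int32_t score,
                  int16_t depth, uint16_t best_move) {
        TTEntry *e = &table[key & ((1u << TT_BITS) - 1)];
        TTEntry n = { key, score, depth, best_move };
        *e = n;              /* always-replace scheme */
    }

Sharing this among 16 processors on one node means 16 ports into the same
array; sharing it among 480 chips spread over 30 SP nodes is the part
described above as impossibly complex and expensive.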
>
>
>>
>>I didn't count those in the 40 clocks, because 40 clocks a node
>>would already slow the thing down 3 times. Just the sequential trying
>>of the different heuristics and search enhancements simply means you
>>lose extra processor clocks, as it cannot be done in parallel.
>>
>
>
>Doesn't matter.  See above.  Two chess-chip clock cycles would be all that is
>needed to
>read from plain old DRAM.  Using SRAM would cut it to under one cycle.
>
>
>
>
>
>>Apart from that, if the design goal is as many nodes a second as
>>possible, which was a good goal before 1995, then obviously you
>>don't care about efficiency either!
>
>
>That is another false statement.  Their "design goal" was _only_ to beat
>Kasparov.
>NPS or depth was _not_ the driving factor...
>
>
>
>>
>>>On October 17, 2002 at 12:41:59, Vincent Diepeveen wrote:
>>>
>>>>On October 16, 2002 at 11:03:33, emerson tan wrote:
>>>>
>>>>Nodes a second is not important. I hope you realize that
>>>>if you create a special program to go as fast as possible,
>>>>getting around 40 million nodes a second is easily
>>>>possible on a dual K7.
>>>>
>>>>Do not ask how it plays, though, or how efficiently it searches.
>>>>
>>>>Important factors are
>>>>  - He needs a very good new book. He will not even get
>>>>    10th at the world championship with a book from 1997,
>>>>    and I do not know a single GM in the world who could do the
>>>>    job for him. You need very special guys to do
>>>>    a book job. They are unique people, usually with many talents.
>>>>    Just hiring a GM is not guaranteed to be a success.
>>>>    If you look at how long it took Alterman to contribute something
>>>>    to the Junior team, you will start crying immediately.
>>>>  - The evaluation needs to be improved big time.
>>>>  - To get a billion-nodes-a-second chip he needs around 100 million
>>>>    dollars. Of course, with more CPUs doing around 40 million nodes
>>>>    a second at, say, 500MHz, he could do it with just 10 million
>>>>    dollars. But if you can afford 10 million dollars for
>>>>    40-million-NPS chips, you can afford a big parallel machine too.
>>>>    Note that for a single-CPU chip doing about 4 million nodes a
>>>>    second, all he needs is a cheap 3000-dollar FPGA. If you calculate
>>>>    carefully, you will see that Deep Blue did not get so many nodes
>>>>    a second per chip: it had 480 chips, and Deep Blue searched around
>>>>    126 million nodes a second on average against Kasparov. That's
>>>>    about 262k nodes a second per chip (checked in the small
>>>>    calculation after this list).
>>>>
>>>>    So a single chip getting 4 million nodes a second is very efficient
>>>>    compared to that.
>>>>
>>>>  - He needs more like a trillion nodes a second to compensate for
>>>>    the inefficiency of hardware: no killer moves, no hash tables,
>>>>    etcetera.
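The per-chip figure above, checked:

    #include <stdio.h>

    int main(void) {
        /* Figures quoted above: ~126M nodes/sec total over 480 chips. */
        double total_nps = 126e6;
        int chips = 480;
        printf("per chip: %.1fk nodes/sec\n", total_nps / chips / 1e3);
        /* prints: per chip: 262.5k nodes/sec */
        return 0;
    }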
>>>
>>>
>>>You keep saying that without knowing what you are talking about.  Read his
>>>book.
>>>You will find out that the chess processors _did_ have hash table support.  He
>>>just
>>>didn't have time to design and build the memory for them.  Belle was the
>>>"pattern"
>>>for Deep Thought.  It was essentially "Belle on a chip".  Belle _did_ have hash
>>>tables
>>>in the hardware search...
>>>
>>>Given another year (a re-match in 1998) and they would have been hashing in the
>>>hardware.
>>>
>>>Killer moves are not a _huge_ loss.  They are a loss, but not a factor of
>>>two or anything close
>>>to that...  I can run the test and post the numbers if you want...
>>>
>>>
>>>>    Of course the argument that it is possible to make hash tables in
>>>>    hardware is not relevant, as there is a price to that which is
>>>>    simply too big to pay.
>>>
>>>Based on what?  Memory is not particularly complex.  It certainly is not
>>>expensive...
>>>
>>>
>>>>
>>>>    Even for IBM it was too expensive to pay for
>>>>    hash tables in hardware: despite Hsu having created the possibility
>>>>    for it, the RAM wasn't put on the chips and wasn't connected to the
>>>>    CPUs. Something that improves the chips of course gets used when
>>>>    it works somehow. Could price have been the only reason? Don't you
>>>>    think so too? If not, what could the reason be for not using hash
>>>>    tables, knowing that they improve efficiency?
>>>
>>>Lack of time.  Hsu completely re-designed the chess chips, got them built,
>>>tested them, worked around some hardware bugs, suffered thru some fab
>>>problems that produced bad chips, and so forth.  All in one year.  He got the
>>>final chips weeks before the Kasparov match.
>>>
>>>It was an issue of time.  Memory would have cost _far_ less than the chips
>>>(chess chips).
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>    The important thing to remember is that if I want to drive to
>>>>    Paris with 2 cars and I just send cars off in all directions without
>>>>    looking at a map or road signs (representing the inefficiency), then
>>>>    chances are they end up everywhere except on the highway to Paris.
>>>>
>>>>    Even a trillion nodes a second isn't going to work if it is using
>>>>    inefficient forms of search.
>>>>
>>>>    It is not very nice of Hsu to focus on how many nodes a second
>>>>    he plans to get. For IBM that was important in 1997 for marketing
>>>>    purposes. It is not a fair comparison.
>>>
>>>
>>>The match was _not_ about NPS.  It was purely about beating Kasparov.  If they
>>>could have done it with 10 nodes per second, they would have.  I don't know
>>>where
>>>you get this NPS fixation you have, but it is wrong.  Just ask Hsu...
>>>
>>>
>>>>
>>>>    If I go play at the 2003 world championship with something like
>>>>    500 processors, I will not talk about how "this program uses up to
>>>>    a terabyte of bandwidth a second (1,000,000 MB/s) to outpower the
>>>>    other programs, whereas the poor PC programs only have up to 0.0006
>>>>    terabytes of bandwidth a second (600 MB/s)".
>>>
>>>
>>>First, you had better beat them...  That's not going to be easy.  NUMA has
>>>plenty of problems to overcome...
>>>
>>>
>>>
>>>
>>>>
>>>>    That is not a fair comparison. Do you see why it is not a fair
>>>>    comparison?
>>>>
>>>>    He should say what search depth he plans to reach using such
>>>>    chips.
>>>
>>>
>>>Depth is _also_  unimportant.  Elsewise they could have just done like Junior
>>>does and report some "new" ply definition of their choosing, and nobody could
>>>refute them at all.
>>>
>>>This was about beating Kasparov.  Not about NPS.  Not about Depth.  Not about
>>>_anything_ but beating Kasparov...
>>>
>>>Had you talked to them after they went to work for IBM you would know this.
>>>Those of us that did, do...
>>>
>>>>
>>>>    However, he says: "search depth is not so relevant". If it is not
>>>>    so relevant, then why talk about nodes a second at all, if
>>>>    the usual goal of more NPS (reaching a bigger search depth) is
>>>>    not considered important?
>>>
>>>They haven't been talking about NPS except in a very vague way.  You have
>>>made it an issue, not them.  They can't really tell you _exactly_ how fast they
>>>are going since they don't count nodes..
>>>
>>>
>>>>
>>>>>EeEk(* DM) kibitzes: kib question from Frantic: According to what was
>>>>>published DB was evaluating 200 million positions per second (vs 2.5
>>>>>to 5 million for the 8-way Simmons server running Deep Fritz).  How
>>>>>fast would Deep Blue be today if the project continued?
>>>>>CrazyBird(DM) kibitzes: it contains a few references at the end of the
>>>>>book for the more technically inclined.
>>>>>CrazyBird(DM) kibitzes: if we redo the chip in say, 0.13 micron, and
>>>>>with an improved architecture, it should be possible to do one billion
>>>>>nodes/sec on a single chip.
>>>>>CrazyBird(DM) kibitzes: so a trillion nodes/sec machine is actually
>>>>>possible today.
>>>>>
>>>>>If the cost is not that high, maybe Hsu should make a ChessMachine-style
>>>>>card that can be plugged into computers (assuming that he has no legal
>>>>>obligations to IBM). The desktop PC is a long way from hitting 1 billion
>>>>>nodes/sec. I think most professional chess players and serious chess
>>>>>hobbyists would buy one. He could easily get 1 million orders.
>>>>>1 billion nodes/sec, mmm....:)
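The trillion-node figure is straight multiplication; assuming a machine on
the order of a thousand such chips (my assumption; the kibitz gives no
chip count):

    #include <stdio.h>

    int main(void) {
        double nps_per_chip = 1e9;  /* claimed for a 0.13 micron redesign */
        int chips = 1000;           /* assumed machine size */
        printf("machine: %.0e nodes/sec\n", nps_per_chip * chips);
        /* prints: machine: 1e+12 nodes/sec */
        return 0;
    }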


