Computer Chess Club Archives



Subject: Re: rebuilding deep blue - it was too inefficient!!

Author: Robert Hyatt

Date: 10:29:25 10/21/02



On October 18, 2002 at 14:58:39, Vincent Diepeveen wrote:

>On October 18, 2002 at 14:51:28, Robert Hyatt wrote:
>
>Guys at universities are bad at counting money.
>The Deep Thought project was paid for
>by the US government (it was a university project),
>if I remember well.
>
>When IBM attached its name to it, it became very expensive!
>



>Of course 0.60 micron is not so impressive by now.
>
>Deep Throats within IBM say IBM paid around $30,000
>for each CPU of DB II.


If you are talking about the SP2, that is about right for the cost per node.

If you are talking about the chess chips themselves, that is _way_ over
what they paid.



>
>480 x $30k = $14.4 million paid by IBM.
>

Didn't cost anywhere near that.  I am not sure if Hsu's book specifically
gives a dollar amount; I will have to look.  But it certainly wasn't $30K.
It seems to me that somewhere along the way IBM claimed to have spent about
$10,000,000.00 on the entire project, which includes the salaries of the
people involved, including paying the GM advisors, Hsu/Campbell/Hoane and
so forth.






>Note also that a bunch of test CPUs might have been fabricated,
>all of them at $30k a piece.


Didn't happen...



>
>>On October 18, 2002 at 14:29:07, Vincent Diepeveen wrote:
>>
>>>On October 17, 2002 at 19:25:11, Robert Hyatt wrote:
>>>
>>>Bob, without me wanting to say who is right here,
>>>Hsu or you: your statements contradict Hsu's statement.
>>>  ---
>>>  CrazyBird(DM) kibitzes: the 1996 & 1997 versions of Deep Blue are different
>>>mainly in the amount of chess knowledge.
>>>  aics%
>>>  EeEk(DM) kibitzes: what was the difference?
>>>  aics%
>>>  CrazyBird(DM) kibitzes: we went to Benjamin's excellent chess school.:)
>>>  aics%
>>
>>
>>What did I say that contradicts that?  Nothing I can think of...
>>
>>If you mean the re-design, that is a pure fact mentioned many times.  They
>>had the original Deep Thought stuff, then a re-design for Deep Blue 1, and
>>then another complete redesign for Deep Blue 2.  That's in his book in
>>great detail...
>>
>>
>>
>>
>>>  ---
>>>We both know that in theory *everything* that can be done in software
>>>can also be done in hardware.  However, there are so many practical
>>>issues that you simply cannot implement things 100% the same way in
>>>hardware.  Especially the low level at which Hsu was programming meant
>>>it was very hard to make the chip.  Producing the chip was a great
>>>achievement on his part.
>>>
>>>Being in hardware has just one advantage and three big disadvantages.
>>>In 1997, that is, there were three disadvantages:
>>>
>>>  - it's very expensive (FPGAs are very cheap now)
>>
>>The original Deep Thought chips cost less than $5,000 _total_ for all 16.
>>
>>The original Deep Blue 1 chips were also _not_ expensive.  Project MOSIS
>>is there just for this kind of stuff...
>>
>>I don't remember the details about DB2.  I do remember IBM didn't make the chips
>>themselves...
>>
>>
>>>  - the processor is clocked *way* lower than software processors
>>>    are (in 1997 the 300 MHz PII was there, versus 20 MHz Deep Blue
>>>    processors; a factor of about 15).
>>
>>So?  The idea in hardware is to do _more_ in a clock cycle.  The clock
>>frequency is not an issue; clocks are used to synchronize at various points
>>and let things settle before they get latched.  In theory you could build a
>>chip that searches 1M nodes in one clock cycle.
>>
>>It would be _much_ harder to do so, however...  and there would be no
>>point, since nobody cares about the clock frequency, only how fast it
>>searches chess trees...
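To make that concrete, here is a quick back-of-the-envelope check (a Python
sketch; the 24 MHz and 10-clocks-per-node figures are the ones quoted later
in this thread, not chip specs):

    # NPS = clock rate / clocks spent per node.  Frequency alone says
    # nothing; the work done per clock matters just as much.
    clock_hz = 24_000_000      # the 24 MHz figure Vincent cites below
    clocks_per_node = 10       # his "10 clocks on average" estimate
    print(clock_hz // clocks_per_node)  # 2,400,000 -- right in the
                                        # 2-2.5M nps range quoted below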
>>
>>
>>>  - it's very hard to make a hardware chip
>>
>>Apparently not to Hsu.  He did a complete chip design, and got it back and
>>ready to play in less than one year total, more like nine months if I
>>remember his book correctly...
>>
>>
>>
>>
>>>
>>>The only advantage is that things can get done in parallel.
>>>That means that if everything is sequential, you start out 15 times
>>>slower than software (15 times in 1997; now it's way, way more than
>>>that, and the technology to produce processors 15 times slower than
>>>the 2.8 GHz P4s which are the latest now, so 200 MHz processors, is
>>>still not exactly cheap).
>>>
>>>And Hsu had just 20 MHz, and later managed 'even' 24 MHz.  So every
>>>clock you waste on some sequential trying of the hashtable and other
>>>search enhancements slows down the CPU bigtime.
>>
>>
>>Not at all.  The hash probe was done in parallel with everything else.  It just
>>always "failed" since there was no memory present...
>>
>>
>>
>>
>>>
>>>If you implement:
>>>  nullmove
>>>  hashtables
>>>  killermoves
>>>  SEE (qsearch)
>>>  countermove
>>>  butterfly boards
>>>  history heuristics
>>>
>>>
>>>Though I do not believe the last three are smart move-ordering
>>>enhancements to make, if you implement them you are something like
>>>30 clocks slower than without them.
>>
>>
>>That is simply an uninformed statement.  The logic will certainly be far
>>more complex if those things are done.  But not necessarily _slower_.
>>Parallelism is the name of the game in ASIC design...
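For readers who don't know the heuristics in that list, here is a minimal
software sketch of one of them, the killer-move table (the textbook version,
with a hypothetical search interface; nothing here is from Hsu's design):

    # Killer moves: remember quiet moves that caused a beta cutoff at a
    # given ply, and try them early at that ply in sibling subtrees.
    MAX_PLY = 64
    killers = [[None, None] for _ in range(MAX_PLY)]

    def record_killer(move, ply):
        """Call this when a quiet (non-capture) move causes a cutoff."""
        if killers[ply][0] != move:
            killers[ply][1] = killers[ply][0]   # demote the older killer
            killers[ply][0] = move              # newest killer tried first

    def is_killer(move, ply):
        return move in killers[ply]  # move ordering gives these a bonus

In an ASIC, per the point above, a lookup like this can run in parallel with
the rest of move generation rather than costing extra sequential clocks.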
>>
>>
>>>
>>>If you first need 10 clocks a node on average (which is very little
>>>for 0.60 micron), then going to 40 clocks means a slowdown of a
>>>factor 3.
>>
>>That would be a factor of four.
>>
>>40 / 10 == 4
>>
>>>
>>>That's clearly visible.
>>
>>But you can wave your hands all you want.  Doesn't mean that 4x slower is
>>a forced condition...
>>
>>
>>
>>
>>
>>>
>>>I do not know the latency of SRAM.  Sources who create processors for
>>>a living inform me that Deep Blue would have needed a few megabytes of
>>>expensive SRAM (very expensive in 1997, when EDO RAM was the standard)
>>>to avoid losing too much speed communicating with it.  EDO RAM is no
>>>option for something that is capable of searching at 2-2.5 million
>>>nodes a second.  Doing over 2 million probes a second at random
>>>locations in EDO RAM is not something I can recommend :)
>>
>>Do the math.  EDO RAM has a 100ns cycle time.  Deep Blue chess processors
>>had a 50ns cycle time.  Overlap the memory read with two early cycles and
>>it is free...
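The overlap argument, in numbers (both figures are from the post above):

    # A probe started two chip cycles before its result is needed hides
    # the DRAM latency completely.
    edo_latency_ns = 100   # EDO DRAM cycle time
    chip_cycle_ns = 50     # Deep Blue chess-processor cycle time
    print(edo_latency_ns / chip_cycle_ns)  # 2.0 cycles of latency to hide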
>>
>>
>>
>>
>>
>>>
>>>Now that still isn't as efficient as software, because the probes
>>>then get done to RAM local to the processor, which isn't shared,
>>>so there is a huge overhead anyway when compared to software.  Only
>>>if you have some big, fast, global parallel RAM from which each
>>>hardware CPU can independently get a cache line do you get close to
>>>the efficiency of software!
>>
>>The RAM design of the new DB chips supported a 16-way shared RAM between
>>the processors on a single SP node.  Not much way to do a shared hash table with
>>30 different nodes.  480-port memory would be impossibly complex and expensive.
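A toy illustration of the node-local alternative (hypothetical key
partitioning, not DB2's actual design): local probes are cheap, but any
entry owned by another SP node would cost a network round trip that dwarfs
a 50ns processor cycle.

    # Node-local hash tables on a 30-node cluster: only the local share
    # of the key space can be probed cheaply.
    NODES = 30
    local_tables = [dict() for _ in range(NODES)]

    def probe(zobrist_key, my_node):
        owner = zobrist_key % NODES     # node that "owns" this entry
        if owner == my_node:
            return local_tables[owner].get(zobrist_key)  # cheap local read
        return None  # remote probe skipped: network latency is many
                     # orders of magnitude above a 50ns chip cycle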
>>
>>
>>>
>>>I didn't count them in the 40 clocks, because 40 clocks a node would
>>>already slow the thing down 3 times.  Just the sequential trying of
>>>the different heuristics and search enhancements means you simply
>>>lose extra processor clocks, as it cannot get done in parallel.
>>>
>>
>>
>>Doesn't matter.  See above.  Two chess-chip clock cycles would be all that
>>is needed to read from plain old DRAM.  Using SRAM would cut it to under
>>one cycle.
>>
>>
>>
>>
>>
>>>Apart from that, if the design goal is as many nodes a second as
>>>possible, which was a good goal before 1995, then obviously you don't
>>>care about efficiency either!
>>
>>
>>That is another false statement.  Their "design goal" was _only_ to beat
>>Kasparov.  NPS or depth was _not_ the driving factor...
>>
>>
>>
>>>
>>>>On October 17, 2002 at 12:41:59, Vincent Diepeveen wrote:
>>>>
>>>>>On October 16, 2002 at 11:03:33, emerson tan wrote:
>>>>>
>>>>>Nodes a second is not important.  I hope you realize that if you
>>>>>create a special program to go as fast as possible, getting around
>>>>>40 million nodes a second is easily possible on a dual K7.
>>>>>
>>>>>Do not ask how it plays, though, or how efficiently it searches.
>>>>>
>>>>>Important factors are:
>>>>>  - He needs a new, very good book.  He will not even get 10th at
>>>>>    the world championship with a book from 1997, and I do not know
>>>>>    a single GM in the world who could do the job for him.  You need
>>>>>    very special guys in this world to do a book job.  They are
>>>>>    unique people, usually with many talents.  Just hiring a GM is
>>>>>    not, by itself, going to be a success.  If you look at how long
>>>>>    it took for Alterman to contribute something to the Junior team,
>>>>>    you will start crying directly.
>>>>>  - The evaluation needs to be improved bigtime.
>>>>>  - To get a billion-nodes-a-second chip he needs around 100 million
>>>>>    dollars.  Of course, with more CPUs doing around 40 million
>>>>>    nodes a second at, say, 500 MHz, he could do it with just 10
>>>>>    million dollars.  But if you can afford 10 million dollars for
>>>>>    40M-nps chips, you can afford a big parallel machine too.  Note
>>>>>    that for a single-CPU chip doing about 4 million nodes a second,
>>>>>    all he needs is a cheap 3,000 dollar FPGA.  If you calculate
>>>>>    well, you will see that Deep Blue did not get so many nodes a
>>>>>    second per chip: it had 480 chips, and Deep Blue searched around
>>>>>    126 million nodes a second on average against Kasparov.  So
>>>>>    that's 265k nodes a second at each chip.
>>>>>
>>>>>    So a single chip getting 4 million nodes a second is very
>>>>>    efficient compared to that.
>>>>>
>>>>>  - He needs more like a trillion nodes a second to compensate for
>>>>>    the inefficiency in hardware: no killer moves, no hashtables,
>>>>>    etcetera.
>>>>
>>>>
>>>>You keep saying that without knowing what you are talking about.  Read
>>>>his book.  You will find out that the chess processors _did_ have
>>>>hash-table support.  He just didn't have time to design and build the
>>>>memory for them.  Belle was the "pattern" for Deep Thought.  It was
>>>>essentially "Belle on a chip".  Belle _did_ have hash tables in the
>>>>hardware search...
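For readers unfamiliar with the structure being discussed, here is a minimal
software sketch of a transposition (hash) table; the field layout and names
are illustrative only, not Hsu's hardware design:

    import random

    random.seed(1)
    # Zobrist keys: one 64-bit random number per (piece, square) pair
    ZOBRIST = [[random.getrandbits(64) for _ in range(64)]
               for _ in range(12)]

    def position_key(pieces):
        """pieces: iterable of (piece_index, square) pairs."""
        key = 0
        for piece, square in pieces:
            key ^= ZOBRIST[piece][square]
        return key

    TABLE_SIZE = 1 << 20
    table = [None] * TABLE_SIZE   # entry: (key, depth, score, best_move)

    def store(key, depth, score, best_move):
        table[key % TABLE_SIZE] = (key, depth, score, best_move)

    def probe(key, depth):
        entry = table[key % TABLE_SIZE]
        if entry and entry[0] == key and entry[1] >= depth:
            return entry   # hit: reuse the score and move ordering
        return None        # miss -- what the DB2 chips' probe always
                           # returned, since no RAM was attached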
>>>>
>>>>Given another year (a re-match in 1998) and they would have been hashing in the
>>>>hardware.
>>>>
>>>>Killer moves are not a _huge_ loss.  They are a loss, but not a factor
>>>>of two or anything close to that...  I can run the test and post the
>>>>numbers if you want...
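As a quick sanity check of the per-chip arithmetic quoted above (a trivial
Python calculation; both inputs are the figures from Vincent's post):

    total_nps = 126_000_000   # average speed vs. Kasparov, per the post
    chips = 480
    print(total_nps / chips)  # 262,500 -- roughly the "265k nodes a
                              # second at each chip" claimed above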
>>>>
>>>>
>>>>>    Of course the argument that it is possible to make hashtables
>>>>>    in hardware is not relevant, as there is a price to that which
>>>>>    is simply too big to pay.
>>>>
>>>>Based on what?  Memory is not particularly complex.  It certainly is not
>>>>expensive...
>>>>
>>>>
>>>>>
>>>>>    Even for IBM it was too expensive to pay for hashtables in
>>>>>    hardware.  Despite the fact that Hsu had created the possibility
>>>>>    for it, the RAM wasn't put on the chips and wasn't connected to
>>>>>    the CPUs.  Something that improves the chips of course gets used
>>>>>    when it works somehow.  Could price have been the only reason?
>>>>>    Don't you think so too?  If not, what could be the reason not to
>>>>>    use hashtables, knowing they improve efficiency?
>>>>
>>>>Lack of time.  Hsu completely re-designed the chess chips, got them built,
>>>>tested them, worked around some hardware bugs, suffered thru some fab
>>>>problems that produced bad chips, and so forth.  All in one year.  He got the
>>>>final chips weeks before the Kasparov match.
>>>>
>>>>It was an issue of time.  Memory would have cost _far_ less than the
>>>>chess chips.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>    The important thing to remember is that if I want to drive to
>>>>>    Paris with two cars and I just send the cars off in all
>>>>>    directions without looking at a map or road signs (representing
>>>>>    the inefficiency), then the chances are they land everywhere
>>>>>    except on the highway to Paris.
>>>>>
>>>>>    Even a trillion nodes a second isn't going to work if it is
>>>>>    using inefficient forms of search.
>>>>>
>>>>>    It is not very nice of Hsu to focus on how many nodes a second
>>>>>    he plans to get.  For IBM that was important in 1997 to make
>>>>>    marketing with.  It is not a fair comparison.
>>>>
>>>>
>>>>The match was _not_ about NPS.  It was purely about beating Kasparov.
>>>>If they could have done it with 10 nodes per second, they would have.
>>>>I don't know where you get this NPS fixation you have, but it is wrong.
>>>>Just ask Hsu...
>>>>
>>>>
>>>>>
>>>>>    If I go play at the world champs in 2003 with something like 500
>>>>>    processors, I do not talk about "this program uses up to a
>>>>>    terabyte of bandwidth a second (1,000,000 MB/s) to outpower the
>>>>>    other programs, whereas the poor PC programs only have up to
>>>>>    0.000600 terabytes of bandwidth a second (600 MB/s)".
>>>>
>>>>
>>>>First, you had better beat them...  That's not going to be easy.  NUMA has
>>>>plenty of problems to overcome...
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>    That is not a fair comparison.  Do you see why it is not a fair
>>>>>    comparison?
>>>>>
>>>>>    He should say what search depth he plans to reach using such
>>>>>    chips.
>>>>
>>>>
>>>>Depth is _also_  unimportant.  Elsewise they could have just done like Junior
>>>>does and report some "new" ply definition of their choosing, and nobody could
>>>>refute them at all.
>>>>
>>>>This was about beating Kasparov.  Not about NPS.  Not about Depth.  Not about
>>>>_anything_ but beating Kasparov...
>>>>
>>>>Had you talked to them after they went to work for IBM, you would know
>>>>this.  Those of us that did, do...
>>>>
>>>>>
>>>>>    However, he is quoted: "search depth is not so relevant".  If it
>>>>>    is not so relevant, then why talk about nodes a second at all,
>>>>>    if the usual goal of more nps (getting a bigger search depth) is
>>>>>    not considered important?
>>>>
>>>>They haven't been talking about NPS except in a very vague way.  You
>>>>have made it an issue, not them.  They can't really tell you _exactly_
>>>>how fast they are going since they don't count nodes...
>>>>
>>>>
>>>>>
>>>>>>EeEk(* DM) kibitzes: kib question from Frantic: According to what was
>>>>>>published, DB was evaluating 200 million positions per second (vs 2.5
>>>>>>to 5 million for the 8-way Simmons server running Deep Fritz).  How
>>>>>>fast would Deep Blue be today if the project continued?
>>>>>>CrazyBird(DM) kibitzes: it contains a few references at the end of the
>>>>>>book for the more technically inclined.
>>>>>>CrazyBird(DM) kibitzes: if we redo the chip in, say, 0.13 micron, and
>>>>>>with an improved architecture, it should be possible to do one billion
>>>>>>nodes/sec on a single chip.
>>>>>>CrazyBird(DM) kibitzes: so a trillion nodes/sec machine is actually
>>>>>>possible today.
>>>>>>
>>>>>>If the cost is not that high, maybe Hsu should make something a la the
>>>>>>ChessMachine that can be plugged into computers (assuming that he has
>>>>>>no legal obligation to IBM).  The desktop PC is a long way from hitting
>>>>>>1 billion nodes/sec.  I think most of the professional chess players
>>>>>>and serious chess hobbyists will buy it.  He can easily get 1 million
>>>>>>orders.  1 billion nodes/sec, mmm....:)
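Spelling out the kibitz arithmetic above (a trivial check; the per-chip
figure is CrazyBird's, the chip count is just division):

    per_chip_nps = 1_000_000_000       # one billion nps per 0.13-micron chip
    target_nps = 1_000_000_000_000     # the "trillion nodes/sec machine"
    print(target_nps // per_chip_nps)  # 1000 chips -- about twice the 480
                                       # chess chips in the 1997 machine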


