Computer Chess Club Archives



Subject: Re: rebuilding deep blue - it was too inefficient!!

Author: Robert Hyatt

Date: 11:51:28 10/18/02


On October 18, 2002 at 14:29:07, Vincent Diepeveen wrote:

>On October 17, 2002 at 19:25:11, Robert Hyatt wrote:
>
>Bob, without me wanting to say who is right here:
>hsu or you ==> your statements contradicts Hsu's statement.
>  ---
>  CrazyBird(DM) kibitzes: the 1996 & 1997 version of Deep Blue are different
>mainly in the amount of chess knowledge.
>  aics%
>  EeEk(DM) kibitzes: what was the difference?
>  aics%
>  CrazyBird(DM) kibitzes: we went to Benjamin's excellent chess school.:)
>  aics%


What did I say that contradicts that?  Nothing I can think of...

If you mean the re-design, that is a pure fact mentioned many times.  They had
the original Deep Thought stuff, then a re-design for Deep Blue 1, and then
another complete redesign for Deep Blue 2.  That's in his book in great
detail...




>  ---
>We both know that in theory *everything* that can be done in software
>can be done in hardware too. However, there are so many practical
>issues that you simply cannot implement things 100% the same way in
>hardware. Especially the low level at which Hsu was programming meant
>it was very hard to make the chip. Producing the chip was a great
>achievement on his part.
>
>Being in hardware has just one advantage and, in 1997 at least,
>three big disadvantages.
>
>  - it's very expensive (FPGAs are very cheap now)

The original Deep Thought chips cost less than $5,000 _total_ for all 16.

The original Deep Blue 1 chips were also _not_ expensive.  Project MOSIS is
there just for this kind of stuff...

I don't remember the details about DB2.  I do remember IBM didn't make the
chips themselves...


>  - the processor is clocked *way* lower than the processors software
>    runs on (in 1997 there was the 300MHz PII, versus the 20MHz
>    Deep Blue processors; roughly a factor of 15).

So?  The idea in hardware is to do _more_ in a clock cycle.  The clock
frequency is not an issue; clocks are used to synchronize at various points
and let things settle before they get latched.  In theory you could build a
chip that searches 1M nodes in one clock cycle.

It would be _much_ harder to do so, however...  and there would be no point,
since nobody cares about the clock frequency, only how fast it searches chess
trees...


>  - it's very hard to make a hardware chip

Apparently not to Hsu.  He did a complete chip design, and got it back and
ready to play in less than one year total, more like nine months if I remember
his book correctly...




>
>The only advantage is that things can get done in parallel.
>That means if everything is sequential, you start out 15 times
>slower than software (15 times in 1997; now it's way, way more
>than that: a processor 15 times slower than the 2.8GHz P4s which
>are the latest now would be a 200MHz processor, and producing
>that is still not exactly cheap).
>
>And Hsu had just 20MHz, and later managed 'even' 24MHz. So every
>clock you waste on some sequential trying of the hashtable and
>other search enhancements slows down the cpu bigtime.


Not at all.  The hash probe was done in parallel with everything else.  It just
always "failed" since there was no memory present...




>
>If you implement:
>  nullmove
>  hashtables
>  killermoves
>  SEE (qsearch)
>  countermove
>  butterfly boards
>  history heuristics
>
>
>though I do not believe the last 3 are smart move-ordering enhancements
>to make, if you implement them you are something like 30 clocks slower
>than without them.


That is simply an uninformed statement.  The logic will certainly be far more
complex if those things are done.  But not necessarily _slower_.  Parallelism
is the name of the game in ASIC design..


>
>If you first need 10 clocks a node on average (which is very little
>for 0.60 micron), then going to 40 clocks means a slowdown of a
>factor 3.

That would be a factor of four.

40 / 10 == 4
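
Spelled out with the 24MHz clock figure quoted above:

    24,000,000 cycles/sec / 10 clocks per node = 2,400,000 nodes/sec
    24,000,000 cycles/sec / 40 clocks per node =   600,000 nodes/sec

    2,400,000 / 600,000 = 4x slower, not 3x.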

>
>That's clearly visible.

But you can wave your hands all you want.  Doesn't mean that 4x slower is
a forced condition...





>
>I do not know the latency of SRAM. Sources who design processors
>for a living inform me that Deep Blue would have needed a few
>megabytes of expensive SRAM (very expensive in 1997, when EDO RAM
>was the standard) to avoid losing too much speed communicating
>with it. EDO RAM is no option for something that is capable of
>searching 2-2.5 million nodes a second. Doing over 2 million
>probes a second at random locations in EDO RAM is not something
>I can recommend :)

Do the math.  EDO RAM has a 100ns cycle time.  Deep Blue chess processors had
a 50ns cycle time.  Overlap the memory read with two early cycles and it is
free...
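
Spelling that overlap out with the figures above (a 20MHz chip clock means a
50ns cycle):

    EDO RAM access time       ~100 ns
    chess chip clock cycle     ~50 ns
    100 ns / 50 ns           =   2 chip cycles

Start the probe two cycles before the result is needed and the DRAM latency
disappears behind work the chip is doing anyway.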





>
>Now that still isn't as efficient as software, because the probes
>then go to RAM that is local to each processor, not shared across
>the search, so compared to software it carries a huge overhead
>anyway. Only if you have some big, fast, global parallel RAM that
>each hardware cpu can independently fetch a cache line from do you
>get close to the efficiency of software!

The RAM design of the new DB chips supported a 16-way shared RAM between
the processors on a single SP node.  Not much way to do a shared hash table with
30 different nodes.  480-port memory would be impossibly complex and expensive.
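
For the counting behind that figure, using the numbers already in this thread:

    30 SP nodes x 16 chess processors per node = 480 chess processors

so a single globally shared table would need one memory port per processor,
i.e. a 480-port RAM.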


>
>I didn't count those in the 40 clocks, because 40 clocks a node
>would already slow the thing down 3 times. The sequential trying
>of the different heuristics and search enhancements simply means
>you lose extra processor clocks, as it cannot be done in parallel.
>


Doesn't matter.  See above.  Two chess chip clock cycles would be all that is
needed to read from plain old DRAM.  Using SRAM would cut it to under 1 cycle.





>Apart from that, if the design goal is as many nodes a second as
>possible, which was a good goal before 1995, then obviously you
>don't care about efficiency either!


That is another false statement.  Their "design goal" was _only_ to beat
Kasparov.  NPS or depth was _not_ the driving factor...



>
>>On October 17, 2002 at 12:41:59, Vincent Diepeveen wrote:
>>
>>>On October 16, 2002 at 11:03:33, emerson tan wrote:
>>>
>>>Nodes a second is not important. I hope you realize that
>>>if you create a special program to go as fast as possible,
>>>getting around 40 million nodes a second is easily
>>>possible on a dual K7.
>>>
>>>Do not ask how it plays, though, or how efficiently it searches.
>>>
>>>Important factors are
>>>  - he needs a new very good book. He will not even get
>>>    10th at the world championship if his book is from 1997,
>>>    and I do not know a single GM in the world who could do the
>>>    job for him. You need very special guys in this world to do
>>>    a book job. They are unique people, usually with many talents.
>>>    Just hiring a GM is not going to be a success in advance.
>>>    If you look at how long it took for Alterman to contribute
>>>    something to the Junior team, you will start crying right away.
>>>  - the evaluation needs to be improved bigtime
>>>  - To get a billion-nodes-a-second chip he needs around 100 million
>>>    dollars. Of course with more cpu's doing around 40 million nodes a
>>>    second at, say, 500MHz, he could do it with just 10 million dollars.
>>>    But if you can afford 10 million dollars for 40-million-nps chips,
>>>    you can afford a big parallel machine too. Note that for a single
>>>    cpu chip doing about 4 million nodes a second, all he needs is
>>>    a cheap 3000 dollar FPGA thing. If you calculate well, then
>>>    you will see that Deep Blue did not get so many nodes a second
>>>    per chip. It had 480 chips, and Deep Blue searched around 126 million
>>>    nodes a second on average against Kasparov. So that's 265k nodes
>>>    a second per chip.
>>>
>>>    So a single chip getting 4 million nodes a second is very efficient
>>>    compared to that.
>>>
>>>  - He needs more like a trillion nodes a second to compensate for
>>>    the inefficiency in hardware. No killermoves. No hashtables etcetera.
>>
>>
>>You keep saying that without knowing what you are talking about.  Read his
>>book.  You will find out that the chess processors _did_ have hash table
>>support.  He just didn't have time to design and build the memory for them.
>>Belle was the "pattern" for Deep Thought.  It was essentially "Belle on a
>>chip".  Belle _did_ have hash tables in the hardware search...
>>
>>Given another year (a re-match in 1998) and they would have been hashing in the
>>hardware.
>>
>>Killermoves is not a _huge_ loss.  It is a loss, but not a factor of two or
>>anything close to that...  I can run the test and post the numbers if you
>>want...
>>
>>
>>>    Of course the argument that it is possible to make hashtables in
>>>    hardware is not relevant, as there is a price to that which is
>>>    simply too big to pay.
>>
>>Based on what?  Memory is not particularly complex.  It certainly is not
>>expensive...
>>
>>
>>>
>>>    Even for IBM it was too expensive to pay for
>>>    hashtables in hardware: despite Hsu having created the possibility
>>>    for it, the RAM wasn't put on the chips and wasn't connected to the
>>>    cpu's. Something that improves the chips of course gets used when
>>>    it works somehow. Could price have been the only reason? Don't you
>>>    think so too? If not, what could be the reason not to use hashtables,
>>>    knowing they improve efficiency?
>>
>>Lack of time.  Hsu completely re-designed the chess chips, got them built,
>>tested them, worked around some hardware bugs, suffered thru some fab
>>problems that produced bad chips, and so forth.  All in one year.  He got the
>>final chips weeks before the Kasparov match.
>>
>>It was an issue of time.  Memory would have cost _far_ less than the chips
>>(chess chips).
>>
>>
>>
>>
>>
>>>
>>>    the important thing to remember is that if I want to drive to
>>>    Paris with 2 cars and I just send the cars in all directions without
>>>    looking at a map or road signs (representing the inefficiency), then
>>>    chances are they end up everywhere except on the highway to Paris.
>>>
>>>    Even a trillion nodes a second isn't going to work if it is using
>>>    inefficient forms of search.
>>>
>>>    It is not very nice of Hsu to focus on how many nodes a second
>>>    he plans to get. For IBM that was important in 1997 for marketing
>>>    purposes. It is not a fair comparison.
>>
>>
>>The match was _not_ about NPS.  It was purely about beating Kasparov.  If
>>they could have done it with 10 nodes per second, they would have.  I don't
>>know where you get this NPS fixation you have, but it is wrong.  Just ask
>>Hsu...
>>
>>
>>>
>>>    If I go play at the world champs 2003 with like 500 processors, I
>>>    do not talk about "this program uses up to a terabyte of bandwidth
>>>    a second (1000000 MB/s) to outpower the other programs, whereas
>>>    the poor PC programs only have up to 0.000600 terabyte of bandwidth
>>>    a second (600MB/s)".
>>
>>
>>First, you had better beat them...  That's not going to be easy.  NUMA has
>>plenty of problems to overcome...
>>
>>
>>
>>
>>>
>>>    That is not a fair comparison. Do you see why it is not a fair
>>>    comparison?
>>>
>>>    He should say what search depth he plans to reach using such
>>>    chips.
>>
>>
>>Depth is _also_  unimportant.  Elsewise they could have just done like Junior
>>does and report some "new" ply definition of their choosing, and nobody could
>>refute them at all.
>>
>>This was about beating Kasparov.  Not about NPS.  Not about Depth.  Not about
>>_anything_ but beating Kasparov...
>>
>>Had you talked to them after they went to work for IBM you would know this.
>>Those of us that did, do...
>>
>>>
>>>    However he quotes: "search depth is not so relevant". If it is not
>>>    so relevant, then why talk about nodes a second at all, if
>>>    the usual goal of more nps (getting a bigger search depth) is
>>>    not considered important?
>>
>>They haven't been talking about NPS except in a very vague way.  You have
>>made it an issue, not them.  They can't really tell you _exactly_ how fast they
>>are going since they don't count nodes..
>>
>>
>>>
>>>>EeEk(* DM) kibitzes: kib question from Frantic: According to what was
>>>>published DB was evaluating 200 million positions per second (vs 2.5
>>>>to 5 million for the 8-way Simmons server running Deep Fritz).  How
>>>>fast would Deep Blue be today if the project had continued?
>>>>CrazyBird(DM) kibitzes: it contains a few references at the end of the
>>>>book for the more technically inclined.
>>>>CrazyBird(DM) kibitzes: if we redo the chip in, say, 0.13 micron, and
>>>>with an improved architecture, it should be possible to do one billion
>>>>nodes/sec on a single chip.
>>>>CrazyBird(DM) kibitzes: so a trillion nodes/sec machine is actually
>>>>possible today.
>>>>
>>>>If the cost is not that high, maybe Hsu should make something a la the
>>>>ChessMachine that can be plugged into computers (assuming that he has no
>>>>legal obligation to IBM). The desktop pc is a long way from hitting
>>>>1 billion nodes/sec. I think most of the professional chessplayers and
>>>>serious chess hobbyists will buy one. He can easily get 1 million orders.
>>>>1 billion nodes/sec, mmm....:)


