Computer Chess Club Archives



Subject: rebuilding deep blue - it was too inefficient!!

Author: Vincent Diepeveen

Date: 11:29:07 10/18/02



On October 17, 2002 at 19:25:11, Robert Hyatt wrote:

Bob, without wanting to say who is right here, Hsu or you:
your statements contradict Hsu's statement.
  ---
  CrazyBird(DM) kibitzes: the 1996 & 1997 versions of Deep Blue are different
mainly in the amount of chess knowledge.
  aics%
  EeEk(DM) kibitzes: what was the difference?
  aics%
  CrazyBird(DM) kibitzes: we went to Benjamin's excellent chess school.:)
  aics%
  ---
We both know that in theory *everything* that can be done in software
can also be done in hardware. In practice, however, there are so many
issues that you simply cannot implement things 100% the same way in
hardware. Especially the low level at which Hsu was working made the
chip very hard to build. Producing it at all was a great achievement.

Being in hardware has just one advantage and three big disadvantages.
In 1997, at least, there were three disadvantages:

  - it's very expensive (FPGAs are very cheap now)
  - the processor is clocked *way* lower than software processors
    are (in 1997 there was the 300 MHz PII, versus 20 MHz
    Deep Blue processors; roughly a factor of 15)
  - it's very hard to make a hardware chip

The only advantage is that things can be done in parallel.
That means that if everything is done sequentially, you start out 15
times slower than software (15 times in 1997; now the gap is far
bigger: the technology to produce processors 15 times slower than the
2.8 GHz P4s which are the latest now, so 200 MHz processors, is still
not exactly cheap).

And Hsu had just 20 MHz, later managing 'even' 24 MHz. So every clock
you spend on sequentially trying the hash table and other search
enhancements slows the chip down big time.

If you implement:
  nullmove
  hashtables
  killermoves
  SEE (qsearch)
  countermove
  butterfly boards
  history heuristics


Though I do not believe the last three are smart move-ordering
enhancements to make, if you implement them you end up something like
30 clocks slower per node than without them (see the sketch below).
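
As a rough illustration, this is the kind of sequential heuristic
trying that costs those clocks on every node. It is a minimal software
sketch, not Deep Blue's actual logic; the move encoding, the score
values and the history table are made up for illustration:

  /* Hypothetical move ordering: hash move first, then killers, then
     history counters. Each check is one more sequential step per move. */
  typedef unsigned short Move;            /* assumed: (from << 6) | to  */
  #define FROM(m) ((m) >> 6)
  #define TO(m)   ((m) & 63)

  static int history[64][64];             /* history-heuristic counters */

  int score_move(Move m, Move hash_move, Move killer1, Move killer2)
  {
      if (m == hash_move)                 /* transposition-table move   */
          return 1000000;
      if (m == killer1 || m == killer2)   /* killer moves               */
          return 900000;
      return history[FROM(m)][TO(m)];     /* fall back to history       */
  }

In software these are a handful of cheap branches; at 20-24 MHz every
such sequential step is a real fraction of the per-node clock budget.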

If you normally need only 10 clocks per node on average (which is very
little for 0.60 micron), then going to 40 clocks per node means a
slowdown by a factor of 4.

That's clearly visible.
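
To put numbers on it, using the 24 MHz clock mentioned above (a
back-of-envelope figure, not a measured one):

  24,000,000 clocks/sec / 10 clocks/node = 2,400,000 nodes/sec
  24,000,000 clocks/sec / 40 clocks/node =   600,000 nodes/sec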

I do not know the latency of SRAM. Sources who design processors for a
living inform me that Deep Blue would have needed a few megabytes of
expensive SRAM (very expensive in 1997, when EDO RAM was the standard)
in order not to lose too much speed communicating with it. EDO RAM is
no option for something that is capable of searching 2-2.5 million
nodes a second. Doing over 2 million probes a second at random
locations in EDO RAM is not something I can recommend :)
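
A rough probe budget makes the point (the latency figure is an
assumption for illustration, not a measured EDO number):

  1 sec / 2,500,000 probes = 400 ns per probe, best case,
  while a single random (page-miss) EDO access plus addressing and bus
  overhead is assumed here to eat on the order of 100 ns or more, on
  the critical path of every node.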

Even then it still isn't as efficient as software, because the probes
then go to RAM that is local to each chess processor, which is not
doing the iterating itself, so compared to software there is a huge
overhead anyway. Only if you have some big, fast, global parallel RAM
from which each hardware CPU can independently fetch a cache line do
you get close to the efficiency of software!
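
For comparison, this is roughly what that "global shared RAM" looks
like in software: one transposition table that every searcher probes
directly. A minimal sketch, assuming a Zobrist key; the entry layout
and table size are made up for illustration:

  #include <stdint.h>

  /* One shared table, probed directly by every searching CPU/thread. */
  typedef struct {
      uint64_t key;      /* Zobrist hash of the position               */
      int16_t  score;
      uint8_t  depth;
      uint8_t  flags;    /* exact / lower bound / upper bound          */
  } TTEntry;

  #define TT_ENTRIES (1u << 18)           /* 256K entries, a few MB    */
  static TTEntry tt[TT_ENTRIES];

  TTEntry *tt_probe(uint64_t zobrist_key)
  {
      TTEntry *e = &tt[zobrist_key & (TT_ENTRIES - 1)];
      return (e->key == zobrist_key) ? e : 0;   /* hit or miss         */
  }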

I didn't even count the hash probes in those 40 clocks, because 40
clocks per node already slows the thing down by a factor of 4. Just
the sequential trying of the different heuristics and search
enhancements means you simply lose extra processor clocks, as it
cannot be done in parallel.

Apart from that, if the design goal is as many nodes a second as
possible, which was a good goal before 1995, then obviously you don't
care about efficiency either!

>On October 17, 2002 at 12:41:59, Vincent Diepeveen wrote:
>
>>On October 16, 2002 at 11:03:33, emerson tan wrote:
>>
>>Nodes a second is not important. I hope you realize that
>>if you create a special program to go as fast as possible,
>>getting around 40 million nodes a second is easily
>>possible on a dual K7.
>>
>>Do not ask how it plays though or how efficient it searches.
>>
>>Important factors are
>>  - he needs a new, very good book. He will not even finish
>>    10th at the world championship with a book from 1997,
>>    and I do not know a single GM in the world who could do the
>>    job for him. You need very special guys in this world to do
>>    a book job. They are unique people, usually with many talents.
>>    Just hiring a GM is not going to guarantee success.
>>    If you look at how long it took Alterman to contribute something
>>    to the Junior team, you will start crying right away.
>>  - the evaluation needs to get improved bigtime
>>  - To get a billion-nodes-a-second chip he needs around 100 million
>>    dollars. Of course with more CPUs doing around 40 million nodes a
>>    second at say 500 MHz, he could do it with just 10 million dollars.
>>    But if you can afford 10 million dollars for 40M nps chips,
>>    you can afford a big parallel machine too. Note that for a single
>>    chip doing about 4 million nodes a second, all he needs is
>>    a cheap 3000 dollar FPGA board. If you do the arithmetic, you
>>    will see that Deep Blue did not get so many nodes a second per
>>    chip: it had 480 chips, and Deep Blue searched around 126 million
>>    nodes a second on average against Kasparov. So that's about 265k
>>    nodes a second per chip.
>>
>>    So a single chip getting 4 million nodes a second is very efficient
>>    compared to that.
>>
>>  - He needs more like a trillion nodes a second to compensate for
>>    the inefficiency in hardware: no killer moves, no hash tables, etcetera.
>
>
>You keep saying that without knowing what you are talking about.  Read his
>book.  You will find out that the chess processors _did_ have hash table
>support.  He just didn't have time to design and build the memory for them.
>Belle was the "pattern" for Deep Thought.  It was essentially "Belle on a
>chip".  Belle _did_ have hash tables in the hardware search...
>
>Given another year (a re-match in 1998) and they would have been hashing in the
>hardware.
>
>Killermoves is not a _huge_ loss.  It is a loss, but not a factor of two or
>anything close to that...  I can run the test and post the numbers if you
>want...
>
>
>>    Of course the argument that it is possible to implement hash tables in
>>    hardware is not relevant, as there is a price for that which is simply
>>    too big to pay.
>
>Based on what?  Memory is not particularly complex.  It certainly is not
>expensive...
>
>
>>
>>    Even for IBM it was too expensive to pay for
>>    hash tables in hardware: even though Hsu had created the possibility
>>    for it, the RAM wasn't put on the chips and wasn't connected to the
>>    CPUs. Something that improves the chips of course gets used when
>>    it works somehow. Could price have been the only reason? Don't you
>>    think so too? If not, what could be the reason not to use hash
>>    tables, knowing they improve efficiency?
>
>Lack of time.  Hsu completely re-designed the chess chips, got them built,
>tested them, worked around some hardware bugs, suffered thru some fab
>problems that produced bad chips, and so forth.  All in one year.  He got the
>final chips weeks before the Kasparov match.
>
>It was an issue of time.  Memory would have cost _far_ less than the chips
>(chess chips).
>
>
>
>
>
>>
>>    the important thing to remember is that if I want to drive to
>>    Paris with 2 cars and I just send the cars off in all directions
>>    without looking at a map or road signs (representing the inefficiency),
>>    then the chance is they end up everywhere except on the highway to Paris.
>>
>>    Even a trillion nodes a second isn't going to work if it is using
>>    inefficient forms of search.
>>
>>    It is not very nice of Hsu to focus on how many nodes a second
>>    he plans to get. For IBM that was important in 1997 for marketing
>>    purposes. It is not a fair comparison.
>
>
>The match was _not_ about NPS.  It was purely about beating Kasparov.  If they
>could have done it with 10 nodes per second, they would have.  I don't know
>where you get this NPS fixation you have, but it is wrong.  Just ask Hsu...
>
>
>>
>>    If I go play at the world championship in 2003 with some 500 processors,
>>    I do not talk about "this program uses up to a terabyte of bandwidth
>>    a second (1,000,000 MB/s) to outpower the other programs, whereas
>>    the poor PC programs only have up to 0.0006 terabyte of bandwidth
>>    a second (600 MB/s)".
>
>
>First, you had better beat them...  That's not going to be easy.  NUMA has
>plenty of problems to overcome...
>
>
>
>
>>
>>    That is not a fair comparison. Do you see why it is not a fair
>>    comparison?
>>
>>    He should say what search depth he plans to reach using such
>>    chips.
>
>
>Depth is _also_ unimportant.  Otherwise they could have just done like Junior
>does and reported some "new" ply definition of their choosing, and nobody could
>refute them at all.
>
>This was about beating Kasparov.  Not about NPS.  Not about Depth.  Not about
>_anything_ but beating Kasparov...
>
>Had you talked to them after they went to work for IBM you would know this.
>Those of us who did, do...
>
>>
>>    However, he says: "search depth is not so relevant". If it is not
>>    so relevant, then why talk about nodes a second at all, if
>>    the usual goal of more NPS (getting a bigger search depth) is
>>    not considered important?
>
>They haven't been talking about NPS except in a very vague way.  You have
>made it an issue, not them.  They can't really tell you _exactly_ how fast they
>are going since they don't count nodes...
>
>
>>
>>>EeEk(* DM) kibitzes: kib question from Frantic: According to what was
>>>published, DB was evaluating 200 million positions per second (vs 2.5
>>>to 5 million for the 8-way Simmons server running Deep Fritz).  How
>>>fast would Deep Blue be today if the project continued?
>>>CrazyBird(DM) kibitzes: it contains a few references at the end of the
>>>book for the more technically inclined.
>>>CrazyBird(DM) kibitzes: if we redo the chip in say, 0.13 micron, and
>>>with an improved architecture, it should be possible to do one billion
>>>nodes/sec on a single chip.
>>>CrazyBird(DM) kibitzes: so a trillion nodes/sec machine is actually
>>>possible today.
>>>
>>>If the cost is not that high, maybe Hsu should make something like the
>>>ChessMachine, a card that can be plugged into computers (assuming that he
>>>has no legal obligation to IBM). The desktop PC is a long way from hitting
>>>1 billion nodes/sec. I think most professional chess players and serious
>>>chess hobbyists would buy one. He can easily get 1 million orders.
>>>1 billion nodes/sec, mmm....:)


