Computer Chess Club Archives


Subject: Re: DEEP BLUES AVERAGE PLY?

Author: Robert Hyatt

Date: 08:22:08 08/25/02


On August 25, 2002 at 01:40:30, Uri Blass wrote:

>On August 25, 2002 at 00:41:55, Robert Hyatt wrote:
>
>>On August 24, 2002 at 18:31:07, Uri Blass wrote:
>>
>>>On August 24, 2002 at 17:59:43, Robert Hyatt wrote:
>>>
>>>>On August 23, 2002 at 12:24:45, Vincent Diepeveen wrote:
>>>>
>>>>>On August 22, 2002 at 20:25:24, Robert Hyatt wrote:
>>>>>
>>>>>>On August 22, 2002 at 18:22:56, Uri Blass wrote:
>>>>>>
>>>>>>>On August 22, 2002 at 18:01:09, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On August 21, 2002 at 20:10:26, Mike S. wrote:
>>>>>>>>
>>>>>>>>>On August 21, 2002 at 11:07:58, Robert Hyatt wrote:
>>>>>>>>>
>>>>>>>>>>(...)
>>>>>>>>>>1.  They reported depth as 11(6) for example.  According to the deep blue
>>>>>>>>>>team, and regardless of what others will say about it, this supposedly means
>>>>>>>>>>that they did 11 plies in software, plus another 6 in hardware.
>>>>>>>>>
>>>>>>>>>When I looked at some of the logs, I had the impression that "11(6)" was
>>>>>>>>>reported most often.  IOW, we can probably say that it was the *typical* search
>>>>>>>>>depth reported (except for additional extension depths we do not know about) in
>>>>>>>>>the 1997 middlegames.  Would you agree with that from your study of the logs?
>>>>>>>>>
>>>>>>>>
>>>>>>>>I thought so.  But since the paper quotes 12.2, that would mandate that 12
>>>>>>>>must come up more often than 11.  I haven't gone thru each log in that kind
>>>>>>>>of detail, as that is a recipe for a headache.  :)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Another thing I'm not sure of is: *when* could it relatively safely be claimed
>>>>>>>>>that DB's depth has been reached again:
>>>>>>>>>
>>>>>>>>>a) when a current prog reaches at least 16 plies as a typical middlegame depth,
>>>>>>>>>   because some search techniques used now (which DB. didn't use), make up for
>>>>>>>>>   the missing ply (at least), or
>>>>>>>>>b) when 17 plies are reached, not earlier, or
>>>>>>>>>c) a program would have to reach more than 17 plies, because DB used much more
>>>>>>>>>   knowledge which current software probably does not yet use to that extent.
>>>>>>>>>
>>>>>>>>>I am searching for experts' opinions on *when* we can say something like "Yes,
>>>>>>>>>now with this specific performance [## plies etc.] we can safely say - as it's
>>>>>>>>>our *best guess*, since no direct head-to-head match is possible - that this new
>>>>>>>>>chess computer is better than Deep Blue was."
>>>>>>>>
>>>>>>>>I don't see any real way to do this.  IE take the following types of
>>>>>>>>programs and try to compare depths:
>>>>>>>>
>>>>>>>>1.  Junior, which uses a different definition of ply than everyone else.
>>>>>>>>They appear to search _much_ deeper than anyone else, based only on this,
>>>>>>>>but Amir has explained how he counts plies, and the bottom line is that
>>>>>>>>raw ply depth can't be compared.
>>>>>>>>
>>>>>>>>2.  Very dumb and fast program, with no q-search to speak of.  Since the
>>>>>>>>q-search is at _least_ 50% of the total tree search space, lopping that off
>>>>>>>>gets more depth.  But how to compare 14 plies with no q-search to 12 plies
>>>>>>>>with q-search?
>>>>>>>>
>>>>>>>>3.  Lots of selective search extensions.  This program might only search
>>>>>>>>9 plies deep on average, but it extends the _right_ moves at the right times,
>>>>>>>>so that even though it is only searching 9 plies deep, it beats the "22-ply
>>>>>>>>searching Junior program" handily.
>>>>>>>>
>>>>>>>>4.  Lots of other variations.  The bottom line is that depth is not an easy
>>>>>>>>way to compare programs.  Neither is NPS.  Unless you see some _real_ depth
>>>>>>>>that is way beyond everyone.  Or some real NPS that is way beyond everyone.
>>>>>>>>
>>>>>>>>For example, we have had a couple of very fast/dumb programs compete over
>>>>>>>>the years, and they have managed to do very well, because their speed and
>>>>>>>>tactics overcame their lack of positional understanding, when playing the
>>>>>>>>opponents they drew in the ACM/WCCC events.  We have also seen very slow
>>>>>>>>programs out-play everyone.  But we are talking about programs that are
>>>>>>>>generally within an order of magnitude of each other.  Say 20K nodes per
>>>>>>>>second to 200K nodes per second.  If someone suddenly hits the scene going
>>>>>>>>200M nodes per second, then that is a serious number if it is real...  So
>>>>>>>>even though I generally say that comparing NPS is a bad idea unless you are
>>>>>>>>using the _same_ program, there are logical exceptions...
>>>>>>>>
>>>>>>>>>
>>>>>>>>>But the claim should be illustrated by somewhat convincing figures (node rate is
>>>>>>>>>not convincing enough IMO, although still impressive).  Maybe the ply depth is; I
>>>>>>>>>know it's also not a perfect comparison, though.  But we probably don't have
>>>>>>>>>anything better.  A few positions/moves to compare are not enough.
>>>>>>>>
>>>>>>>>I think you have to look at results above all else.  IE for IBM, deep thought
>>>>>>>>totally dominated computer chess for 10 years, losing one well-known game.  That
>>>>>>>>is tough to do if you are not far better than everyone else.  After their last
>>>>>>>>computer event in 1995, they suddenly started going 100X faster.  So they have
>>>>>>>>a significant boost there, unless you do as some do and conclude that the
>>>>>>>>extra speed means nothing.
>>>>>>>
>>>>>>>I conclude that it was not 100 times faster.
>>>>>>>
>>>>>>>1) 200M nodes is wrong based on the paper of Hsu.
>>>>>>>2) They suffered from lack of efficiency because they preferred
>>>>>>>to improve the evaluation and not to fix
>>>>>>>the efficiency problems.
>>>>>>>
>>>>>>>I will not be surprised if their nodes were equivalent only
>>>>>>>to 20M on a single PC, which is also a very good achievement.
>>>>>>>
>>>>>>>I also believe that they were better than the programs
>>>>>>>of 1997 even if you use the hardware of today.
>>>>>>>
>>>>>>>Uri
>>>>>>
>>>>>>
>>>>>>I don't believe they were only equivalent to 20M nodes.  Simply because I
>>>>>>know how strong deep thought was from first-hand experience.  But I don't
>>>>>>have access to the machine to do the same kind of testing I can do with
>>>>>>Crafty.  I _know_ how much faster I run on my quad than I do on a single
>>>>>>cpu.  And _anybody_ can measure that if they have a quad handy since the
>>>>>>source for crafty is available.
>>>>>>
>>>>>>Unfortunately, we don't have that luxury with DB2.  But I find it very
>>>>>>difficult to believe that it was only a 20M machine effectively...
>>>>>>particularly considering that Hsu said more than once that he was driving
>>>>>>the chess processors at 70% duty cycle...
>>>>>
>>>>>If you look in the paper, their reported speedups were extrapolated.
>>>>>So they measured what 1 cpu did, compared that with a few processors,
>>>>>then used that number for 480 processors instead of measuring 480.
>>>>
>>>>Vincent, this has to do with _that_ paper.  IE it should be
>>>>pretty obvious why they had to extrapolate at all.  All they have is
>>>>DB Jr to work with.
>>>>
>>>>Hsu did _lots_ of testing on the real DB machines when he had time.  And
>>>>he did _real_ speedup testing just like we do.  Don't confuse what was
>>>>in _that_ paper and assume that is _all_ they did.  It wasn't...
>>>>
>>>>I've seen some speedup stuff for DB1 in fact.  I saw a couple of test
>>>>positions where DB1 ran about 25 times faster with 200+ processors than
>>>>it did with just one.  I saw a couple of others where it was more like
>>>>50...  That isn't great, but it is _not_ "bad".  He gave me a number
>>>>of 30% way back, which I have quoted before.  IE with 200 processors
>>>>he said that 30% of that was a good estimate...  That was a number he
>>>>also mentioned in his dissertation...
>>>
>>>I do not know what he told you, but I read an estimate for efficiency of 8-12% for
>>>Deeper Blue.
>>
>>
>>I think I posted that sometime back.  The 30% was (as best I recall) for deep
>>blue 1.  DB2 had a much different hardware search, where they added singular
>>extensions among other things.  That would likely break the search load
>>balancing to an extent.
>>
>>But even if it is 10%, that is not bad.  Nor 12%.  IE the gross peak speed
>>was > one billion nodes per second. 12% would be 120M nodes per second, and
>>that includes the SMP loss.  120M is a terrific speed to search.  For example,
>>my 1.5M on the quad is more like 1.1M due to smp search overhead.
>>
>>I think we have lots of different numbers being mixed up carelessly by the
>>DB team.
>>
>>1.  NPS.  The raw NPS searched in a game, such as the number reported by Crafty,
>>and the deep * programs.  Unfortunately, these are inflated numbers because many
>>of those nodes would not be searched by a serial search and are therefore
>>worthless.
>>
>>2.  NPS.  The effective NPS searched in a game.  For example, I could easily
>>calculate this for Crafty as follows:  Do a search to some target depth and
>>display the nodes needed.  Then use 4 processors to see how long it now takes
>>to get to that same target depth.  Say the answer for position X is 2.0.  Then
>>the NPS for this position should really be displayed as 2.0 * serial NPS since
>>the search is effectively twice as fast.  For position Y, it is 3.1X faster,
>>so the NPS should be 3.1 * serial NPS for this position.
>>
>>In many cases, Hsu used number 2, while the rest of us blindly use #1 because
>>it is far easier to calculate in a chess engine; we really have no idea how
>>long a serial search would take without re-running the game with one cpu to
>>find out.  So we display NPS #1 and go about our business.
>>
>>Hsu doesn't know the _real_ NPS like we do, because the chess processors can't
>>count them.  I asked Ken why once and he said (for Belle, which was the
>>predecessor of a single DB chip) "it takes as long to count a node as it does
>>to search it..."
>
>I do not understand it.
>Counting nodes is clearly easier than searching them, and I see no reason that
>it would be different for Deeper Blue.

You don't understand it because you aren't a hardware designer.  And that is
perfectly ok.

How long does it take a traditional microprocessor to execute an add operation?
That involves a fetch from memory, an increment, and a memory store.  Deep Blue
(and Belle and Bebe and other hardware machines) can search a node in that same
length of time...  because it is being done in hardware.

That is the issue.

Searching a node doesn't require a memory access, for starters.  Nor does it
require an adder tree to add up a 64 bit number...  counting a node would need
both, and that takes hardware space and time.
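
To make that concrete, here is a toy sketch in C.  This is my own illustration,
not Crafty code and certainly not DB code; the point is only that once a
general-purpose CPU is doing the searching, the count is essentially free, one
increment per node visited.  The chess hardware has no counter register sitting
there for free, and no cheap path to ship a total back to the host.

  #include <stdio.h>

  static unsigned long nodes = 0;          /* one counter, one add per node */

  /* toy fixed-width tree standing in for a real search */
  static int search(int depth) {
      nodes++;                             /* trivial on a general-purpose CPU */
      if (depth == 0)
          return 0;                        /* a real engine would evaluate here */
      int best = 0;
      for (int i = 0; i < 3; i++) {        /* pretend there are 3 legal moves */
          int score = -search(depth - 1);
          if (score > best)
              best = score;
      }
      return best;
  }

  int main(void) {
      search(6);
      printf("nodes searched: %lu\n", nodes);   /* 3^0 + 3^1 + ... + 3^6 */
      return 0;
  }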




>
>If the hardware can give information to the software about the search, I do not
>see why it is a problem to also give information about the number of nodes.

All the hardware gives to the software is the best move and the score.  Passing
back a node count would add another cycle and cost at least 50% overhead in
doing so.

Remember that I have mentioned before that DB chess chips can't give a PV.
That is the reason why.
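
Just to illustrate the shape of the problem, here is a hypothetical sketch.
It is not the real host/processor interface; the struct names and field widths
are invented.  The chess processor hands back something like the first struct,
and everything people wish it returned means extra result transfers:

  #include <stdio.h>

  /* Hypothetical sketch only -- not the actual DB host/processor interface. */
  typedef struct {
      unsigned short best_move;    /* what the chess processor hands back   */
      short          score;
  } hw_result;

  typedef struct {
      unsigned short best_move;
      short          score;
      unsigned long  node_count;   /* would need another result cycle...    */
      unsigned short pv[64];       /* ...and many more, hence no PV either  */
  } wished_for_result;

  int main(void) {
      printf("result size now: %zu bytes, with count and PV: %zu bytes\n",
             sizeof(hw_result), sizeof(wished_for_result));
      return 0;
  }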


>
>I think that not knowing the number of nodes is a mistake, because the number of
>nodes can be used for better move ordering or for better extension rules.

Not in hardware.  You are stuck with what you have...  When you _simulate_
the chip, yes.  But when you commit it to silicon, it is too late to think
about improvements then.

>
>>So to figure out NPS, they have to do a lot of work to measure
>>how long each chess processor is busy, multiply that by the known hardware
>>speed, and then sum that for all chess processors.  Hsu frequently used a
>>"real (#2) NPS" because it was more honest.
>
>I think that it is better simply to give both numbers to avoid confusion.
>If somebody writes NPS and not effective NPS, then I think the meaning is
>simply the raw number of nodes searched per second.

I agree.  Hsu started this with his dissertation.  And it was confusing.
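
To keep the two definitions straight, here is a back-of-the-envelope sketch.
Every number in it is made up except the 480 processors and the 70% duty cycle
mentioned earlier in this thread; nothing below is measured from DB or Crafty.

  #include <stdio.h>

  int main(void) {
      /* #1: raw NPS -- every node the parallel search touched, including
         work a serial search would never have done.                       */
      double parallel_nodes = 6.0e9;                 /* nodes visited      */
      double wall_time      = 30.0;                  /* seconds            */
      double raw_nps        = parallel_nodes / wall_time;

      /* #2: effective NPS -- serial NPS scaled by the measured speedup to
         the same depth, which discounts the wasted parallel work.         */
      double serial_nps = 50.0e6;
      double speedup    = 2.4;                       /* time(1)/time(N)    */
      double eff_nps    = speedup * serial_nps;

      /* Hardware-style estimate: busy fraction of each chess processor
         times its known per-chip speed, summed over all the processors.   */
      double per_chip_nps = 2.0e6;                   /* invented figure    */
      double duty_cycle   = 0.70;                    /* the 70% mentioned  */
      int    chips        = 480;
      double hw_nps       = chips * duty_cycle * per_chip_nps;

      printf("raw NPS (#1)       %6.0fM\n", raw_nps / 1e6);    /*  200M */
      printf("effective NPS (#2) %6.0fM\n", eff_nps / 1e6);    /*  120M */
      printf("hardware estimate  %6.0fM\n", hw_nps / 1e6);     /*  672M */
      return 0;
  }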

>
>Uri


