Computer Chess Club Archives


Subject: Re: DEEP BLUES AVERAGE PLY?

Author: Uri Blass

Date: 15:31:07 08/24/02


On August 24, 2002 at 17:59:43, Robert Hyatt wrote:

>On August 23, 2002 at 12:24:45, Vincent Diepeveen wrote:
>
>>On August 22, 2002 at 20:25:24, Robert Hyatt wrote:
>>
>>>On August 22, 2002 at 18:22:56, Uri Blass wrote:
>>>
>>>>On August 22, 2002 at 18:01:09, Robert Hyatt wrote:
>>>>
>>>>>On August 21, 2002 at 20:10:26, Mike S. wrote:
>>>>>
>>>>>>On August 21, 2002 at 11:07:58, Robert Hyatt wrote:
>>>>>>
>>>>>>>(...)
>>>>>>>1.  They reported depth as 11(6) for example.  According to the deep blue
>>>>>>>team, and regardless of what others will say about it, this supposedly means
>>>>>>>that they did 11 plies in software, plus another 6 in hardware.
>>>>>>
>>>>>>When I looked at some of the logs, I had the impression that "11(6)" was
>>>>>>reported most often, IOW. we can probably say that it was the *typical* search
>>>>>>depth reported (except additional extension depths we do not know), in the
>>>>>>middlegame, 1997. Would you agree with that from your study of the logs?
>>>>>>
>>>>>
>>>>>I thought so.  But since the paper quotes 12.2, that would mandate that 12
>>>>>must come up more often than 11.  I haven't gone thru each log in that kind
>>>>>of detail as that is a recipe for a headache.  :)
>>>>>
>>>>>
>>>>>
>>>>>>Another thing I'm not sure of is: *When* could relatively safely be claimed,
>>>>>>that DB.'s depth is reached again:
>>>>>>
>>>>>>a) when a current prog reaches at least 16 plies as a typical middlegame depth,
>>>>>>   because some search techniques used now (which DB. didn't use), make up for
>>>>>>   the missing ply (at least), or
>>>>>>b) when 17 plies are reached, not earlier, or
>>>>>>c) a program would have to reach more than 17 plies, because DB used much more
>>>>>>   knowledge which current software probably does not yet use to that extent.
>>>>>>
>>>>>>I search for expert's opinions of *when* we can say something like "Yes, now
>>>>>>with this specific performance [## plies etc.] we can safely say - as it's our
>>>>>>*best guess*, since no direct head-to-head match is possible - that this new
>>>>>>chess computer is better than Deep Blue was."
>>>>>
>>>>>I don't see any real way to do this.  IE take the following types of
>>>>>programs and try to compare depths:
>>>>>
>>>>>1.  Junior, which uses a different definition of ply than everyone else.
>>>>>They appear to search _much_ deeper than anyone else, based only on this,
>>>>>but Amir has explained how he counts plies, and the bottom line is that
>>>>>raw ply depth can't be compared.
>>>>>
>>>>>2.  Very dumb and fast program, with no q-search to speak of.  Since the
>>>>>q-search is at _least_ 50% of the total tree search space, lopping that off
>>>>>gets more depth.  But how to compare 14 plies with no q-search to 12 plies
>>>>>with q-search?
>>>>>
>>>>>3.  lots of selective search extensions.  This program might only search
>>>>>9 plies deep on average, but it extends the _right_ moves at the right times,
>>>>>so that even though it is only searching 9 plies deep, it beats the "22-ply
>>>>>searching Junior program" handily.
>>>>>
>>>>>4.  Lots of other variations.  The bottom line is that depth is not an easy
>>>>>way to compare programs.  Neither is NPS.  Unless you see some _real_ depth
>>>>>that is way beyond everyone.  Or some real NPS that is way beyond everyone.
>>>>>
>>>>>For example, we have had a couple of very fast/dumb programs compete over
>>>>>the years, and they have managed to do very well, because their speed and
>>>>>tactics overcame their lack of positional understanding, when playing the
>>>>>opponents they drew in the ACM/WCCC events.  We have also seen very slow
>>>>>programs out-play everyone.  But we are talking about programs that are
>>>>>generally within an order of magnitude of each other.  Say 20K nodes per
>>>>>second to 200K nodes per second.  If someone suddenly hits the scene going
>>>>>200M nodes per second, then that is a serious number if it is real...  So
>>>>>even though I generally say that comparing NPS is a bad idea unless you are
>>>>>using the _same_ program, there are logical exceptions...
>>>>>
>>>>>>
>>>>>>But the claim should be illustrated by somewhat convincing figures (node rate is
>>>>>>not convincing enough IMO, although still impressive). Maybe the ply depth is; I
>>>>>>know it's also no perfect comparison though. But we don't have anything better
>>>>>>probably. A few positions/moves to compare are not enough.
>>>>>
>>>>>I think you have to look at results above all else.  IE for IBM, deep thought
>>>>>totally dominated computer chess for 10 years, losing one well-known game.  That
>>>>>is tough to do if you are not far better than everyone else.  Since their last
>>>>>computer event in 1995, suddenly they started going 100X faster.  So they have
>>>>>a significant boost there, unless you do as some do and conclude that the
>>>>>extra speed means nothing.
>>>>
>>>>I conclude that it was not 100 times faster.
>>>>
>>>>1) The 200M nodes figure is wrong, based on Hsu's paper.
>>>>2) They suffered from a lack of efficiency because they preferred
>>>>to improve the evaluation rather than fix
>>>>the efficiency problems.
>>>>
>>>>I would not be surprised if their nodes were equivalent to only
>>>>20M on a single PC, which is still a very good achievement.
>>>>
>>>>I also believe that they were better than the programs
>>>>of 1997 even if you use the hardware of today.
>>>>
>>>>Uri
>>>
>>>
>>>I don't believe they were only equivalent to 20M nodes.  Simply because I
>>>know how strong deep thought was from first-hand experience.  But I don't
>>>have access to the machine to do the same kind of testing I can do with
>>>Crafty.  I _know_ how much faster I run on my quad than I do on a single
>>>cpu.  And _anybody_ can measure that if they have a quad handy since the
>>>source for crafty is available.
>>>
>>>Unfortunately, we don't have that luxury with DB2.  But I find it very
>>>difficult to believe that it was only a 20M machine effectively...
>>>particularly considering that Hsu said more than once that he was driving
>>>the chess processors at 70% duty cycle...
>>
>>If you look in the paper their reported speedups were extrapolated.
>>So they measured what 1 cpu did and compared with a few processors,
>>then used that number for 480 processors instead of measuring 480.
>
>Vincent, this is something to do with _that_ paper.  IE it should be
>pretty obvious why they had to extrapolate at all.  All they have is
>DB Jr to work with.
>
>Hsu did _lots_ of testing on the real DB machines when he had time.  And
>he did _real_ speedup testing just like we do.  Don't confuse what was
>in _that_ paper and assume that is _all_ they did.  It wasn't...
>
>I've seen some speedup stuff for DB1 in fact.  I saw a couple of test
>positions where DB1 ran about 25 times faster with 200+ processors than
>it did with just one.  I saw a couple of others where it was more like
>50...  That isn't great, but it is _not_ "bad".  He gave me a number
>of 30% way back, which I have quoted before.  IE with 200 processors
>he said that 30% of that was a good estimate...  That was a number he
>also mentioned in his dissertation...

I do not know what he told you, but I read an estimate of 8-12% efficiency for
Deeper Blue.

>
>Most of us would _not_ be happy with 30%.  IE I am not really happy
>with my current 70%+ numbers, since Cray Blitz could do significantly
>better with four processors.  However, 30% is not a bad result when you
>go to large numbers of processors... and perhaps I might be happy with
>30% once I get to the 480 processor level, although I have not seen
>anything that said DB2 stayed at 30% since it had 2x more processors.

If we assume 30% is correct for DB1, then common sense says he may well have
gotten less than 30% with more processors.

Efficiency goes down as you add processors, and the Deep Blue team
considered the evaluation more important, so they did not care about
increasing it.

Note that I do not assume the 30% is correct for DB1, because if I
understand correctly it seems to contradict the 8-12% efficiency for Deeper Blue:
doubling the number of processors should not reduce the efficiency by a
factor of more than 2.
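
To make the arithmetic concrete, here is a rough sketch of the numbers quoted in this thread. It assumes that "efficiency" means speedup divided by processor count, and it uses round processor counts (200 for DB1, where the quoted figure is "200+", and 480 for Deeper Blue). The function names are only illustrative, and each speedup is relative to one processor of the same machine, so this says nothing about absolute speed.

```python
# Rough arithmetic for the parallel-efficiency figures quoted in this thread.
# Assumptions: efficiency = speedup / processor count, and the processor
# counts (200 for DB1, 480 for Deeper Blue) are round numbers from the
# discussion, not measured values.

def effective_processors(n_processors, efficiency):
    """Speedup over one processor implied by a given parallel efficiency."""
    return n_processors * efficiency


def efficiency_from_speedup(speedup, n_processors):
    """Parallel efficiency implied by an observed speedup."""
    return speedup / n_processors


db1 = effective_processors(200, 0.30)
db2_low = effective_processors(480, 0.08)
db2_high = effective_processors(480, 0.12)

print(f"DB1, 200 processors at 30%: about {db1:.1f}x one processor")
print(f"Deeper Blue, 480 at 8-12%: about {db2_low:.1f}x to {db2_high:.1f}x")

# The two DB1 test-position speedups mentioned in the quoted text above.
print(f"25x on 200 processors: {efficiency_from_speedup(25, 200):.1%}")
print(f"50x on 200 processors: {efficiency_from_speedup(50, 200):.1%}")
```

Under these assumptions, 480 processors at 8-12% give a smaller speedup than 200 processors at 30%. That would mean the added processors made the search slower relative to a single processor, which is why the two figures look hard to reconcile.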

Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700
