Computer Chess Club Archives

Subject: Re: Some analysis of Deep Fritz for kasparov-deeper blue first game

Author: Vincent Diepeveen
Date: 09:47:10 05/07/01
On May 07, 2001 at 09:43:33, Robert Hyatt wrote:

>On May 07, 2001 at 03:19:05, Uri Blass wrote:
>
>>On May 06, 2001 at 23:53:59, Robert Hyatt wrote:
>>
>>>On May 06, 2001 at 19:46:43, Vincent Diepeveen wrote:
>>>
>>>>On May 06, 2001 at 02:28:14, Uri Blass wrote:
>>>>
>>>>>I gave Deep Fritz a similar number of nodes to analyze as Deeper Blue, and Deep
>>>>>Fritz seems to be clearly better in tactics.
>>>>>
>>>>>Deep Fritz needs only 191728 knodes to see the line Rf5+ Ke3.
>>>>>That is only about 1 second if I assume 200,000,000 nodes per second.
>>>>>
>>>>>I believe that Rf5+ failed low at depth 17 for Deeper Blue because of Ke3.
>>>>>The PV of Deeper Blue at smaller depths is Rf5+ Ke2.
>>>>
>>>>11 ply, for those who are good at math and a bit more in touch with the real world.
>>>
>>>Uri is correct.  Unless you _still_ dispute the direct statement(s) by the
>>>Deep Blue team.
>>>
>>>
>>>
>>>>
>>>>>Deep Fritz probably does better extensions than Deeper Blue, because Deep Fritz
>>>>>sees a big fail low at depth 16.
>>>>
>>>>Fritz hardly has dangerous extensions.
>>>>
>>>>DIEP has them. Note that I am not extending passed pawns much, just a bit
>>>>and only now and then.
>>>>
>>>>The big fail low comes at 12 ply for DIEP. Then it sees that Rf5 is losing
>>>>because of Ke3, though it initially wants to go e3. Then I did a state
>>>>check to see what the deepest search lines are. You can see it
>>>>yourself:
>>>
>>>
>>>What does any of this matter?  Their score was bad... yours is bad, black
>>>is lost...  I don't see where you see it any faster than they did...
>>
>>I see that Deeper Blue's score is clearly better than the scores of other programs
>>after search.
>>
>>Deeper Blue said only 2.1 pawns for White after 73 seconds of search, while other
>>programs have no problem seeing a clearly better score for White.
>
>
>You are making the same mistake _everyone_ makes.  Taking scores to be an
>absolute assessment of the truth.  IE try Vincent's scores.  And compare them
>to mine.  I have seen many games where we were different by 1-2 pawns, and
>more often than not mine has been right.  It is _easy_ to cause this.
>
>I take the more practical approach of "+ is good for white, - is good for black"
>but I don't fall into the trap of +1.7 here is much better than 1.4 by that
>program.  IE don't look for an eval to be an "absolute" evaluation of the
>position.  To do so is a _big_ mistake.
>
>
>
>
>>
>>I can explain a 1-pawn or even a 1.5-pawn difference by different
>>evaluation, but the difference between Crafty's evaluation (4.22) and their
>>evaluation (2.1 after 73 seconds) is more than 2 pawns (I mean for the evaluation
>>of Rf5+), and it can only be explained by the fact that Crafty could see deeper.
>>
>
>
>
>It has nothing to do with depth most likely.  It has to do with evaluation.
>Crafty is asymmetric.  They were not.  That most likely is _the_ reason for
>the difference.
>
>
>
>
>>Their score at depth 15 is only 1.63 for White, so if you compare at the same depth
>>it is clear that Crafty did better extensions than Deeper Blue.
>
>You are diagnosing the disease without _ever_ seeing the patient.  _any_ doctor
>will tell you that is an unforgivable sin that leads to dire consequences.
>
>
>
>
>>
>>If you do not like depth 15 of move 43 because of the bug that caused Deeper Blue
>>to play Rd1, you can take depth 11(6)=17 at move 42, and there you find a score of
>>only 1.36 pawns for White.
>>
>>I assume that 11(6) means depth 17 with futility pruning, and in that case the
>>top programs of today clearly do better extensions than Deeper Blue.
>>
>>Uri
>
>
>Based on what?  You can _not_ see at _least_ the last 1/3 of their PV, or in
>the case of 11(6) the last 6 plies + the q-search, so all you can use to
>make a conclusion is the absolute value of their score.  Cray Blitz was _far_
>more conservative in scoring than Crafty is.  It would be quite common for
>Crafty to say +3 and Cray Blitz to say +1, with the _exact_ same PV.  I don't
>see how you can conclude _anything_ with no data...
>
>I don't try to understand what they are seeing there since I (a) don't know what
>their positional eval terms are;  (b) I can't see about 1/2 of their PV in most
>cases;  (c) due to (b) I am stuck with (a).  In short, trying to deduce
>something from their output is nearly impossible.

Uri is a strong chess player, and his analysis is always very good.
Equipped with a few reports on the games and some analysis, you can
easily draw conclusions from the lines a program produces.

Seeing the last 6 plies of the PV is usually not relevant for that.

Most bugs in DIEP I FIX because I see something weird in my PV. I go
debugging, I find something wrong, and then I have something to fix!

For example, if in an opening position the main line says
  1.a3,a5

then I am very worried...

So if some machine finds the right move at 8(6), then we do have
data to compare with today's programs.

If a machine gets a big fail low at 11(6) (but no score),
then we do have data. And yes, I'm very sure that Deep Blue saw Ke3
there, just as DIEP sees it there after Rf5...
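A "fail low" in this sense is the usual aspiration-window event: the root is searched with a narrow window around the previous iteration's score, and the returned score falls at or below the window's lower bound, so the move is worse than expected and the root has to be re-searched with a wider window. The Python sketch below shows only that mechanism, on a hypothetical three-leaf toy tree with invented centipawn values; it is not Deep Blue's or DIEP's code, and the names `negamax` and `aspiration_search` are illustrative.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tree: node -> children. Leaves carry invented scores
# (centipawns) from the point of view of the side to move at the leaf.
TREE: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],   # move "a" looks great after a1, but reply a2 refutes it
    "b": ["b1"],
}
LEAF_SCORES: Dict[str, int] = {"a1": 300, "a2": -50, "b1": 100}


def negamax(node: str, depth: int, alpha: float, beta: float,
            score_of: Callable[[str], int]) -> float:
    """Fail-soft alpha-beta in negamax form over the toy tree."""
    children = TREE.get(node, [])
    if depth == 0 or not children:
        return score_of(node)
    best = -math.inf
    for child in children:
        score = -negamax(child, depth - 1, -beta, -max(alpha, best), score_of)
        best = max(best, score)
        if best >= beta:
            break  # cutoff: the opponent will avoid this line
    return best


def aspiration_search(depth: int, prev_score: int, window: int,
                      score_of: Callable[[str], int]) -> Tuple[str, float]:
    """Search the root with a window around prev_score; re-search on failure."""
    alpha, beta = prev_score - window, prev_score + window
    score = negamax("root", depth, alpha, beta, score_of)
    if score <= alpha:
        # Fail low: the true score is at most `score`; open the lower bound.
        return "fail low", negamax("root", depth, -math.inf, beta, score_of)
    if score >= beta:
        # Fail high: the true score is at least `score`; open the upper bound.
        return "fail high", negamax("root", depth, alpha, math.inf, score_of)
    return "exact", score


print(aspiration_search(2, 300, 50, LEAF_SCORES.__getitem__))  # ('fail low', 100)
print(aspiration_search(2, 100, 50, LEAF_SCORES.__getitem__))  # ('exact', 100)
```

With an expected score of +300 (the a1 line), the search fails low because a2 refutes move "a"; the re-search then settles on move "b" at +100.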

I do not doubt that, when searching to the same depth, a full-width search
plus loads of extensions sees more tactically than I do.

I do not believe that 11(6) is 17 ply full-width + singular extensions +
recapture extensions + some other extensions.

Because if I also find all those things at 11 or 12 ply, with
probably more limited extensions, then either they are pruning the last 6 plies
or... ...they just searched 11 ply.

The latter is very relevant. Anyone who does the following experiment will
find that they searched between 11 and 13 ply as minimum search depth,
and nothing more.

The experiment:
  Add singular extensions (very important, as their overhead is huge
    in a full-width search).
  Get rid of PVS and replace it with normal alpha-beta (see what Hsu
    writes in IEEE 99).
  Add recapture extensions.
  Do not store or retrieve positions in the hash table in the last
    6 plies.
  Do not do the above dangerous extensions in the last 6 plies; of
    course, do check extensions everywhere. ALWAYS extend checks.
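The experiment's rules can be sketched as a plain alpha-beta search. This is a minimal Python sketch under loud assumptions: the "position" is a hypothetical toy lattice game (two moves per node, different move orders transpose, arbitrary scores), `gives_check` and `dangerous` are made-up stand-ins for real check, singular and recapture detection, and the hash table is simplified to store exact scores only. It is not DIEP's or Deep Blue's code; it only shows where the rules (no PVS, no hashing in the last 6 plies, dangerous extensions only above the last 6 plies, check extensions always) plug into a search.

```python
import math
from typing import Dict, List, Tuple

FRONTIER = 6   # "the last 6 plies" from the post
MAX_PLY = 10   # length of the hypothetical toy game

Pos = Tuple[int, int]  # toy position: (ply, number of "1" moves played so far)


def children(pos: Pos) -> List[Pos]:
    """Toy lattice game: two moves per node; different move orders transpose."""
    ply, ones = pos
    return [] if ply == MAX_PLY else [(ply + 1, ones), (ply + 1, ones + 1)]


def static_eval(pos: Pos) -> int:
    """Arbitrary deterministic score for the side to move (toy values)."""
    ply, ones = pos
    return (37 * ones + 11 * ply) % 19 - 9


def gives_check(pos: Pos) -> bool:   # hypothetical stand-in for a checking move
    return pos[1] % 4 == 3


def dangerous(pos: Pos) -> bool:     # hypothetical singular/recapture trigger
    return pos[1] % 3 == 1


def extension(child: Pos, depth: int) -> int:
    if gives_check(child):
        return 1                      # check extensions: everywhere, always
    if dangerous(child) and depth > FRONTIER:
        return 1                      # other extensions: not in the last 6 plies
    return 0


def search(pos: Pos, depth: int, alpha: float, beta: float,
           tt: Dict[Tuple[Pos, int], float], stats: Dict[str, int]) -> float:
    """Fail-soft plain alpha-beta (no PVS, no null-window re-searches)."""
    stats["nodes"] += 1
    moves = children(pos)
    if depth <= 0 or not moves:
        return static_eval(pos)
    use_tt = depth > FRONTIER         # no hash store/probe in the last 6 plies
    if use_tt and (pos, depth) in tt:
        return tt[(pos, depth)]
    best = -math.inf
    for child in moves:
        child_depth = depth - 1 + extension(child, depth)
        score = -search(child, child_depth, -beta, -max(alpha, best), tt, stats)
        best = max(best, score)
        if best >= beta:
            break                     # beta cutoff
    if use_tt and alpha < best < beta:
        tt[(pos, depth)] = best       # simplification: exact scores only
    return best


def minimax(pos: Pos, depth: int) -> float:
    """Reference search without pruning or hashing, same extension rules."""
    moves = children(pos)
    if depth <= 0 or not moves:
        return static_eval(pos)
    return max(-minimax(c, depth - 1 + extension(c, depth)) for c in moves)


stats = {"nodes": 0}
print(search((0, 0), 8, -math.inf, math.inf, {}, stats), stats["nodes"])
```

Setting `FRONTIER = 0` enables hashing and all extensions at every depth, which gives the kind of node-count comparison the experiment asks for; on a toy game the numbers, of course, say nothing about chess.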

Now please compare node counts. If there are volunteers to run a DIEP
version with these settings, be my guest.

You only need 200 million nodes/s x 180 seconds = say 36 billion nodes.

Dual, single CPU or whatever, it doesn't matter.

I don't get much above 10 ply...

All arguments here in CCC are a joke as long as people don't repeat
this experiment, which I did. Note that I used PVS in my experiments and
not normal alpha-beta, so I give Deep Blue that speedup for free :)

Getting 10 ply takes a billion nodes...
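For a sense of scale (this bound is not from the post): Knuth and Moore showed that alpha-beta with perfect move ordering, on a uniform tree with branching factor b searched to a fixed depth d, still has to visit the number of leaf positions given below. Taking b = 35, a commonly quoted average for chess (an assumption here), 10 ply already needs about 10^8 leaves with perfect ordering and no extensions, so it is plausible that a real full-width search with imperfect ordering, extensions and quiescence search needs on the order of a billion nodes.

```latex
N_{\min}(b, d) = b^{\lceil d/2 \rceil} + b^{\lfloor d/2 \rfloor} - 1,
\qquad
N_{\min}(35, 10) = 2 \cdot 35^{5} - 1 = 105\,043\,749 \approx 1.05 \times 10^{8}
```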

DIEP gets about 56k nodes a second on a K7 at 1 GHz. If it needs
a few billion nodes to get to 11 ply, I assume everyone believes that
it needs more to get deeper. So running each position for a few
hours is no problem. Then, of course, the second run is on the same positions
with the hash table and with the default version.
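The node budgets in these paragraphs translate into wall-clock time as follows. This small Python sketch uses only the numbers given in the post (200 million nodes/s for Deep Blue, 180 seconds per move, about 56k nodes/s for DIEP on a 1 GHz K7, roughly a billion nodes for 10 ply); the helper name `hours_at_diep_speed` is illustrative.

```python
DB_NPS = 200_000_000           # Deep Blue nodes per second, as assumed in the post
MOVE_TIME_S = 180              # seconds per move, as used in the post
DIEP_NPS = 56_000              # DIEP on a 1 GHz K7, per the post
TEN_PLY_NODES = 1_000_000_000  # "Getting 10 ply takes a billion nodes"

db_budget = DB_NPS * MOVE_TIME_S  # nodes Deep Blue searches in one move


def hours_at_diep_speed(nodes: int) -> float:
    """Wall-clock hours DIEP would need to search `nodes` nodes."""
    return nodes / DIEP_NPS / 3600


print(f"Deep Blue budget per move: {db_budget:,} nodes")
print(f"10 ply (~1e9 nodes): {hours_at_diep_speed(TEN_PLY_NODES):.1f} h")
print(f"Deep Blue budget:    {hours_at_diep_speed(db_budget):.1f} h")
```

At these rates one billion nodes is about 5 hours, and Deep Blue's 36-billion-node budget per move is about 179 hours (roughly 7.4 days) per position, which is why the experiment takes days.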

Then compare both outputs. Especially compare the first output with
DB's output.

That's real fun... ...but it takes days. I have done it for 2
positions so far.

After that I knew for sure that null move is the best invention in computer
chess. Wasn't it invented by Don Beal in 1977?

The overhead of dangerous extensions, recapture extensions and check extensions
when they are done at huge depths is something everyone here underestimates.

Because you DO NOT PRUNE any line, the lines only get longer and longer.
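Why unpruned extensions compound can be illustrated with a deliberately crude counting model (an assumption, not how Deep Blue or DIEP decide extensions): every node has `b` moves, `k` of which trigger an extension that costs no depth while a per-line extension budget lasts, and nothing is pruned, not even by alpha-beta. The function name `tree_size` is hypothetical.

```python
from functools import lru_cache


def tree_size(b: int, k: int, depth: int, ext_budget: int) -> int:
    """Count nodes of a hypothetical full-width tree with no pruning at all.

    Each node has b moves; k of them are 'extended' (searched without
    consuming a ply) as long as ext_budget > 0 along that line.
    """
    @lru_cache(maxsize=None)
    def count(d: int, e: int) -> int:
        if d == 0:
            return 1  # leaf
        normal = (b - k) * count(d - 1, e)
        # An extended move keeps the remaining depth but spends budget;
        # once the budget is gone it behaves like a normal move.
        extended = k * (count(d, e - 1) if e > 0 else count(d - 1, e))
        return 1 + normal + extended
    return count(depth, ext_budget)


base = tree_size(35, 0, 6, 0)
for budget in (1, 2, 4):
    grown = tree_size(35, 3, 6, budget)
    print(f"extension budget {budget}: {grown / base:.1f}x the unextended tree")
```

The model exaggerates (real extension triggers are rarer, and alpha-beta still cuts), but it shows the point being made: each extension reopens a whole subtree at the same remaining depth, and without forward pruning nothing trims it.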

Tactically it kicks butt, of course, if you can get to, say, 12 ply (which
DB of course did). That Hsu gets to 12 ply full-width anyway is
very impressive, as I would probably need more nodes if I could not
use a hash table, either local or global!

Best regards,
Vincent