Computer Chess Club Archives

Subject: Re: Some analysis of Deep Fritz for Kasparov-Deeper Blue first game

Author: Robert Hyatt
Date: 11:20:42 05/07/01
On May 07, 2001 at 12:47:10, Vincent Diepeveen wrote:

>On May 07, 2001 at 09:43:33, Robert Hyatt wrote:
>
>>On May 07, 2001 at 03:19:05, Uri Blass wrote:
>>
>>>On May 06, 2001 at 23:53:59, Robert Hyatt wrote:
>>>
>>>>On May 06, 2001 at 19:46:43, Vincent Diepeveen wrote:
>>>>
>>>>>On May 06, 2001 at 02:28:14, Uri Blass wrote:
>>>>>
>>>>>>I gave Deep Fritz a similar number of nodes to analyze as Deeper Blue had, and Deep
>>>>>>Fritz seems to be clearly better in tactics.
>>>>>>
>>>>>>Deep Fritz needs only 191728 knodes to see the line Rf5+ Ke3.
>>>>>>That means only 1 second if I assume 200,000,000 nodes per second.
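
The conversion above is plain arithmetic: search time is nodes divided by nodes per second, and nodes searched is time multiplied by that rate. A minimal sketch, assuming "knodes" means thousands of nodes and taking the 200,000,000 nodes-per-second figure at face value (a real machine's speed varies from position to position):

```python
def search_seconds(nodes: int, nodes_per_second: int) -> float:
    """Wall-clock time needed to search `nodes` at a constant rate."""
    return nodes / nodes_per_second


def nodes_in_time(seconds: float, nodes_per_second: int) -> float:
    """Nodes searched in `seconds` at a constant rate."""
    return seconds * nodes_per_second


# Deep Fritz needed 191728 knodes (thousands of nodes) to see Rf5+ Ke3.
fritz_nodes = 191_728 * 1_000
db_rate = 200_000_000  # assumed Deep Blue speed, nodes per second

print(f"{search_seconds(fritz_nodes, db_rate):.2f} s")  # 0.96 s

# The 180-second budget mentioned later in the thread, at the same rate.
print(f"{nodes_in_time(180, db_rate):,} nodes")  # 36,000,000,000 nodes
```
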
>>>>>>
>>>>>>I believe that Rf5+ failed low at depth 17 for Deeper Blue because of Ke3.
>>>>>>The PV of Deeper Blue at smaller depths is Rf5+ Ke2.
>>>>>
>>>>>11 ply, for those who are good at math and a bit more in touch with the real world.
>>>>
>>>>Uri is correct.  Unless you _still_ dispute the direct statement(s) by the
>>>>Deep Blue team.
>>>>
>>>>
>>>>
>>>>>
>>>>>>Deep Fritz probably does better extensions than Deeper Blue, because Deep Fritz
>>>>>>sees a big fail low at depth 16.
>>>>>
>>>>>Fritz hardly has dangerous extensions.
>>>>>
>>>>>DIEP has. Note I am not extending passed pawns much. Just a bit, and only
>>>>>now and then.
>>>>>
>>>>>The big fail low comes at 12 ply for DIEP. Then it sees Rf5 is losing
>>>>>because of Ke3, though it initially wants to go e3. Then I did a state
>>>>>check to see what the deepest search lines are. You can see it
>>>>>yourself:
>>>>
>>>>
>>>>What does any of this matter?  Their score was bad... yours is bad, Black
>>>>is lost...  I don't see where you see it any faster than they did...
>>>
>>>I see that Deeper Blue's score is clearly more favorable to Black than the scores
>>>of other programs after searching.
>>>
>>>Deeper Blue said only 2.1 pawns for White after 73 seconds of search, when other
>>>programs have no problem seeing a clearly better score for White.
>>
>>
>>You are making the same mistake _everyone_ makes.  Taking scores to be an
>>absolute assessment of the truth.  IE try Vincent's scores.  And compare them
>>to mine.  I have seen many games where we were different by 1-2 pawns, and
>>more often than not mine has been right.  It is _easy_ to cause this.
>>
>>I take the more practical approach of "+ is good for white, - is good for black"
>>but I don't fall into the trap of thinking +1.7 here is much better than +1.4 by that
>>program.  IE don't look for an eval to be an "absolute" evaluation of the
>>position.  To do so is a _big_ mistake.
>>
>>
>>
>>
>>>
>>>I can explain a 1-pawn or even a 1.5-pawn difference by different
>>>evaluation, but the difference between Crafty's evaluation (4.22) and their
>>>evaluation (2.1 after 73 seconds) is more than 2 pawns (I mean for the evaluation
>>>of Rf5+), and it can be explained only by the fact that Crafty could see deeper.
>>>
>>
>>
>>
>>It has nothing to do with depth most likely.  It has to do with evaluation.
>>Crafty is asymmetric.  They were not.  That most likely is _the_ reason for
>>the difference.
>>
>>
>>
>>
>>>Their score at depth 15 is only 1.63 for White, so if you compare the same depth,
>>>it is clear that Crafty did better extensions than Deeper Blue.
>>
>>You are diagnosing the disease without _ever_ seeing the patient.  _any_ doctor
>>will tell you that is an unforgivable sin that leads to dire consequences.
>>
>>
>>
>>
>>>
>>>If you do not like depth 15 of move 43 because of the bug that caused Deeper Blue
>>>to play Rd1, you can take depth 11(6)=17 at move 42, and you find there a score of
>>>only 1.36 pawns for White.
>>>
>>>I assume that 11(6) means depth 17 with futility pruning, and in this case the
>>>top programs of today clearly do better extensions than Deeper Blue.
>>>
>>>Uri
>>
>>
>>Based on what?  You can _not_ see at _least_ the last 1/3 of their PV, or in
>>the case of 11(6) the last 6 plies + the q-search.. so all you can use to
>>make a conclusion is the absolute value of their score.  Cray Blitz was _far_
>>more conservative in scoring than Crafty is.  It would be quite common for
>>Crafty to say +3 and Cray Blitz to say +1, with the _exact_ same PV.  I don't
>>see how you can conclude _anything_ with no data...
>>
>>I don't try to understand what they are seeing there since I (a) don't know what
>>their positional eval terms are;  (b) I can't see about 1/2 of their PV in most
>>cases;  (c) due to (b) I am stuck with (a).  In short, trying to deduce
>>something from their output is nearly impossible.
>
>Uri is a strong chessplayer. Very good analysis always.
>Equipped with a few reports over the games and some analysis you can
>easily conclude things based upon the lines a program produces.

Vincent, _that_ is the big difference between you and myself.  I am a scientist.
I don't report bogus stuff.  IE I know that it is _impossible_ to produce a
speedup of > 2 on a 2 cpu machine, except for an occasional rare exception
position.  For the _longest_ time you were claiming to do so.  But as you
fixed bugs, the claims stopped.
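
Parallel speedup is conventionally the one-CPU search time divided by the N-CPU search time for the same position and depth; a value above N is called superlinear. A minimal sketch with hypothetical timings (the numbers are illustrative, not measurements from Crafty or DIEP):

```python
def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Time on one CPU divided by time on N CPUs for the same search."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds: float, parallel_seconds: float, cpus: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup(serial_seconds, parallel_seconds) / cpus


def is_superlinear(serial_seconds: float, parallel_seconds: float, cpus: int) -> bool:
    """True when N CPUs finish more than N times faster than one CPU."""
    return speedup(serial_seconds, parallel_seconds) > cpus


# Hypothetical timings for one position searched to a fixed depth.
one_cpu, two_cpus = 120.0, 70.0
print(round(speedup(one_cpu, two_cpus), 2))        # 1.71
print(round(efficiency(one_cpu, two_cpus, 2), 2))  # 0.86
print(is_superlinear(one_cpu, two_cpus, 2))        # False
```

A speedup above 2 that shows up on average, rather than in an occasional position, usually points at a weakness in the serial search: one CPU could in principle time-slice the two threads' work and match the two-CPU time.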

Here, _nobody_ can conclude anything since the "lines the program produces"
are incomplete due to design.  How can you _know_ what it is or is not seeing,
if it can't tell you?  And to make conclusions on what you can't see is not
just ridiculous... it is _bad_ science.




>
>Seeing the last 6 plies of the PV is usually not relevant for that.

It is _highly_ relevant.  And in fact, as I said, it is often not just the
last 6 plies that are missing, it is much more...




>
>Most bugs in DIEP I FIX because I see something weird in my PV. I go
>debug, I find something wrong, and then I have something to fix!
>
>For example, if in an opening position a main line says
>  1.a3,a5
>
>then I am very worried...
>
>So if some machine finds the right move at 8(6), then we do have
>data to compare with today's programs.

Yes, for clearly forced tactical moves.  But _none_ of the moves being
considered are clearly forced tactical lines.  They are positional.  And
to argue right move or wrong move is futile there.



>
>If a machine gets a big fail low at 11(6) (but no score),
>then we do have data. And yes, I'm very sure that Deep Blue saw Ke3
>there, just as DIEP does after Rf5...

Again, I say poppycock.  Crafty failed low at depth 15.  It was not a
"big fail low" just a third of a pawn or so.  That is a _positional_ fail
low, not a tactical fail low.  Which means comparing two programs is futile
since their evaluations are necessarily different.
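
A "fail low" means the search returned a score at or below the lower bound (alpha) of its window, so the true value is only known to be no better than that bound until a re-search with a wider window. A minimal sketch of that logic, assuming an aspiration window centred on the previous iteration's score, scores in centipawns, and a hypothetical `make_search` stand-in that simply clamps a known true score into the window (a real engine runs an alpha-beta search there):

```python
from typing import Callable, Tuple

INF = 10**9  # larger than any real score

SearchFn = Callable[[int, int], int]


def make_search(true_score: int) -> SearchFn:
    """Hypothetical fail-hard search: the true score clamped to [alpha, beta]."""
    def search(alpha: int, beta: int) -> int:
        return max(alpha, min(beta, true_score))
    return search


def aspiration(search: SearchFn, prev_score: int, window: int = 25) -> Tuple[str, int]:
    """Search a narrow window around the previous score; re-search on failure."""
    alpha, beta = prev_score - window, prev_score + window
    score = search(alpha, beta)
    if score <= alpha:
        # Fail low: worse than expected, so open the lower bound and re-search.
        return "fail low", search(-INF, beta)
    if score >= beta:
        # Fail high: better than expected, so open the upper bound and re-search.
        return "fail high", search(alpha, INF)
    return "exact", score


# Hypothetical numbers: the previous iteration said +1.50, this one finds +1.17,
# a drop of about a third of a pawn.
print(aspiration(make_search(117), 150))  # ('fail low', 117)
```

The size of the drop (here 33 centipawns) is what separates a small positional fail low from a big tactical one; the window mechanics are the same either way.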




>
>I do not doubt that, when searching to the same depth, full-width search
>+ loads of extensions sees more tactically than I do.
>
>I do not believe that 11(6) is 17 ply fullwidth + singular extensions +
>recapture extensions + some other extensions.

I don't personally care what you believe.  I believe the people that wrote
the code.  You can choose to listen to them or stick your head in the sand and
say "can not" over and over and over.  But it doesn't make it true.




>
>Because if I find all those things too at 11 or 12 ply with
>probably more limited extensions, then they are pruning the last 6 ply,
>or... ...they just searched 11 ply.
>
>The last is very relevant. Everyone who does the next experiment will
>find that they searched between 11 and 13 ply as minimum search depth,
>and nothing more.

That is pure garbage...  show a tactical line rather than these positional
cases.  I sat beside them many times.  Deep Thought was doing 11 plies back
in 1988 at the WCCC event that year.  That was 11 plies including 4 plies in
hardware, or 7(4) as it is now called...  Cray Blitz was doing 9-10 plies
with null-move R=1, non-recursive, at 200K-500K nodes per second...
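
The thread uses Deep Thought / Deep Blue depth notation such as 7(4) and 11(6). Going by the explanation above (11 plies including 4 in hardware is written 7(4)), the number in parentheses counts the hardware plies and the nominal depth is the sum, so 11(6) is 17 plies. A small parser, assuming that reading of the notation; `DBDepth` and `parse_depth` are hypothetical names, and the result says nothing about extensions or the quiescence search beyond the nominal depth:

```python
import re
from typing import NamedTuple

_DEPTH_RE = re.compile(r"^\s*(\d+)\s*\(\s*(\d+)\s*\)\s*$")


class DBDepth(NamedTuple):
    """Deep Blue-style depth: software plies plus hardware (chip) plies."""

    software: int  # plies searched in software by the host
    hardware: int  # plies searched by the chess chips

    @property
    def total(self) -> int:
        """Nominal depth: software plies plus hardware plies."""
        return self.software + self.hardware


def parse_depth(text: str) -> DBDepth:
    """Parse notation like '11(6)' into its software and hardware parts."""
    match = _DEPTH_RE.match(text)
    if match is None:
        raise ValueError(f"not in N(M) form: {text!r}")
    return DBDepth(int(match.group(1)), int(match.group(2)))


for s in ("7(4)", "11(6)"):
    print(s, "->", parse_depth(s).total, "plies")
# 7(4) -> 11 plies
# 11(6) -> 17 plies
```
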




>
>The experiment:
>  add singular extensions (very important, as the overhead is huge
>  when searching full-width).

Define "huge".  Hsu and Campbell very carefully defined the overhead in
mathematical terms, and it is not "huge".  It was (for Cray Blitz) roughly
one ply.



>  get rid of PVS and replace it by normal alpha-beta (see what Hsu writes
>  in IEEE 99)
>  add recapture extensions
>  do not store and retrieve positions in the hash table in the last 6 plies.
>  Do not do the above dangerous extensions in the last 6 ply; of course, do
>  check extensions everywhere. Extend these ALWAYS.
>
>Now please compare node counts. If there are volunteers to run a DIEP
>version with these settings be my guest.
>
>Only need 200 million x 180 seconds = say 36 billion nodes.
>
>Dual, single CPU, or whatever, doesn't matter.
>
>I don't get much above 10 ply...

So your program becomes the bellwether for all of computer chess?  If I turn
off null move, I lose 2 plies roughly.  _I_ can get to 10 plies full-width
without null move _today_.  _easily_.  I can get to 14 plies with null-move on,
_today_.  With faster hardware I can get to 16-17 plies _today_.
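
The experiment proposed above swaps PVS for plain alpha-beta. Both are exact: called with a full window at the root they return the same minimax value, and they differ only in how many nodes they visit. A minimal negamax sketch on a hypothetical random game tree (integer leaf scores from the side to move's point of view, no move ordering, no extensions, no hash table) that checks the values agree and counts nodes; `build_tree` and the tree shape are assumptions for illustration only:

```python
import random
from typing import List, Union

Tree = Union[int, List["Tree"]]
INF = 10**9


def build_tree(depth: int, branching: int, rng: random.Random) -> Tree:
    """Hypothetical uniform game tree with random integer leaf scores."""
    if depth == 0:
        return rng.randint(-100, 100)
    return [build_tree(depth - 1, branching, rng) for _ in range(branching)]


def negamax(node: Tree) -> int:
    """Full-width minimax in negamax form, no pruning at all."""
    if isinstance(node, int):
        return node
    return max(-negamax(child) for child in node)


def alphabeta(node: Tree, alpha: int, beta: int, count: List[int]) -> int:
    """Fail-soft alpha-beta; count[0] accumulates visited nodes."""
    count[0] += 1
    if isinstance(node, int):
        return node
    best = -INF
    for child in node:
        score = -alphabeta(child, -beta, -alpha, count)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the opponent will avoid this line
    return best


def pvs(node: Tree, alpha: int, beta: int, count: List[int]) -> int:
    """Principal variation search: full window for the first move, null
    window for the rest, re-searching when a probe lands inside (alpha, beta)."""
    count[0] += 1
    if isinstance(node, int):
        return node
    best = -INF
    for i, child in enumerate(node):
        if i == 0:
            score = -pvs(child, -beta, -alpha, count)
        else:
            score = -pvs(child, -alpha - 1, -alpha, count)  # null-window probe
            if alpha < score < beta:
                score = -pvs(child, -beta, -alpha, count)  # re-search
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best


rng = random.Random(2001)
tree = build_tree(depth=6, branching=5, rng=rng)
ab_nodes, pvs_nodes = [0], [0]
exact = negamax(tree)
assert alphabeta(tree, -INF, INF, ab_nodes) == exact
assert pvs(tree, -INF, INF, pvs_nodes) == exact
print("value", exact, "alpha-beta nodes", ab_nodes[0], "PVS nodes", pvs_nodes[0])
```

On an unordered random tree like this one PVS gains little and can even visit more nodes than plain alpha-beta because of re-searches; its advantage depends on good move ordering, which real programs get from the hash table and heuristics such as killer moves.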





>
>All arguments here in CCC are a joke as long as people don't repeat
>this experiment, which I did. Note I did use PVS in my experiments and
>not normal alpha-beta, so I give Deep Blue that speedup for free :)
>
>Getting 10 ply is a billion nodes...

No it isn't.  I have run crafty with "selective 0 0" many times when
debugging.  I can get to 10 plies with no trouble whatsoever... and it
doesn't take me 20+ minutes and one billion nodes to do so...
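
For a sense of scale: with perfect move ordering, alpha-beta on a uniform tree of branching factor b and depth d examines the Knuth-Moore minimal tree of b^ceil(d/2) + b^floor(d/2) - 1 leaf positions, against b^d for full-width minimax. A minimal sketch, assuming a constant branching factor of 35 (a commonly quoted middlegame average); real trees are neither uniform nor perfectly ordered, and extensions and the quiescence search add nodes:

```python
def minimax_leaves(b: int, d: int) -> int:
    """Leaf positions of a full-width, unpruned search of a uniform tree."""
    return b ** d


def minimal_alphabeta_leaves(b: int, d: int) -> int:
    """Knuth-Moore minimal tree: leaves alpha-beta examines with perfect ordering."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1


b = 35  # assumed average branching factor
for d in (8, 10, 12):
    print(d, f"{minimal_alphabeta_leaves(b, d):,}", f"{minimax_leaves(b, d):,}")
```

At d = 10 the minimal tree is 105,043,749 leaves, the same order of magnitude as the 188 million nodes Crafty needed for the position that follows, though the two numbers count different things (leaves of an idealised tree versus every node of a real search).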

Here is a test middlegame position searched to depth=10,
on a quad 550mhz xeon box:  With null-move totally disabled
the branching factor is worse.  But it doesn't take me a
billion nodes to search to 10 plies:


         (3)    6->   0.96  -0.66   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Rxe4 5. Nxd7 Nxd7
         (2)    7     1.56  -0.66   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Rxe4 5. Nxd7 Nxd7
                7->   2.66  -0.66   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Rxe4 5. Nxd7 Nxd7
                8     4.80  -0.59   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Nxb6 5. Bd3 Rc5
                8->   9.55  -0.59   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Nxb6 5. Bd3 Rc5
                9    20.65  -0.69   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Rxe4 5. Nxd7 Nxd7 6. Bc3 a4
                9->  44.95  -0.69   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Rxe4 5. Nxd7 Nxd7 6. Bc3 a4
               10     1:30  -0.52   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Nxb6 5. Bd3 Rc7 6. Bd4 Nfd5
               10->   3:52  -0.52   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Nxb6 5. Bd3 Rc7 6. Bd4 Nfd5
              time=3:52  cpu=399%  mat=0  n=188682918  fh=91%  nps=811k


Took me 188 million nodes to reach 10 plies.  Here is the same position to
the same depth with null-move fully operational:


                7->   0.46  -0.66   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Rxe4 5. Nxd7 Nxd7
                8     0.59  -0.59   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Nxb6 5. Bd3 Rc5
                8->   0.67  -0.59   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Nxb6 5. Bd3 Rc5
                9     1.12  -0.69   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Rxe4 5. Nxd7 Nxd7 6. Bc3 a4
                9->   1.21  -0.69   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Rxe4 5. Nxd7 Nxd7 6. Bc3 a4
               10     1.77  -0.52   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Nxb6 5. Bd3 Rc7 6. Bd4 Nfd5
               10->   2.42  -0.52   1. ... Bxe4 2. Bxe4 Qxc4 3. Qxc4 Rxc4
                                    4. Nxb6 Nxb6 5. Bd3 Rc7 6. Bd4 Nfd5


So you _can_ search to depth 10 with no null move at all.  Belle searched to
depth 8 and 9 with no null-move at 160K nodes per second.  You need to start
saying "my program can't do this"... not "no program can do this..."
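
The point that the branching factor is worse without null move can be read straight off the two logs: dividing the time at which each iteration completes (the `->` lines) by the previous iteration's time gives an effective branching factor per ply. A minimal sketch using the completion times printed above; `effective_branching` is a hypothetical helper, and time ratios only approximate node-count ratios:

```python
# Completion times in seconds for each finished iteration ("N->" lines above).
no_null = {6: 0.96, 7: 2.66, 8: 9.55, 9: 44.95, 10: 232.0}  # 3:52 = 232 s
with_null = {7: 0.46, 8: 0.67, 9: 1.21, 10: 2.42}


def effective_branching(times: dict) -> dict:
    """Ratio of each iteration's completion time to the previous iteration's."""
    depths = sorted(times)
    return {d: times[d] / times[prev] for prev, d in zip(depths, depths[1:])}


for label, times in (("no null-move", no_null), ("null-move", with_null)):
    ebf = effective_branching(times)
    print(label, {d: round(r, 2) for d, r in ebf.items()})
# no null-move {7: 2.77, 8: 3.59, 9: 4.71, 10: 5.16}
# null-move {8: 1.46, 9: 1.81, 10: 2.0}

# 188,682,918 nodes in 232 s is about 813k nodes per second, close to the
# logged nps=811k (the displayed time 3:52 is presumably rounded).
print(round(188_682_918 / 232.0 / 1000), "knps")  # 813 knps
```
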


>
>DIEP gets about 56k nodes a second on a 1 GHz K7. If it needs
>a few billion nodes to get to 11 ply, I assume everyone believes that
>it needs more to get deeper. So running each position for a few
>hours is no problem. Then of course the second run is the same positions
>with the hash table and with the default version.
>
>Then compare the both outputs. Especially compare the first output with
>DB's output.
>
>That's real fun... ...but it takes days of time. I have done it for 2
>positions so far.
>
>Then I knew forever that null move is the best invention in computer chess,
>invented by, wasn't it, Don Beal in 1977?
>
>The overhead of dangerous extensions and recapture and check extensions
>when they get done at huge depths is something all people here underestimate.
>
>Because you DO NOT PRUNE any line. So it only gets longer and longer.
>
>Tactically it kicks butt, of course, if you can get to, say, 12 ply (which
>DB got, of course). That Hsu gets to 12 ply anyway full-width is
>very impressive, as I would probably need more nodes when not being able
>to use the hash table, either local or global!
>
>Best regards,
>Vincent
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.