Computer Chess Club Archives



Subject: Re: To Bob Hyatt: Ply Number Versus Rating

Author: Robert Hyatt

Date: 20:14:42 01/05/99



On January 05, 1999 at 17:57:55, Shane Booth wrote:

>On January 05, 1999 at 09:49:32, Robert Hyatt wrote:
>
>>On January 05, 1999 at 00:29:08, Shane Booth wrote:
>>
>>>On January 04, 1999 at 18:06:41, Robert Hyatt wrote:
>>>
>>>[lots snipped]
>>>>I'm not sure about depth... but I'd guess that in blitz games, the middlegame
>>>>depth is in the range of 9 plies, maybe 10 (of course in some wild positions it
>>>>might only be 8 or so as well).
>>>>
>>>>As for what a ply does for crafty?  Hard to quantify, but clearly every
>>>>ply is important.  IE the p6 was about 2.5X the P5/133 I used and its
>>>>rating took a big jump.  The quad p6 gave another factor of 3 and another
>>>>noticeable jump, and now the quad xeon (which ramps out at about 2x the quad
>>>>p6, roughly) is another noticeable jump.  So far, I see *no* indication that
>>>>another ply won't produce a better search result...
>>>>
>>>>what we need is for someone to organize a crafty vs crafty tournament, no
>>>>pondering, and try one with sd=4 against sd=5,6,7,8,9,10 (at least).  Then
>>>>do sd=5 vs 6,7,8,9,10, then sd=6 vs sd=7,8,9,10 and so forth.  That would
>>>>give a good graph of what another ply (or more) is worth for crafty...
>>>
>>>I have been running a similar experiment recently, playing Crafty 15.20
>>>against CM4000 using search depths of 1-10 for Crafty and 1-8 for CM4000.
>>>I've only played 300 games so far (so 30 games for each search depth
>>>for Crafty), so the statistical error in the results is still quite high.
>>>Anyway, here's what I have so far: (Fixing the rating of 1 ply Crafty
>>>arbitrarily to 600)
>>>
>>>Depth  Crafty  CM4000
>>>1      600     655
>>>2      793     1012
>>>3      1072    1401
>>>4      1365    1422
>>>5      1575    1711
>>>6      1677    1791
>>>7      1697    2027
>>>8      1940    2198
>>>9      2076
>>>10     2242
>>>
>>>Crafty has learning turned off, and games between the same opponents
>>>are never played with the same opening.
>>>
>>>Regards,
>>>Shane
>>
>>
>>This test isn't really very good, because you have two variables in your
>>mix, rather than the one we would normally like in a scientific experiment.
>>
>>IE it is quite obvious that chessmaster and crafty do a totally different sort
>>of search, when you try having both search to depth=8 for example, because
>>crafty will get there _way_ faster.  Which means it is doing less work (which
>>probably means fewer extensions).  Perhaps a better way of doing your test would
>>be time-based instead.  1 sec vs 1 sec, 1 sec vs 2 secs, and so forth.  But you
>>still have two different programs, which means still two variables.
>>
>>and using crafty vs crafty has an additional problem, in that by my test
>>results, using two identical programs means that any minor change to one will
>>be greatly exaggerated in the game results.  IE giving one 2x more time will
>>likely produce results that are better than expected for the side having 2x
>>more time.
>
>Sorry Robert, I believe you've completely missed the point of my test.
>I am NOT attempting to compare CM4000 and Crafty in any way.  It is obvious
>that CM4000 will perform better than Crafty at the same depth because
>CM4000 does do far more work in the search and Crafty does of course get
>to the same depth much quicker.  The whole point is to get an estimate
>of what an extra ply for a program does to its strength.
>
>Every time people have examined the effect of depth against rating (e.g.
>K. Thompson's work with Belle), they have played a program against itself.
>I have always felt that this has the problem that small improvements
>are going to be magnified when identical programs play (as you say in
>your last paragraph).
>
>Thus I feel that in the published results the rating difference between
>plies may be magnified by this effect, hence
>I wished to try the same experiment, but having the opponent for each
>game be a different program with a completely different algorithm.
>
>I feel I must reject your "two variables in the mix" comment.  Surely
>an adequate way of testing Crafty's strength at different search depths
>would be to play Crafty at different search depths against a large pool
>of human players.  Since I don't have a large pool of human players at
>my disposal, I have replaced this by playing Crafty against a different
>program.  Although, to get significant results, I need to play many
>more games, as I currently believe the standard deviation of my ply rating
>differences is around 120 points or so!
>
>If you would care to expand on your rebuttal to my experiment, I would
>be most interested.  If it really is not good, I can immediately stop
>wasting my time.  But currently, your response does not appear as if
>you have considered the problem in any depth.
>
>I will not be able to read email for two weeks (travelling overseas),
>so don't expect quick email replies!  I will be able to respond here
>though.
>
>Regards,
>Shane Booth
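
The fixed-depth tournament Bob proposes in the quoted text (sd=4 against
sd=5,6,...,10, then sd=5 against 6,...,10, and so forth) is just every ordered
pair of distinct depths.  A minimal sketch of that schedule in Python; the
depth range 4-10 is taken from the post, everything else (function name,
defaults) is illustrative:

```python
# Enumerate the depth-vs-depth pairings for a fixed-depth
# self-play tournament: each depth plays every deeper depth once.
def depth_pairings(min_depth=4, max_depth=10):
    """Return (shallow, deep) depth pairs with shallow < deep."""
    return [(d1, d2)
            for d1 in range(min_depth, max_depth + 1)
            for d2 in range(d1 + 1, max_depth + 1)]

pairs = depth_pairings()
print(len(pairs))   # 21 matchups for depths 4..10
print(pairs[:3])    # [(4, 5), (4, 6), (4, 7)]
```

Each pairing would then be played over many games (with pondering off and
varied openings, as discussed above) to graph rating gain per ply.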


The problem I see is that it is possible that crafty is flawed in some basic
way that has gone undetected.  IE a hashing bug, a search bug, or an evaluation
bug.  Playing it against a well-tested program might well produce distorted
results because of this.

That was my only point.  When you have two different programs, and you are
tweaking them by changing the depth, you have two degrees of freedom in the
result.  For high correlation, you want only one: the axis you are most
interested in (in this case, depth).

Your results aren't necessarily invalid...  nor are they necessarily a positive
answer to the question posed...  it's _not_ an easy question to answer...
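
On the statistical error Shane mentions: treating each 30-game match score as
a binomial proportion (an assumption made here for illustration; it ignores
draws) gives a rough feel for how noisy a 30-game Elo estimate is.  A sketch,
using the standard logistic Elo expectation formula:

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_std_error(score, n_games):
    """Approximate standard error of the Elo estimate: propagate the
    binomial error of the score fraction through
    d(elo)/d(score) = 400 / (ln 10 * s * (1 - s))."""
    se_score = math.sqrt(score * (1.0 - score) / n_games)
    return 400.0 / math.log(10) * se_score / (score * (1.0 - score))

# An even 30-game match already carries a standard error of about
# 63 Elo points, so per-ply differences are very noisy at this sample size.
print(elo_diff(0.5))            # 0.0
print(round(elo_std_error(0.5, 30)))  # 63
```

A per-ply rating *difference* combines two such estimates, so its uncertainty
is larger still, broadly consistent with Shane's ~120-point figure.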




Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.