Computer Chess Club Archives


Subject: Re: To Bob Hyatt: Ply Number Versus Rating

Author: Shane Booth

Date: 14:57:55 01/05/99



On January 05, 1999 at 09:49:32, Robert Hyatt wrote:

>On January 05, 1999 at 00:29:08, Shane Booth wrote:
>
>>On January 04, 1999 at 18:06:41, Robert Hyatt wrote:
>>
>>[lots snipped]
>>>I'm not sure about depth... but I'd guess that in blitz games, the middlegame
>>>depth is in the range of 9 plies, maybe 10 (of course in some wild positions it
>>>might only be 8 or so as well).
>>>
>>>As far as what does a ply do for crafty?  hard to quantify, but clearly every
>>>ply is important.  IE the p6 was about 2.5X the P5/133 I used and its
>>>rating took a big jump.  The quad p6 gave another factor of 3 and another
>>>noticeable jump, and now the quad xeon (which ramps out at about 2x the quad
>>>p6, roughly) is another noticeable jump.  So far, I see *no* indication that
>>>another ply won't produce a better search result...
>>>
>>>what we need is for someone to organize a crafty vs crafty tournament, no
>>>pondering, and try one with sd=4 against sd=5,6,7,8,9,10 (at least).  Then
>>>do sd=5 vs 6,7,8,9,10, then sd=6 vs sd=7,8,9,10 and so forth.  That would
>>>give a good graph of what another ply (or more) is worth for crafty...
>>
>>I have been running a similar experiment recently, playing Crafty 15.20
>>against CM4000 using search depths of 1-10 for Crafty and 1-8 for CM4000.
>>I've only played 300 games so far (so 30 games for each search depth
>>for Crafty), so the statistical error in the results is still quite high.
>>Anyway, here's what I have so far: (Fixing the rating of 1 ply Crafty
>>arbitrarily to 600)
>>
>>Depth  Crafty  CM4000
>>1      600     655
>>2      793     1012
>>3      1072    1401
>>4      1365    1422
>>5      1575    1711
>>6      1677    1791
>>7      1697    2027
>>8      1940    2198
>>9      2076
>>10     2242
>>
>>Crafty has learning turned off, and games between the same opponents
>>are never played with the same opening.
>>
>>Regards,
>>Shane
>
>
>This test isn't really very good, because you have two variables in your
>mix, rather than the one we would normally like in a scientific experiment.
>
>IE it is quite obvious that chessmaster and crafty do a totally different sort
>of search, when you try having both search to depth=8 for example, because
>crafty will get there _way_ faster.  Which means it is doing less work (which
>probably means fewer extensions).  Perhaps a better way of doing your test would
>be time-based instead.  1 sec vs 1 sec, 1 sec vs 2 secs, and so forth.  But you
>still have two different programs, which means still two variables.
>
>and using crafty vs crafty has an additional problem, in that by my test
>results, using two identical programs means that any minor change to one will
>be greatly exaggerated in the game results.  IE giving one 2x more time will
>likely produce results that are better than expected for the side having 2x
>more time.

Sorry Robert, I believe you've completely missed the point of my test.
I am NOT attempting to compare CM4000 and Crafty in any way.  It is obvious
that CM4000 will perform better than Crafty at the same depth because
CM4000 does do far more work in the search and Crafty does of course get
to the same depth much quicker.  The whole point is to get an estimate
of what an extra ply for a program does to its strength.
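
As a rough illustration of that estimate, the per-ply gains implied by
the Crafty column of the table quoted above can be read off with a few
lines of Python.  This is only a sketch: the ratings are the provisional
figures from the table, and the helper name per_ply_gains is made up for
this example.

```python
# Ratings copied from the Crafty column of the table quoted above
# (the 1-ply rating is fixed arbitrarily at 600).
crafty = {1: 600, 2: 793, 3: 1072, 4: 1365, 5: 1575,
          6: 1677, 7: 1697, 8: 1940, 9: 2076, 10: 2242}


def per_ply_gains(ratings):
    """Return {depth: rating(depth) - rating(previous depth)}."""
    depths = sorted(ratings)
    return {d: ratings[d] - ratings[p] for p, d in zip(depths, depths[1:])}


gains = per_ply_gains(crafty)
mean_gain = sum(gains.values()) / len(gains)

for depth, gain in gains.items():
    print(f"ply {depth - 1} -> {depth}: {gain:+d}")
print(f"mean gain per ply: {mean_gain:.1f}")
```

The individual gains vary a lot (from 20 to 293 points), which is what
one would expect with only 30 games per depth.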

Every time people have examined the effect of depth against rating (e.g.
K. Thompson's work with Belle), they have played a program against itself.
I have always felt that this has the problem that small improvements
are going to be magnified when identical programs play (as you say in
your last paragraph).

Thus I feel that the published results may overstate the rating
difference between plies because of this effect.  Hence I wished to
try the same experiment, but with the opponent in each game being a
different program with a completely different algorithm.

I feel I must reject your "two variables in the mix" comment.  Surely
an adequate way of testing Crafty's strength at different search depths
would be to play Crafty at different search depths against a large pool
of human players.  Since I don't have a large pool of human players at
my disposal, I have replaced this by playing Crafty against a different
program.  To get significant results, though, I need to play many
more games, as I currently estimate the standard deviation of my
per-ply rating differences to be around 120 points or so!
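
To give a feel for why so few games leave such a large uncertainty,
here is a back-of-the-envelope sketch.  It is not how the figure above
was obtained.  It assumes the standard logistic Elo curve (400-point
scale), independent games, no draws, and an expected score near 50%;
the function names are invented for this example.

```python
import math


def elo_diff(score):
    """Rating difference implied by an expected score (logistic Elo model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_std_error(score, games):
    """Approximate standard error of elo_diff(score) after `games` games.

    Assumes independent win/loss results (no draws) and propagates the
    binomial error of the score through the Elo curve (delta method).
    """
    per_game_var = score * (1.0 - score)
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return slope * math.sqrt(per_game_var / games)


# 30 games per Crafty depth, as in the experiment described above.
se_one = elo_std_error(0.5, 30)
# Difference between two adjacent depths, treating them as independent.
se_diff = math.sqrt(2.0) * se_one

print(f"one depth: +/- {se_one:.0f}, adjacent-ply difference: +/- {se_diff:.0f}")
```

Under these simplifying assumptions the error is roughly 63 points for
a single depth and roughly 90 points for the difference between two
adjacent depths.  Lopsided pairings push it higher, and since the error
shrinks only with the square root of the number of games, halving it
takes about four times as many games.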

If you would care to expand on your rebuttal to my experiment, I would
be most interested.  If it really is not good, I can immediately stop
wasting my time.  But currently, your response does not suggest that
you have considered the problem in any depth.

I will not be able to read email for two weeks (travelling overseas),
so don't expect quick email replies!  I will be able to respond here
though.

Regards,
Shane Booth




Last modified: Thu, 15 Apr 21 08:11:13 -0700
