Author: Shane Booth
Date: 14:57:55 01/05/99
On January 05, 1999 at 09:49:32, Robert Hyatt wrote:

>On January 05, 1999 at 00:29:08, Shane Booth wrote:
>
>>On January 04, 1999 at 18:06:41, Robert Hyatt wrote:
>>
>>[lots snipped]
>>>I'm not sure about depth... but I'd guess that in blitz games, the middlegame
>>>depth is in the range of 9 plies, maybe 10 (of course in some wild positions it
>>>might only be 8 or so as well).
>>>
>>>As far as what does a ply do for crafty? hard to quantify, but clearly every
>>>ply is important. IE the p6 was about 2.5X the P5/133 I used and its
>>>rating took a big jump. The quad p6 gave another factor of 3 and another
>>>noticable jump, and now the quad xeon (which ramps out at about 2x the quad
>>>p6, roughly) is another noticable jump. So far, I see *no* indication that
>>>another ply won't produce a better search result...
>>>
>>>what we need is for someone to organize a crafty vs crafty tournament, no
>>>pondering, and try one with sd=4 against sd=5,6,7,8,9,10 (at least). Then
>>>do sd=5 vs 6,7,8,9,10, then sd=6 vs sd=7,8,9,10 and so forth. That would
>>>give a good graph of what another ply (or more) is worth for crafty...
>>
>>I have been running a similar experiment recently, playing Crafty 15.20
>>against CM4000 using search depths of 1-10 for Crafty and 1-8 for CM4000.
>>I've only played 300 games so far (so 30 games for each search depth
>>for Crafty), so the statistical error in the results is still quite high.
>>Anyway, here's what I have so far: (Fixing the rating of 1 ply Crafty
>>arbitrarily to 600)
>>
>>Depth   Crafty   CM4000
>> 1        600      655
>> 2        793     1012
>> 3       1072     1401
>> 4       1365     1422
>> 5       1575     1711
>> 6       1677     1791
>> 7       1697     2027
>> 8       1940     2198
>> 9       2076
>>10       2242
>>
>>Crafty has learning turned off, and games between the same opponents
>>are never played with the same opening.
>>
>>Regards,
>>Shane
>
>
>This test isn't really very good, because you have two variables in your
>mix, rather than the one we would normally like in a scientific experiment.
>
>IE it is quite obvious that chessmaster and crafty do a totally different sort
>of search, when you try having both search to depth=8 for example, because
>crafty will get there _way_ faster. Which means it is doing less work (which
>probably means fewer extensions). Perhaps a better way of doing your test would
>be time-based instead. 1 sec vs 1 sec, 1 sec vs 2 secs, and so forth. But you
>still have two different programs, which means still two variables.
>
>and using crafty vs crafty has an additional problem, in that by my test
>results, using two identical programs means that any minor change to one will
>be greatly exaggerated in the game results. IE giving one 2x more time will
>likely produce results that are better than expected for the side having 2x
>more time.

Sorry Robert, I believe you've completely missed the point of my test. I am NOT attempting to compare CM4000 and Crafty in any way. It is obvious that CM4000 will perform better than Crafty at the same depth, because CM4000 does far more work in the search and Crafty, of course, reaches the same depth much more quickly. The whole point is to get an estimate of what an extra ply does for a program's strength.

Every time people have examined the effect of depth on rating (e.g. K. Thompson's work with Belle), they have played a program against itself. I have always felt that this has the problem that small improvements are magnified when identical programs play (as you say in your last paragraph). Thus I feel the published results may well exaggerate the rating difference between plies, so I wished to try the same experiment, but with the opponent for each game being a different program with a completely different algorithm. I feel I must reject your "two variables in the mix" comment.
Surely an adequate way of testing Crafty's strength at different search depths would be to play Crafty, at each depth, against a large pool of human players. Since I don't have a large pool of human players at my disposal, I have substituted a different program for that pool. However, to get significant results I need to play many more games: currently I believe the standard deviation of my per-ply rating differences is around 120 points or so!

If you would care to expand on your rebuttal of my experiment, I would be most interested. If it really is no good, I can stop wasting my time immediately. But as it stands, your response does not suggest you have considered the problem in any depth.

I will not be able to read email for two weeks (travelling overseas), so don't expect quick email replies! I will be able to respond here, though.

Regards,
Shane Booth
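For what it's worth, the roughly 120-point uncertainty mentioned above is easy to sanity-check with plain Elo arithmetic. A minimal sketch (nothing here comes from Crafty or CM4000 internals; the 30-games-per-depth figure is from the post, the even-score and draws-ignored assumptions are mine):

```python
import math

def elo_diff(score):
    """Elo rating difference implied by a score fraction (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_std_err(score, n_games):
    """Approximate standard error of the Elo estimate after n_games,
    treating each game as an independent win/loss trial (draws ignored)."""
    se_score = math.sqrt(score * (1.0 - score) / n_games)
    # propagate through d(elo)/d(score) = 400 / (ln 10 * s * (1 - s))
    return 400.0 / (math.log(10) * score * (1.0 - score)) * se_score

# 30 games at an even score: rough precision of a single depth's rating
print(round(elo_std_err(0.5, 30)))  # about 63 points
```

With only 30 games per depth, one rating estimate already carries a standard error of about 63 points, and the *difference* between two depths combines the error of both estimates, so a spread on the order of 90-120 points is entirely plausible and supports the call for many more games.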