Author: Dan Homan
Date: 03:08:36 08/14/98
On August 12, 1998 at 09:15:19, Bruce Moreland wrote:

>
>On August 11, 1998 at 06:52:02, Tony Hedlund wrote:
>
>>
>>>So I think 4-0 actually turns out to be a significant result. If you score 4-0,
>>>you can say that there is a very good chance that the one with the wins is
>>>better than the ones with the losses.
>>>
>>>You can't say this if you pick out a string of 4 wins in a row in the midst of a
>>>longer match, since you might be selecting a fluke case, but if you just start
>>>from scratch, and get 4-0, you should be able to stop. In fact I think you
>>>might be able to stop if you get 3.5 - 0.5, but I am less certain of this case.
>>>Someone who has more statistics than I may be willing to comment on this.
>>
>>Recently I played the match Shredder2 P200 MMX 64MB - Rebel8 P90 16MB.
>>Rebel won the first four games but Shredder won the match with 11-9.
>
>That shouldn't happen very often.
>
>bruce

The problem with small number statistics is that they can be very misleading. A 4-0 result in a 4-game match between nearly equal programs (with 20% draw chances) happens about 1/40th of the time. A 3.5-0.5 (or better) result happens about 1/13th of the time. If program A beats program B by a score of 4-0, this means that A has (roughly) a 97% chance of being stronger than B.

So it seems like a pretty good bet that A is better than B, but consider the following scenario. Say that you use this 4-game match technique to test new versions of your program versus older versions. Whenever you make a change, you run one of these matches and decide to keep the change only if you get a 4-0 result.

Because you have a very well developed program, most changes will have almost no effect on playing strength. Even changes that do increase the playing strength slightly will not affect the 1/40 odds of getting a 4-0 result very much. So you will get a 4-0 result about 1/40th of the time, regardless of whether the change you make is good or bad.
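The 1/40 and 1/13 figures can be checked with a short calculation. This is only a sketch under my own assumptions: every game is independent, "nearly equal" means each side wins 40% of the games with 20% drawn, and the 42% win rate used for a "slightly improved" version is a made-up number for illustration, not a measurement.

```python
from math import comb


def match_distribution(p_win, p_draw, games=4):
    """Return {score: probability} for one side over `games` independent games.

    p_win and p_draw are assumed per-game probabilities for that side;
    the loss probability is whatever remains.
    """
    p_loss = 1.0 - p_win - p_draw
    dist = {}
    for wins in range(games + 1):
        for draws in range(games - wins + 1):
            losses = games - wins - draws
            # Number of orderings with this many wins, draws and losses.
            ways = comb(games, wins) * comb(games - wins, draws)
            prob = ways * p_win**wins * p_draw**draws * p_loss**losses
            score = wins + 0.5 * draws
            dist[score] = dist.get(score, 0.0) + prob
    return dist


def prob_at_least(dist, threshold):
    """Probability of scoring `threshold` points or more."""
    return sum(p for score, p in dist.items() if score >= threshold)


# Equal programs: 40% win, 20% draw, 40% loss (assumed split).
equal = match_distribution(0.40, 0.20)
p_sweep_equal = prob_at_least(equal, 4.0)   # 0.0256, about 1 in 39
p_35_equal = prob_at_least(equal, 3.5)      # 0.0768, about 1 in 13

# Hypothetical slightly better version: 42% win, 20% draw, 38% loss.
better = match_distribution(0.42, 0.20)
p_sweep_better = prob_at_least(better, 4.0)  # about 0.0311, about 1 in 32

print(f"4-0 between equals:        {p_sweep_equal:.4f}")
print(f"3.5+ between equals:       {p_35_equal:.4f}")
print(f"4-0 for the better version: {p_sweep_better:.4f}")
```

With these assumed numbers, a version that really is a little stronger sweeps about 3.1% of the time against 2.6% for a version that is no stronger at all, so the 4-0 test passes both at nearly the same rate.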
So using these 4-game matches to decide on playing strength increases will cause you to select more or less at random which versions to keep and which to discard. So a 97% confidence isn't that helpful after all, at least not for what we chess programmers do. The problem is that we are trying to discriminate small differences in playing strength, and a 4-game match just can't do that with any reliability.

- Dan
Last modified: Thu, 15 Apr 21 08:11:13 -0700