Author: Dan Homan
Date: 08:45:24 08/14/98
On August 14, 1998 at 06:08:36, Dan Homan wrote:

>
>The problem with small number statistics is that they can be very
>misleading. A 4-0 result in a 4 game match between nearly equal
>programs (with 20% draw chances) happens about 1/40th of the time.
>A 3.5-0.5 (or better) result happens about 1/13th of the time.
>
>If program A beats program B by a score of 4-0, this means that A has
>a 97% (roughly) chance of being stronger than B. So it seems like a
>pretty good bet that A is better than B, but consider the following
>scenario.

The above paragraph is incorrect, because I didn't consider the chance that the weaker program could also go 4-0. The conclusion that A is stronger than B from a 4-0 result is considerably worse than 97% accurate. For nearly equal programs (that differ by only a few percent in winning chances) the conclusion is correct more like 60-70% of the time.

>
>Say that you use this 4-game match technique to test new versions of
>your program versus older versions. Whenever you make a change you
>run one of these matches and decide to keep the change only if you get
>a 4-0 result. Because you have a very well developed program, most
>changes will have almost no effect on playing strength. Even changes
>that do increase the playing strength slightly will not affect the
>1/40 odds of getting a 4-0 result very much. So, you will get a
>4-0 result 1/40th of the time - regardless of whether the change
>you make is good or bad. So using these 4-game matches to decide
>on playing strength increases will cause you to randomly select
>which versions to keep and which to discard.
>
>So a 97% confidence isn't that helpful after all - at least not for
>what we chess programmers do. The problem is that we are trying to
>discriminate small differences in playing strength and a 4-game match
>just can't do that with any reliability.
>
> - Dan
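For anyone who wants to check the numbers, here is a rough Python sketch of the arithmetic. It rests on assumptions that are not spelled out in the post: games are independent, "nearly equal" with 20% draws means 40% wins for each side, the stronger program gains a small `edge` in win probability at the expense of losses, and before the match either program is equally likely to be the stronger one. The function names and the `edge` values are only for illustration.

```python
from math import comb


def score_probability(p_win, p_draw, wins, draws, games=4):
    """Probability of exactly `wins` wins and `draws` draws in `games` games.

    Assumes independent games with fixed win/draw/loss probabilities.
    """
    losses = games - wins - draws
    p_loss = 1.0 - p_win - p_draw
    # Number of orderings of the wins, draws and losses (multinomial count).
    ways = comb(games, wins) * comb(games - wins, draws)
    return ways * p_win**wins * p_draw**draws * p_loss**losses


def posterior_a_stronger(edge, p_draw=0.2, games=4):
    """P(A is the stronger program | A sweeps the match).

    Assumption: a 50/50 prior on which program is stronger, and the
    stronger one wins with probability base + edge, the weaker one
    with probability base - edge.
    """
    base = (1.0 - p_draw) / 2.0
    sweep_if_stronger = (base + edge) ** games
    sweep_if_weaker = (base - edge) ** games
    return sweep_if_stronger / (sweep_if_stronger + sweep_if_weaker)


# Equal programs: 40% win, 20% draw, 40% loss.
p_sweep = score_probability(0.4, 0.2, wins=4, draws=0)
p_near_sweep = p_sweep + score_probability(0.4, 0.2, wins=3, draws=1)

print(f"4-0 between equal programs: about 1 in {1 / p_sweep:.0f}")
print(f"3.5-0.5 or better:          about 1 in {1 / p_near_sweep:.0f}")

# Hypothetical edges of 2 and 4 percentage points in win probability.
for edge in (0.02, 0.04):
    print(f"edge {edge:.2f}: P(A stronger | 4-0) = {posterior_a_stronger(edge):.2f}")
```

Under these assumptions the sweep probability comes out near 1/40 and the 3.5-0.5-or-better probability near 1/13, and the chance that the 4-0 winner really is the stronger program lands at roughly 0.60 and 0.69 for the two edges, which is in line with the 60-70% estimate above. A different prior or a different model of how a small strength difference shifts the win/draw/loss split would move these figures somewhat.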
Last modified: Thu, 15 Apr 21 08:11:13 -0700