Author: Bruce Moreland
Date: 17:04:11 08/14/98
On August 14, 1998 at 14:23:28, Peter Fendrich wrote:

>I don't understand how you guys get these results. In my world probability is
>completely different from confidence. If the probability is odds, chance or
>whatever, the confidence is a measurement of how sure we can be of the
>probability value itself. The information given by a 200-game match gives far
>more confident probabilities than a 4-game match. In the 4-game match it's not
>even applicable to use the term confidence. It's as worthless to compute as
>measuring 1/100 of a second with your wrist-watch.

In my world I don't know what I am talking about. I am trying to get through this dense statistics book, but it is taking a while.

My reason for posting on this topic is that people seem to think that they can do an N-game match, with some suitably comforting value of N, and take the results as significant, which in this case means, I guess, truthful, regardless of the score.

I suspect that in matches that are fairly close, which most of them will be (I think), you will end up having, for lack of a better term (yet), a range of not incredibly unlikely error which exceeds the Elo delta that can be computed from the score of the match. I think that most close matches are likely to produce an inconclusive result, rather than a hard-fought and exhausting match where "the best program won".

I think the amount by which you might be mistaken would decrease if you ran more trials, but the score of a match between two approximately equal programs would tend to tighten up as well. You might be better able to determine that "A and B aren't too much different", but it still could be a stretch to say "A is better than B".

I don't have any problem with the matches themselves, only with the conclusions. A 4-0 blowout *should* be a rare thing, and even though the error margin is large, it is still a massive blowout.
It might be interesting to find out how often it happens between roughly equal programs. It should happen just a few percent of the time, depending upon draw percentage (fewer draws means it should happen more often).

I would love to hear from anyone who is competent in this area, who could tell me with authority where I am messing up. I freely admit I might be wrong, and I've heard from several people who think that 4-0 is pretty common and means nothing, but I really would like to figure out *why*, since this should be rare between equal programs.

bruce
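
A rough way to check the "few percent" guess is to work it out directly. The sketch below is only an illustration under assumptions that are not in the post: the two programs are exactly equal, every game is independent, each game is drawn with the same fixed probability, and colors are ignored. The function name `sweep_probability` and the sample draw rates are made up for the example.

```python
def sweep_probability(draw_rate, games=4):
    """Chance that one of two equal programs wins every game of a match.

    Assumes independent games and a fixed draw rate (illustrative model only).
    """
    if not 0.0 <= draw_rate <= 1.0:
        raise ValueError("draw_rate must be between 0 and 1")
    # Equal programs split the decisive games evenly.
    win = (1.0 - draw_rate) / 2.0
    # Either program can be the one that sweeps, hence the factor of 2.
    return 2.0 * win ** games


# Hypothetical draw rates, chosen only to show the trend.
for d in (0.0, 0.2, 0.3, 0.5):
    print(f"draw rate {d:.0%}: 4-0 sweep in {sweep_probability(d):.2%} of matches")
```

Under this model a 4-0 sweep shows up in 12.5% of matches when there are no draws at all, and in about 3% of matches when 30% of games are drawn, so the frequency does rise as draws become rarer. Real programs are never exactly equal and real games are not perfectly independent, so these figures are a baseline rather than a measurement.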
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.