Author: Frank Schubert
Date: 08:52:35 01/28/00
On January 28, 2000 at 05:28:11, Matthias Wuellenweber wrote:

>If you take a whole game as one probabilistic event, the number of games needed
>to ascertain playing strength rankings seems depressing, and Christophe's program
>pointedly illustrates this. The error margin goes down only with roughly
>1/sqrt(N).
>
>However, from practical experience this doesn't feel right; the result
>fluctuation seems narrower than expected from statistical distributions.
>
>As my old buddy Thorsten Czubics, an eminent critic of statistics, always used
>to say: "Pah, I only need to look at one game to see whether a program is good."
>I think there is a grain of truth in this.
>
>A computer chess game is not a single random event but a string of them. There
>are N crucial turning points in a game where finding "the better move" could
>strongly influence or even decide the outcome of the game. At each of those N
>crucial points the stronger program has a certain chance to succeed, and the
>weaker program a chance to stumble.
>
>N could be quite high, not much lower than the game length in full moves.
>
>This means that one needs far fewer games to measure relative playing strength
>than expected from the "one result = one chance event" angle.

Yes, you are absolutely right. At each of those N crucial points the stronger program has a higher chance to succeed, and therefore its probability of winning the match is also higher.

But if you want to measure playing strength in Elo points, a computer chess game is still a single random event, because you only count the result of the whole game, not the result of each crucial point. Of course, the probability that this event is a win or a loss for a given program is in general not 50 %. So the calculations above cannot produce the correct results.

Bye
Frank
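The two effects discussed above can be sketched numerically. The toy model below is an assumption for illustration only (the names `game_win_prob` and `score_std_err` are hypothetical): a game is treated as `n_points` independent crucial points, won by whichever program takes a strict majority of them, while the standard error of a measured score over N whole games still shrinks like 1/sqrt(N).

```python
import math

def game_win_prob(p_point: float, n_points: int) -> float:
    """Chance the stronger program wins a game modelled as n_points
    independent crucial points, winning on a strict majority of them
    (toy model; an odd n_points avoids ties)."""
    return sum(math.comb(n_points, k) * p_point**k * (1 - p_point)**(n_points - k)
               for k in range(n_points // 2 + 1, n_points + 1))

def score_std_err(p: float, n_games: int) -> float:
    """Standard error of an observed win rate over n_games whole games,
    each counted as one Bernoulli event with win probability p."""
    return math.sqrt(p * (1 - p) / n_games)

# A small per-point edge amplifies at the game level:
print(game_win_prob(0.55, 21))   # clearly larger than 0.55

# But over whole-game results the error margin still falls only ~1/sqrt(N):
print(score_std_err(0.5, 100))   # halves only when N quadruples
print(score_std_err(0.5, 400))
```

This agrees with both posts: per-game win probability amplifies with the number of crucial points, yet once only the game result is counted, the 1/sqrt(N) convergence of the measured score is unavoidable.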