Author: Matthias Wuellenweber
Date: 02:28:11 01/28/00
If you take a whole game as one probabilistic event, the number of games needed to ascertain playing strength rankings seems depressingly large, and Christophe's program pointedly illustrates this. The error margin goes down only roughly with 1/sqrt(N), where N is the number of games played.

However, from practical experience this doesn't feel right: the fluctuation in results seems narrower than statistical distributions would predict. As my old buddy Thorsten Czubics, an eminent critic of statistics, always used to say: "Pah, I only need to look at one game to see whether a program is good". I think there is a grain of truth in this.

A computer chess game is not a single random event but a string of them. There are K crucial turning points in a game where finding "the better move" could strongly influence or even decide the outcome. At each of those K crucial points the stronger program has a certain chance to succeed and the weaker program a certain chance to stumble. K could be quite high, not much lower than the game length in full moves. This means that one needs far fewer games to measure relative playing strength than the "one result = one chance event" angle suggests.

A better argument against overconfidence in result counting may be the distorting influence of hardware and time controls. Hiarcs 7.32 seems to run into problems against the brand-new programs on fast machines at long time controls. However, it always shines brilliantly in blitz on, say, 500 MHz.

Matthias Wüllenweber
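To put rough numbers on the argument above, here is a minimal Python sketch. It is one possible reading of the post, not something taken from it, and it rests on assumptions: draws are ignored, a game is decided by whoever wins the majority of an odd number of independent turning points, and the stronger program wins each turning point with the same fixed probability. The function names (`standard_error`, `game_win_probability`, `games_needed`) and the threshold `z` are made up for illustration.

```python
import math


def standard_error(q, n_games):
    """Standard error of the observed win fraction after n_games games,
    when each game is won with probability q (draws ignored)."""
    return math.sqrt(q * (1.0 - q) / n_games)


def game_win_probability(p_point, k_points):
    """Toy model (an assumption, not from the post): the game goes to the
    side that wins the majority of k_points independent turning points,
    each won by the stronger program with probability p_point."""
    if k_points % 2 == 0:
        raise ValueError("k_points must be odd so a majority always exists")
    need = k_points // 2 + 1
    return sum(
        math.comb(k_points, i) * p_point**i * (1.0 - p_point) ** (k_points - i)
        for i in range(need, k_points + 1)
    )


def games_needed(q, z=2.0):
    """Approximate number of games until the edge (q - 0.5) equals
    z standard errors of the observed win fraction."""
    if q <= 0.5:
        raise ValueError("q must exceed 0.5 for the stronger program")
    return math.ceil((z * math.sqrt(q * (1.0 - q)) / (q - 0.5)) ** 2)


if __name__ == "__main__":
    # Quadrupling the number of games only halves the error margin.
    for n in (100, 400, 1600):
        print(n, "games -> standard error", round(standard_error(0.5, n), 4))

    # A small per-turning-point edge, compounded over a game.
    for k in (1, 3, 11, 41):
        q = game_win_probability(0.55, k)
        print(k, "turning points -> win probability", round(q, 3),
              "-> games needed", games_needed(q))
```

Under these assumptions a small edge at each turning point compounds into a larger edge per game, so fewer games separate the two programs. Real games include draws and turning points that are neither independent nor equally weighted, so the figures are illustrative only.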
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.