Author: jonathon smith
Date: 10:01:10 01/28/00
ChrisW (who is unable to post here because he is banned for life) has answered this thought provoking post by Matthias on http://www.oxford-softworks.com/fcchess.html

On January 28, 2000 at 05:28:11, Matthias Wuellenweber wrote:

>If you take a whole game as one probabilistic event, the number of games needed
>to ascertain playing strength rankings seems depressing and Christophe's program
>pointedly illustrates this. The error margin goes down only with roughly
>1/sqrt(N).
>
>However from practical experience this doesn't feel right, the result
>fluctuation seems narrower than expected from statistical distributions.
>
>As my old buddy Thorsten Czubics, an eminent critic of statistics, always used
>to say: "Pah, I only need to look at one game to see whether a program is good".
>I think there is a grain of truth in this.
>
>A computer chess game is not a single random event but a string of them. There
>are N crucial turning points in a game where finding "the better move" could
>strongly influence or even decide the outcome of the game. For each of those N
>crucial points the stronger program has a certain chance to succeed, the weaker
>program a chance to stumble.
>
>N could be quite high, not much lower than the game length in full moves.
>
>This means that one needs much less games to measure relative playing strength
>than expected from the "one result = one chance event" angle.
>
>A better way to undermine the overconfidence in result counting could be the
>disturbing influence of hardware and time controls. Hiarcs 7.32 seems to get
>problems against the brand new programs on fast machines at long time controls.
>However it always shines brilliantly in Blitz on, say, 500Mhz.
>
>Matthias Wüllenweber
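
For anyone who wants to play with the numbers, here is a rough Python sketch of the two views in the quoted post. It is only a toy model, not anything Matthias or Christophe wrote. The assumptions are mine: games are independent, draws are ignored, a game is decided by whoever wins the majority of k equally important turning points, and the stronger program wins each turning point with the same probability q. The function names and the 1.96 threshold (roughly 95% confidence) are likewise just illustrative choices.

```python
import math


def score_standard_error(p, n_games):
    """Standard error of the observed win fraction over n_games
    independent games, each won with probability p (draws ignored).
    This is the 1/sqrt(N) behaviour mentioned in the post."""
    return math.sqrt(p * (1.0 - p) / n_games)


def game_win_probability(q, k):
    """Toy model (an assumption, not from the post): the game goes to
    whoever wins the majority of k turning points (k odd), and the
    stronger program wins each point independently with probability q."""
    need = k // 2 + 1
    return sum(
        math.comb(k, i) * q**i * (1.0 - q) ** (k - i)
        for i in range(need, k + 1)
    )


def games_needed(p, z=1.96):
    """Smallest number of games for which the expected win fraction p
    lies at least z standard errors away from 0.5."""
    if p == 0.5:
        return math.inf
    return math.ceil(z * z * p * (1.0 - p) / (p - 0.5) ** 2)


if __name__ == "__main__":
    q = 0.55  # hypothetical per-turning-point edge of the stronger program
    for k in (1, 11, 41):
        p = game_win_probability(q, k)
        print(f"k={k:2d}  per-game win prob={p:.3f}  games needed={games_needed(p)}")
```

With k = 1 this is the plain "one result = one chance event" picture. As k grows, the same small per-move edge turns into a larger per-game edge, so fewer games separate the two programs, although the error margin on the match score still shrinks only with 1/sqrt(N). How close real games are to this model is an open question.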
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.