Computer Chess Club Archives


Subject: Re: Dummy Cadaques Tournament (Long)

Author: Matthias Wuellenweber

Date: 02:28:11 01/28/00


If you take a whole game as one probabilistic event, the number of games needed
to ascertain playing strength rankings seems depressingly high, and Christophe's
program illustrates this pointedly. The error margin shrinks only roughly as
1/sqrt(N), where N is the number of games played.
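
To put rough numbers on the 1/sqrt(N) behaviour, here is a minimal sketch. It is not Christophe's program; it assumes every game is an independent win-or-loss event (draws are ignored) and uses the usual normal approximation for a 95% margin. The function name score_margin and the sample game counts are made up for illustration.

```python
import math

def score_margin(n_games, p=0.5, z=1.96):
    """Approximate margin on the score fraction after n_games.

    Assumption: each game is an independent win/loss event with win
    probability p (no draws); z=1.96 corresponds to roughly 95%.
    """
    return z * math.sqrt(p * (1.0 - p) / n_games)

# Quadrupling the number of games only halves the margin.
for n in (25, 100, 400, 1600):
    print(f"{n:5d} games: +/- {100 * score_margin(n):.1f}% score")
```

Under these assumptions, going from 100 to 400 games merely halves the uncertainty, which is why the required match lengths look so discouraging.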

However, from practical experience this doesn't feel right: the result
fluctuation seems narrower than the statistical distributions would predict.

As my old buddy Thorsten Czubics, an eminent critic of statistics, always used
to say: "Pah, I only need to look at one game to see whether a program is good".
I think there is a grain of truth in this.

A computer chess game is not a single random event but a string of them. There
are N crucial turning points in a game (N here counts turning points, not
games) where finding "the better move" could strongly influence or even decide
the outcome. At each of those N crucial points the stronger program has a
certain chance to succeed, the weaker program a certain chance to stumble.

N could be quite high, not much lower than the game length in full moves.

This means that one needs far fewer games to measure relative playing strength
than expected from the "one result = one chance event" angle.
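
The argument can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not a claim about real games: it supposes k independent turning points per game, each won by the stronger program with probability p, and awards the game to whoever wins the majority of them (k odd, no draws). The names game_win_prob and games_needed are hypothetical, and games_needed uses the same normal approximation as any simple result count.

```python
import math

def game_win_prob(k, p):
    """Probability that the stronger side wins a majority of k turning points.

    Assumption: the k points are independent, each won with probability p,
    and k is odd so there is always a majority.
    """
    return sum(math.comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

def games_needed(p_game, z=1.96):
    """Games needed until the ~95% margin is smaller than the edge over 50%."""
    return math.ceil(z * z * p_game * (1 - p_game) / (p_game - 0.5) ** 2)

# A small per-decision edge (55%) is amplified as k grows.
for k in (1, 5, 21, 41):
    pg = game_win_prob(k, 0.55)
    print(f"k={k:2d}: game win prob {pg:.3f}, games needed {games_needed(pg)}")
```

In this model the per-decision edge is amplified into a larger per-game edge as k grows, so fewer games are needed to see which program is stronger. How closely real games follow such a model is an open question.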

A better way to undermine the overconfidence in result counting could be to
point to the distorting influence of hardware and time controls. Hiarcs 7.32
seems to struggle against the brand-new programs on fast machines at long time
controls. However, it always shines in Blitz on, say, 500 MHz.

Matthias Wüllenweber



Last modified: Thu, 15 Apr 21 08:11:13 -0700
