Computer Chess Club Archives


Subject: Re: A question about statistics...

Author: Mark Young

Date: 09:29:15 01/04/04


On January 04, 2004 at 11:46:00, Roger Brown wrote:

>Hello all,
>
>I have read numerous posts about the validity - or lack thereof actually - of
>short matches between and among chess engines.  The arguments of those who say
>that such matches are meaningless (Kurt Utzinger, Christopher Theron, Robert
>Hyatt et al.) typically indicate that well over 200 games are required to make any
>sort of statistical statement that engine X is better than engine Y.
>
>I concede this point.

If you concede this point you don't understand. There is no magic number like
200 or 2000. The score must be considered. Here is an example:

A score of 17 - 3 in a 20-game match gives over 99% certainty that the winner
of the match is stronger than the loser.

A 100-game match ending 55 - 45 gives only an 81% chance that the winner of the
match is the stronger program.

A 200-game match ending 106 - 94 gives only a 78% chance that the winner is
stronger than the loser.
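
For anyone who wants to check figures like these, here is a minimal Python sketch. It rests on an assumption that the post does not state: that each score is treated as a count of decisive games (draws ignored) and that the "certainty" is one minus the one-sided binomial tail probability of scoring at least that well between two equal engines (p = 0.5). The function name confidence_winner_stronger is made up for illustration.

```python
from math import comb


def confidence_winner_stronger(winner_points: int, loser_points: int) -> float:
    """Return 1 - P(X >= winner_points) for X ~ Binomial(n, 0.5).

    Assumption (not from the post): every game is decisive, so the score
    is a plain win count and draws are ignored.
    """
    n = winner_points + loser_points
    # Exact tail probability that an equally strong engine scores this well or better.
    tail = sum(comb(n, k) for k in range(winner_points, n + 1)) / 2 ** n
    return 1.0 - tail


# The three match scores mentioned in the post.
for w, l in [(17, 3), (55, 45), (106, 94)]:
    print(f"{w} - {l}: {confidence_winner_stronger(w, l):.1%}")
```

Under that assumption the results should land close to the figures quoted above (over 99%, about 81%, about 78%). A model that accounts for draws would give somewhat different numbers.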


>
>The arguments of the short match exponents typically centre on other
>chessplaying characteristics of an engine which may also be of interest to a
>user - tactical excitement, daring, amazing moves, positional considerations,
>human like play etc.
>
>I also agree that this camp has a valid perspective.
>
>I would like to conduct an experiment but I need to ask a few questions first:
>
>(1)  Is there a minimum timecontrol that is statistically relevant to games
>played at classical timecontrols?  That was really one of the things I wanted to
>look at but clearly it requires a pool of such games, consistent hardware, etc.
>
>I ask this because the long timecontrol devotees have spare hardware, or at
>least hardware over which they exercise an enormous amount of discretion as to
>its use.  Not all of us are in that fortunate position.
>
>Playing 200 games or more at 60 minutes + (which is still fast chess!) would
>take me to a place where the light does not shine...
>
>I am thinking that there may be a relationship - particularly as the subject is
>an electronic construct - between long games and short ones.  It may not be
>linear but I cannot believe that it is a coincidence that the long timecontrol
>GMs are also atop the blitz ratings ladder...
>
>
>(2)  What is the statistical minimum of games that I would have to play to be
>able to make some sort of definitive noise?
>
>
>(3)  What is the impact - or theoretical impact - of learning on such a match?
>My personal bias is that if an author implements learning he should be rewarded
>for it and it should be turned on at the beginning of the match.  This speaks to
>positional and book learning.
>
>
>(4)  I am also biased towards using the engine's particular book(s).  The
>opening knowledge that a human chessplayer has is his/hers.  An engine should
>have its own book with it as it goes into battle.  Can someone turn off Ms.
>Polgar's opening book?  No?  Then the engine should have its book too....
>
>
>(5)  The games would be played on my single processor CPU.  That would mean no
>pondering *if* I understand Robert Hyatt's reasoning on the matter (which I
>freely admit may not be the case at all!).
>
>
>(6)  Are there any other factors?
>
>
>
>I really would like a way to prove or disprove the position that:
>
>(1) Games at shorter timecontrols are essentially worthless and:
>
>(2) That matches of 1000 games are required to make statistical statements.
>
>
>Please feel free to comment BUT what I would really like are some answers to the
>above questions and/or pointers....
>
>
>
>Later.



Last modified: Thu, 07 Jul 11 08:48:38 -0700
