Computer Chess Club Archives




Subject: Re: A question about statistics...

Author: Robert Hyatt

Date: 09:07:19 01/04/04

On January 04, 2004 at 11:46:00, Roger Brown wrote:

>Hello all,
>I have read numerous posts about the validity - or lack thereof actually - of
>short matches between and among chess engines.  The arguments of those who say
>that such matches are meaningless (Kurt Utzinger, Christopher Theron, Robert
>Hyatt et al.) typically indicate that well over 200 games are required to make any
>sort of statistical statement that engine X is better than engine Y.
>I concede this point.
>The arguments of the short match exponents typically centre on other
>chessplaying characteristics of an engine which may also be of interest to a
>user - tactical excitement, daring, amazing moves, positional considerations,
>human like play etc.
>I also agree that this camp has a valid perspective.
>I would like to conduct an experiment but I need to ask a few questions first:
>(1)  Is there a minimum timecontrol that is statistically relevant to games
>played at classical timecontrols?  That was really one of the things I wanted to
>look at but clearly it requires a pool of such games, consistent hardware, etc.

I think you need a time control of at least 60 minutes per game, plus some
sort of secondary time control or increment.

>I ask this because the long timecontrol devotees have spare hardware, or at
>least hardware over which they exercise an enormous amount of discretion as to
>its use.  Not all of us are in that fortunate position.
>Playing 200 games or more at 60 minutes + (which is still fast chess!) would
>take me to a place where the light does not shine...
>I am thinking that there may be a relationship - particularly as the subject is
>an electronic construct - between long games and short ones.  It may not be
>linear but I cannot believe that it is a coincidence that the long timecontrol
>GMs are also atop the blitz ratings ladder...

If you look, however, you will see an IM winning blitz events on ICC and
elsewhere, because blitz is simply a different game.

>(2)  What is the statistical minimum of games that I would have to play to be
>able to make some sort of definitive noise?

This depends on the strength of the two players.  The wider the gap, the
fewer games you need to play.  An easy example is to pick two players on ICC
and search for all games between them.  Pick one player's perspective and
record a win as 1, a draw as 0.5 and a loss as 0.  After you do a few hundred
such games, look at the string of results.  Do you see a consecutive
group you could pick that shows A to be stronger?  Another group that would
show B to be stronger?  That is what is wrong with a small sample-size.  You
might just start off at the front of either of those two groups, and if you
stop too soon, you get a biased result.
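To put numbers on the sample-size point, here is a minimal Python sketch (an illustration only, not code from Crafty or ICC). It scores a result string as described above, estimates the error bar on the score, finds the worst and best stretch of consecutive games, and gives a rough count of games needed before a 95% interval around the score stops containing 0.5. The helper names, the per-game standard deviation of 0.4, the 1.96 multiplier (normal approximation) and the simulated win/draw/loss rates are all assumptions chosen for illustration, and games are treated as independent.

```python
import math
import random


def score(results):
    """Fraction of available points: win = 1, draw = 0.5, loss = 0."""
    return sum(results) / len(results)


def standard_error(results):
    """Standard error of the mean score, using the sample variance (n - 1)."""
    n = len(results)
    mean = score(results)
    var = sum((r - mean) ** 2 for r in results) / (n - 1)
    return math.sqrt(var / n)


def window_extremes(results, window):
    """Lowest and highest score over any run of `window` consecutive games."""
    scores = [score(results[i:i + window])
              for i in range(len(results) - window + 1)]
    return min(scores), max(scores)


def games_needed(expected_score, per_game_sd=0.4, z=1.96):
    """Rough number of games before a z-sigma interval around the score
    no longer contains 0.5 (expected_score must differ from 0.5).

    per_game_sd = 0.4 is an assumed value: 0.5 is the maximum (no draws at
    an even score), and a higher draw rate makes it smaller.
    """
    gap = abs(expected_score - 0.5)
    return math.ceil((z * per_game_sd / gap) ** 2)


if __name__ == "__main__":
    rng = random.Random(2004)
    # Hypothetical match from A's side: 35% wins, 40% draws, 25% losses,
    # so A's true expected score is 0.55.
    results = rng.choices([1.0, 0.5, 0.0], weights=[35, 40, 25], k=300)
    print(f"score {score(results):.3f} +/- {1.96 * standard_error(results):.3f}")
    print("worst/best 20-game stretch:", window_extremes(results, 20))
    print("games needed at 0.55:", games_needed(0.55))  # 246
    print("games needed at 0.75:", games_needed(0.75))  # 10
```

Under these assumptions, a true score of 0.55 needs roughly 246 games, while a lopsided 0.75 needs only about 10, which is the "wider gap, fewer games" rule in numbers. The 20-game windows from the simulated match can land well above or below the true 0.55, which is exactly the trap of stopping a short match early.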

>(3)  What is the impact - or theoretical impact - of learning on such a match?
>My personal bias is that if an author implements learning he should be rewarded
>for it and it should be turned on at the beginning of the match.  This speaks to
>positional and book learning.

Between two programs, it can be very significant.  But you can answer this
by experiment: play A vs B with learning on for both, then A with learning
on vs B with learning off.
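One way to read such an experiment, sketched in Python under simple assumptions: summarize each match as A's score with a standard error, then check whether the change in A's score between the two matches is large compared to the combined error (a two-sample z-test). The function names, the choice of a z-test and the sample results below are hypothetical and purely illustrative, not part of any engine or GUI.

```python
import math


def match_summary(wins, draws, losses):
    """A's score and its standard error for one match (win=1, draw=0.5, loss=0)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance around the mean score (population form, fine for large n).
    var = (wins * (1 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    return mean, math.sqrt(var / n)


def learning_effect(match_both_on, match_b_off):
    """Change in A's score when B's learning is switched off, with a z-value.

    Each argument is a (wins, draws, losses) tuple from A's point of view.
    """
    s_on, se_on = match_summary(*match_both_on)
    s_off, se_off = match_summary(*match_b_off)
    diff = s_off - s_on
    z = diff / math.sqrt(se_on ** 2 + se_off ** 2)
    return diff, z


if __name__ == "__main__":
    # Hypothetical 200-game results, invented purely for illustration.
    diff, z = learning_effect((50, 100, 50), (80, 80, 40))
    print(f"A scores {diff:+.3f} more with B's learning off (z = {z:.2f})")
```

A |z| above roughly 2 suggests the difference is more than noise. Treat it as a rough guide only: learning deliberately makes later games depend on earlier ones, so the independent-games assumption behind these error bars does not fully hold.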

>(4)  I am also biased towards using the engine's particular book(s).  The
>opening knowledge that a human chessplayer has is his/hers.  An engine should
>have its own book with it as it goes into battle.  Can someone turn off Ms.
>Polgar's opening book?  No?  Then the engine should have its book too....

I agree, but it is not so simple when you think about it.  A hand-crafted
book is a powerful force.  And often the same book is used by multiple
chess engines.  That also doesn't seem so reasonable.

>(5)  The games would be played on my single processor CPU.  That would mean no
>pondering *if* I understand Robert Hyatt's reasoning on the matter (which I
>freely admit may not be the case at all!).

I think that is best, yes.  But ponder=on can also be perfectly reasonable,
in the case of Crafty.  It simply always ponders unless it has found a mate
or is in an endgame table.

>(6)  Are there any other factors?
>I really would like a way to prove or disprove the position that:
>(1) Games at shorter timecontrols are essentially worthless and:
>(2) That matches of 1000 games are required to make statistical statements.
>Please feel free to comment BUT what I would really like are some answers to the
>above questions and/or pointers....


Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.