A question about statistics...

Mark Young

09:52:46 01/04/04

On January 04, 2004 at 12:40:00, Ricardo Gibert wrote:

>On January 04, 2004 at 12:29:15, Mark Young wrote:
>>On January 04, 2004 at 11:46:00, Roger Brown wrote:
>>>Hello all,
>>>I have read numerous posts about the validity - or lack thereof actually - of
>>>short matches between and among chess engines.  The arguments of those who say
>>>that such matches are meaningless (Kurt Utzinger, Christopher Theron, Robert
>>>Hyatt et al.) typically indicate that well over 200 games are required to make any
>>>sort of statistical statement that engine X is better than engine Y.
>>>I concede this point.
>>If you concede this point you don't understand. There is no magic number like
>>200 or 2000. The score must be considered. Here is an example:
>>
>>A score of 17 - 3 in a 20 game match has a certainty of over 99% that the winner
>>of the match is stronger than the loser.
>>
>>A 100 game match ending 55 - 45 only has an 81% chance that the winner of the
>>match is the stronger program.
>>
>>A 200 game match ending 106 - 94 only has a 78% chance that the winner is
>>stronger than the loser.
>Nothing you have said is really correct because you have ignored the significant
>effect of draws in a match.

I can only say WHAT!! The last time I checked, wins count as 1 point, draws count
as 1/2 point, and losses count as 0.

So I have no clue what is going on in your brain to make such a comment!!

In a 20 game match, winning with 17 wins, 3 losses, and 0 draws is equal to
winning with 14 wins, 0 losses, and 6 draws. You win both matches 17 - 3. The
results are one and the same.
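
For anyone who wants to check percentages like the ones quoted above, here is a minimal sketch in Python. It assumes a one-sided binomial calculation: if the two programs were equal, each game would be a fair coin flip, and the figure is the chance that an equal opponent would score fewer points than the winner did. That method is an assumption, not something stated in this thread, although it gives numbers close to the ones quoted. The function name is made up for illustration, and the sketch takes whole-point scores only, so it treats every game as decisive and does not model drawn games separately.

```python
from math import comb


def confidence_winner_is_stronger(points: int, games: int) -> float:
    """Chance that an equal opponent scores fewer than `points` in `games` games.

    Assumption (not from the thread): every game is decisive and, between
    equal programs, each game is a fair coin flip, so the winner's score
    follows Binomial(games, 0.5).
    """
    # Number of equally likely outcomes in which an equal program would
    # score at least `points`.
    tail = sum(comb(games, k) for k in range(points, games + 1))
    # Subtract that tail probability from 1.
    return 1.0 - tail / 2 ** games


# The three match results discussed above.
for points, games in [(17, 20), (55, 100), (106, 200)]:
    value = confidence_winner_is_stronger(points, games)
    print(f"{points} - {games - points}: {value:.3f}")
```

The sum is exact integer arithmetic, so no statistics package is needed. A calculation that counts wins, losses, and draws separately would need a different model.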

>>>The arguments of the short match exponents typically centre on other
>>>chessplaying characteristics of an engine which may also be of interest to a
>>>user - tactical excitement, daring, amazing moves, positional considerations,
>>>human like play etc.
>>>(1)  Is there a minimum timecontrol that is statistically relevant to games
>>>played at classical timecontrols?  That was really one of the things I wanted to
>>>look at but clearly it requires a pool of such games, consistent hardware, etc.
>>>I ask this because the long timecontrol devotees have spare hardware, or at
>>>least hardware over which they exercise an enormous amount of discretion as to
>>>its use.  Not all of us are in that fortunate position.
>>>Playing 200 games or more at 60 minutes + (which is still fast chess!) would
>>>take me to a place where the light does not shine...
>>>
>>>I am thinking that there may be a relationship - particularly as the subject is
>>>an electronic construct - between long games and short ones.  It may not be
>>>linear but I cannot believe that it is a coincidence that the long timecontrol
>>>GMs are also atop the blitz ratings ladder...
>>>(2)  What is the statistical minimum of games that I would have to play to be
>>>able to make some sort of definitive noise?
>>>(3)  What is the impact - or theoretical impact - of learning on such a match?
>>>My personal bias is that if an author implements learning he should be rewarded
>>>for it and it should be turned on at the beginning of the match.  This speaks to
>>>positional and book learning.
>>>
>>>(4)  I am also biased towards using the engine's particular book(s).  The
>>>opening knowledge that a human chessplayer has is his/hers.  An engine should
>>>have its own book with it as it goes into battle.  Can someone turn off Ms.
>>>Polgar's opening book?  No?  Then the engine should have its book too....
>>>(5)  The games would be played on my single processor CPU.  That would mean no
>>>pondering *if* I understand Robert Hyatt's reasoning on the matter (which I
>>>freely admit may not be the case at all!).
>>>I really would like a way to prove or disprove the position that:
>>>
>>>(1) Games at shorter timecontrols are essentially worthless and:
>>>
>>>(2) That matches of 1000 games are required to make statistical statements.
>>>Please feel free to comment BUT what I would really like are some answers to the
>>>above questions and/or pointers....
>>>Later.

