Author: Mark Young

Date: 09:29:15 01/04/04

On January 04, 2004 at 11:46:00, Roger Brown wrote:

>Hello all,
>
>I have read numerous posts about the validity - or lack thereof actually - of
>short matches between and among chess engines. The arguments of those who say
>that such matches are meaningless (Kurt Utzinger, Christopher Theron, Robert
>Hyatt et al) typically indicate that well over 200 games are required to make any
>sort of statistical statement that engine X is better than engine Y.
>
>I concede this point.

If you concede this point, you don't understand the statistics. There is no magic number like 200 or 2000. The score must be considered along with the number of games. Here is an example:

A score of 17 - 3 in a 20 game match gives a certainty of over 99% that the winner of the match is stronger than the loser.

A 100 game match ending 55 - 45 gives only an 81% chance that the winner of the match is the stronger program.

A 200 game match ending 106 - 94 gives only a 78% chance that the winner is stronger than the loser.

>
>The arguments of the short match exponents typically centre on other
>chessplaying characteristics of an engine which may also be of interest to a
>user - tactical excitement, daring, amazing moves, positional considerations,
>human like play etc.
>
>I also agree that this camp has a valid perspective.
>
>I would like to conduct an experiment but I need to ask a few questions first:
>
>(1) Is there a minimum timecontrol that is statistically relevant to games
>played at classical timecontrols? That was really one of the things I wanted to
>look at but clearly it requires a pool of such games, consistent hardware, etc.
>
>I ask this because the long timecontrol devotees have spare hardware, or at
>least hardware over which they exercise an enormous amount of discretion as to
>its use. Not all of us are in that fortunate position.
>
>Playing 200 games or more at 60 minutes + (which is still fast chess!) would
>take me to a place where the light does not shine...
>
>I am thinking that there may be a relationship - particularly as the subject is
>an electronic construct - between long games and short ones. It may not be
>linear but I cannot believe that it is a coincidence that the long timecontrol
>GMs are also atop the blitz ratings ladder...
>
>(2) What is the statistical minimum of games that I would have to play to be
>able to make some sort of definitive noise?
>
>(3) What is the impact - or theoretical impact - of learning on such a match?
>My personal bias is that if an author implements learning he should be rewarded
>for it and it should be turned on at the beginning of the match. This speaks to
>positional and book learning.
>
>(4) I am also biased towards using the engine's particular book(s). The
>opening knowledge that a human chessplayer has is his/hers. An engine should
>have its own book with it as it goes into battle. Can someone turn off Ms.
>Polgar's opening book? No? Then the engine should have its book too....
>
>(5) The games would be played on my single processor CPU. That would mean no
>pondering *if* I understand Robert Hyatt's reasoning on the matter (which I
>freely admit may not be the case at all!).
>
>(6) Are there any other factors?
>
>I really would like a way to prove or disprove the position that:
>
>(1) Games at shorter timecontrols are essentially worthless and:
>
>(2) That matches of 1000 games are required to make statistical statements.
>
>Please feel free to comment BUT what I would really like are some answers to the
>above questions and/or pointers....
>
>Later.
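
The three confidence figures quoted in the reply above (17 - 3, 55 - 45 and 106 - 94) can be reproduced, at least approximately, with a simple binomial calculation. The post does not say how the figures were obtained, so the following is only a sketch under stated assumptions: every game is treated as decisive (no draws), the two engines are assumed equal under the null hypothesis, and the "certainty" is taken to be one minus the chance that an equal opponent would score at least as many wins. The function name `confidence_winner_stronger` is hypothetical.

```python
from math import comb


def confidence_winner_stronger(wins: int, losses: int) -> float:
    """One minus the probability that two equal engines would produce
    a result at least this lopsided (exact one-sided binomial tail).

    Assumptions (not stated in the post): draws are ignored and each
    game is an independent coin flip between equally strong engines.
    """
    games = wins + losses
    # Count the outcomes with `wins` or more wins out of `games` fair flips.
    tail_outcomes = sum(comb(games, k) for k in range(wins, games + 1))
    tail_probability = tail_outcomes / 2 ** games
    return 1.0 - tail_probability


# The three match results mentioned in the reply.
for wins, losses in [(17, 3), (55, 45), (106, 94)]:
    pct = 100 * confidence_winner_stronger(wins, losses)
    print(f"{wins} - {losses}: about {pct:.1f}%")
```

Under these assumptions the short, lopsided match comes out well above 99%, while the two longer but closer matches land near 81% and 78%, which is the point of the example: the margin matters as much as the number of games. A model that accounts for draws would give somewhat different numbers.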

