Computer Chess Club Archives



Subject: Re: Mac Hiarcs swats away CT2004: 60+30 10 games

Author: George Sobala

Date: 02:44:44 01/29/05



On January 29, 2005 at 05:34:24, Kurt Utzinger wrote:

>Hi George
>My belief in statistics based on little data
>was broken a long time ago. Once again I refer
>to a good example of how computer matches can develop:
>
>Below is an example, a 100-game match [40'/40] I played between
>Gandalf 4.32g and Program X [I am a beta tester of X], to show what I mean:
>
>Gandalf 4.32g vs Program X
>
>Games 1-10
>3.0-7.0 [win program X]
>Total 3.0-7.0 for program X
>
>Games 11-20
>6.5-3.5 [win Gandalf]
>Total 9.5-10.5 for program X
>
>Games 21-30
>5.0-5.0 [draw]
>Total 14.5-15.5 for program X
>
>Games 31-40
>3.5-6.5 [win program X]
>Total 18.0-22.0 for program X
>
>Games 41-50
>4.5-5.5 [win program X]
>Total 22.5-27.5 for program X
>
>Games 51-60
>3.0-7.0 [win program X]
>Total 25.5-34.5 for program X
>
>Games 61-70
>5.0-5.0 [draw]
>Total 30.5-39.5 for program X
>
>Games 71-80
>8.0-2.0 [win Gandalf]
>Total 38.5-41.5 for program X
>
>Games 81-90
>7.0-3.0 [win Gandalf]
>Total 45.5-44.5 for Gandalf
>
>Games 91-100
>5.5-4.5 [win Gandalf]
>Final match result 51.0-49.0 for Gandalf
>
>Can you tell me for sure which of the above two is the stronger program?? And
>what if I had played only a 20-game match, and those games had been the ones
>played in rounds 71-90? Then the result would have been 15.0-5.0 in
>favour of Gandalf 4.32g!! Imagine what some testers would have argued about the
>strength of program X.
>
>For all these reasons I think that something concrete about the relative strength of
>two programs can only be said after 100 games, or better 200-300 games or even more,
>have been played.
>
>Kurt

Firstly, you are not giving the actual +/-/= scores for each group of 10 games.
But even if X was +4 -0 =6 in the first 10 games, that gave it less than a 90%
chance of being superior on that limited initial data.
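The post does not say how the 90% figure was obtained. One model that reproduces it, offered here only as an assumption: score each game as a Bernoulli trial (7.0 points from 10 games counts as 7 successes), give Program X's expected per-game score p a uniform Beta(1, 1) prior, and ask for the posterior probability that p > 0.5. For whole-number scores this has a closed form through the binomial distribution. The helper name prob_superior_integer is hypothetical.

```python
from math import comb


def prob_superior_integer(points: int, games: int) -> float:
    """Posterior P(p > 0.5), where p is the expected per-game score of the
    program that scored `points` out of `games`.

    Assumptions (not stated in the post): each game is scored as a Bernoulli
    trial (points = successes), p has a uniform Beta(1, 1) prior, and
    `points` is a whole number. The posterior is then Beta(k + 1, n - k + 1),
    and P(p > 0.5) equals P(Binomial(n + 1, 0.5) <= k).
    """
    n_plus_1 = games + 1
    favourable = sum(comb(n_plus_1, j) for j in range(points + 1))
    return favourable / 2 ** n_plus_1


# Program X's 7.0-3.0 in games 1-10, e.g. +4 -0 =6
p = prob_superior_integer(7, 10)
print(f"{p:.3f}")  # 0.887
```

Under this simplification +4 -0 =6 and +7 -3 =0 give the same answer, since only the 7.0 points enter the calculation.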

Secondly, if I have all the data, I will use all the data. It would be stupid to
home in on any one group of 10 games when the whole series is available.

So at no point in that series does Program X seem to have been superior to
Gandalf at better than the 90% probability level.
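One way to check this over the whole series, as a sketch under assumptions the post does not state: treat each game's score as a Bernoulli outcome with unknown expected score p for Program X, put a uniform prior on p, and compute the posterior probability that p > 0.5 after every block of 10 games, using the running totals from the quoted match. Half points make the Beta parameters non-integer, so the density is integrated numerically with the midpoint rule. The function name prob_superior and the RUNNING_TOTALS list are illustrative only.

```python
import math

# Program X's running totals (games played, points scored) after each block of
# 10 games, read off the "Total ..." lines of the quoted match.
RUNNING_TOTALS = [
    (10, 7.0), (20, 10.5), (30, 15.5), (40, 22.0), (50, 27.5),
    (60, 34.5), (70, 39.5), (80, 41.5), (90, 44.5), (100, 49.0),
]


def prob_superior(points: float, games: int, steps: int = 20_000) -> float:
    """P(p > 0.5) under an assumed Beta(1 + points, 1 + games - points) posterior.

    Assumption: uniform prior on p, each game scored as a Bernoulli trial.
    The density x^(a-1) (1-x)^(b-1) / B(a, b) is integrated over (0.5, 1)
    with the midpoint rule, working in log space to avoid overflow.
    """
    a = 1.0 + points
    b = 1.0 + games - points
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = 0.5 / steps
    total = 0.0
    for i in range(steps):
        x = 0.5 + (i + 0.5) * h  # midpoint of the i-th slice of (0.5, 1)
        total += math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_beta)
    return total * h


for games, points in RUNNING_TOTALS:
    print(f"after {games:3d} games: P(X stronger) = {prob_superior(points, games):.3f}")
```

Under these assumptions the highest value occurs early in the match and stays below 0.9 throughout.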

You have one very extreme result in that series: 8-2. One would expect about one
result like this in a series of this length, so that is OK: nothing to get
excited about when it sits in the middle of a whole load of other data. It would
be different if it happened to be the first ten games and you didn't know what
was going to happen next.
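The "about one result like this" estimate can be sketched under a deliberately simple, assumed model: two equally strong programs, with every game a fair coin flip worth a whole point (no draws). The chance that a 10-game block is at least as lopsided as 8-2 in either direction, multiplied by the 10 blocks of the match, gives the expected number of such blocks. The variable names are illustrative only. Draws, which this model ignores, lower the spread of block scores, so with a realistic draw rate such blocks would be somewhat rarer.

```python
from math import comb

GAMES_PER_BLOCK = 10
BLOCKS = 10  # the 100-game match, split into blocks of 10 as in the quoted post

# P(one given side scores 8 or more out of 10) when every game is a fair coin
# flip worth a full point (assumption: equal strength, no draws)
one_side = sum(comb(GAMES_PER_BLOCK, k) for k in range(8, GAMES_PER_BLOCK + 1)) / 2 ** GAMES_PER_BLOCK
either_side = 2 * one_side  # 8-2 for Gandalf or 2-8 for Program X
expected_blocks = BLOCKS * either_side
at_least_one = 1 - (1 - either_side) ** BLOCKS  # chance of seeing one or more such blocks

print(f"P(block at least as lopsided as 8-2) = {either_side:.3f}")  # 0.109
print(f"expected such blocks in 10           = {expected_blocks:.2f}")  # 1.09
print(f"P(at least one such block)           = {at_least_one:.3f}")
```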




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.