Computer Chess Club Archives


Subject: Re: Shredder 9 vs Toga2 Beta1 / Nunn2

Author: George Speight

Date: 14:37:47 08/19/05


On August 19, 2005 at 09:27:22, Kurt Utzinger wrote:

>On August 19, 2005 at 09:12:55, tito wrote:
>
>>I began a match between Shredder9 and Toga2beta with  NUNN2 and the first
>>results leave me perplexed:  4/5 for Shredder. Is that possible?
>
>       What about
>       - time control used
>       - hardware
>       and furthermore have a look at
>
>       CEGT comments by Heinz van Kempen
>       A lot of games are required to come to any conclusions
>       about playing strength of an engine
>       [http://www.chessfighters.de/cegt/html/comment_1.html]
>       [http://www.chessfighters.de/cegt/html/comment_3.html]
>
>       And finally another good example from my own experience
>       see the message below:
>       Kurt
>
>You have still not played enough games. Below is an example to show what I mean:
>a 100-game match [40'/40] that I played between Gandalf 4.32g and Program_X [I am
>a beta tester of X]:
>
>Gandalf 4.32g vs Program X
>
>Games 1-10
>3.0-7.0 [win program X]
>Total 3.0-7.0 for program X
>
>Games 11-20
>6.5-3.5 [win Gandalf]
>Total 9.5-10.5 for program X
>
>Games 21-30
>5.0-5.0 [draw]
>Total 14.5-15.5 for program X
>
>Games 31-40
>3.5-6.5 [win program X]
>Total 18.0-22.0 for program X
>
>Games 41-50
>4.5-5.5 [win program X]
>Total 22.5-27.5 for program X
>
>Games 51-60
>3.0-7.0 [win program X]
>Total 25.5-34.5 for program X
>
>Games 61-70
>5.0-5.0 [draw]
>Total 30.5-39.5 for program X
>
>Games 71-80
>8.0-2.0 [win Gandalf]
>Total 38.5-41.5 for program X
>
>Games 81-90
>7.0-3.0 [win Gandalf]
>Total 45.5-44.5 for Gandalf
>
>Games 91-100
>5.5-4.5 [win Gandalf]
>Final match result 51.0-49.0 for Gandalf
>
>Can anybody tell me for sure which of the two programs above is the stronger one?
>And what if I had played only a 20-game match, and those games had been the ones
>played in rounds 71-90? Then the result would have been 15.0-5.0 in favour of
>Gandalf 4.32g! Imagine what some testers would have argued about the strength of
>program X.
>
>For all these reasons I think that something concrete about the relative strength
>of two programs can only be said after 100, or better 200-300 or even more, games
>have been played.

Kurt, I have never seen it explained better. Your point is well made, as usual.
Unfortunately, after 300 games there will be those who say that's not enough, we
need 600 games, etc. Where would it end? My one wish is that there could be a match
length that becomes the standard most would agree on.
Regards, George
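
To put rough numbers on Kurt's example, here is a minimal sketch in Python. It is not part of the original posts. It takes the ten-game block scores from the table above, applies the usual logistic Elo formula, and builds an approximate 95% interval under two assumptions: the games are independent, and the per-game variance is taken at its upper bound (as if there were no draws), so the interval is on the wide side. The names `BLOCKS`, `elo_diff` and `score_interval` are invented for this illustration.

```python
import math

# Ten-game block scores for Gandalf 4.32g, copied from Kurt's table above.
BLOCKS = [3.0, 6.5, 5.0, 3.5, 4.5, 3.0, 5.0, 8.0, 7.0, 5.5]
GAMES_PER_BLOCK = 10


def elo_diff(score_fraction):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


def score_interval(points, games, z=1.96):
    """Approximate 95% interval for the true score fraction.

    Assumption: games are independent and the per-game variance is taken at
    its upper bound p*(1-p), i.e. as if there were no draws. Draws only
    shrink the variance, so this interval errs on the wide side.
    """
    p = points / games
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    return p - half_width, p + half_width


total = sum(BLOCKS)
games = GAMES_PER_BLOCK * len(BLOCKS)
low, high = score_interval(total, games)
print(f"Full match: {total}-{games - total}")
print(f"Score fraction interval: {low:.3f} to {high:.3f}")
print(f"Elo interval: {elo_diff(low):+.0f} to {elo_diff(high):+.0f}")

# The 20-game window from games 71-90 that Kurt singles out.
window = BLOCKS[7] + BLOCKS[8]
print(f"Games 71-90: {window}-{20 - window}")
```

Under these assumptions the interval for the full 100 games includes a 50% score, that is, zero Elo difference in either direction, which agrees with Kurt's point that 51.0-49.0 does not settle which program is stronger. Since the half-width shrinks only with the square root of the number of games, four times as many games are needed to halve it.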




Last modified: Thu, 15 Apr 21 08:11:13 -0700
