Author: George Speight
Date: 14:37:47 08/19/05
On August 19, 2005 at 09:27:22, Kurt Utzinger wrote:

>On August 19, 2005 at 09:12:55, tito wrote:
>
>>I began a match between Shredder9 and Toga2beta with NUNN2 and the first
>>results leave me perplexed: 4/5 for Shredder. Is that possible?
>
>     What about
>     - time control used
>     - hardware
>     and furthermore have a look at
>
>     CEGT comments by Heinz van Kempen
>     A lot of games are required to come to any conclusions
>     about playing strength of an engine
>     [http://www.chessfighters.de/cegt/html/comment_1.html]
>     [http://www.chessfighters.de/cegt/html/comment_3.html]
>
>     And finally another good example from my own experience
>     see the message below:
>     Kurt
>
>You have still not played enough games. I give below an example of a match
>[40'/40] I have played over 100 games between Gandalf 4.32g and Program_X [I am
>a beta tester of X] to show what I mean:
>
>Gandalf 4.32g vs Program X
>
>Games 1-10
>3.0-7.0 [win program X]
>Total 3.0-7.0 for program X
>
>Games 11-20
>6.5-3.5 [win Gandalf]
>Total 9.5-10.5 for program X
>
>Games 21-30
>5.0-5.0 [draw]
>Total 14.5-15.5 for program X
>
>Games 31-40
>3.5-6.5 [win program X]
>Total 18.0-22.0 for program X
>
>Games 41-50
>4.5-5.5 [win program X]
>Total 22.5-27.5 for program X
>
>Games 51-60
>3.0-7.0 [win program X]
>Total 25.5-34.5 for program X
>
>Games 61-70
>5.0-5.0 [draw]
>Total 30.5-39.5 for program X
>
>Games 71-80
>8.0-2.0 [win Gandalf]
>Total 38.5-41.5 for program X
>
>Games 81-90
>7.0-3.0 [win Gandalf]
>Total 45.5-44.5 for Gandalf
>
>Games 91-100
>5.5-4.5 [win Gandalf]
>Final match result 51.0-49.0 for Gandalf
>
>Can anybody tell me for sure which of the above two is the stronger program??
>And what if I had only played a 20-game match and these games had
>been those played in rounds 71-90? Then, the result would have been 15.0-5.0 in
>favour of Gandalf 4.32g!! Imagine what some testers would have argued about the
>strength of program X?
>
>For all these reasons I think that something concrete about the strength between
>two programs can only be said if 100, better 200-300 games or even more have
>been played.

Kurt, I have never seen it explained better. Your point is well-made, as usual.
Unfortunately, after 300 games, there will be those who say that's not enough, we
need 600 games, etc. Where would it end? My one wish is that there could be a match
length that would actually become the standard that most would agree on.

Regards,
George
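
The arithmetic behind Kurt's example can be sketched in a few lines of Python. This is only a rough illustration, not anything taken from the posts: it assumes the games are independent and uses the binomial variance p(1-p)/n, which overstates the spread somewhat when there are draws. The function names (`score_interval`, `elo_diff`, `games_needed`) are made up for this sketch; the block scores are Gandalf's results copied from the match above.

```python
import math

# Gandalf's score in each 10-game block, copied from the match quoted above.
GANDALF_BLOCKS = [3.0, 6.5, 5.0, 3.5, 4.5, 3.0, 5.0, 8.0, 7.0, 5.5]


def score_interval(points, games, z=1.96):
    """Approximate 95% interval for the true score fraction.

    Assumption: independent games, binomial variance p(1-p)/n.
    With draws the real variance is smaller, so this is on the wide side.
    """
    p = points / games
    margin = z * math.sqrt(p * (1.0 - p) / games)
    return max(0.0, p - margin), min(1.0, p + margin)


def elo_diff(p):
    """Rating difference implied by score fraction p (logistic Elo formula)."""
    return 400.0 * math.log10(p / (1.0 - p))


def games_needed(margin, z=1.96):
    """Games needed so the interval is about +/- margin for near-equal engines."""
    return math.ceil((z * 0.5 / margin) ** 2)


if __name__ == "__main__":
    total = sum(GANDALF_BLOCKS)                      # full 100-game match
    streak = GANDALF_BLOCKS[7] + GANDALF_BLOCKS[8]   # games 71-90 only

    for label, pts, n in [("all 100 games", total, 100),
                          ("games 71-90", streak, 20)]:
        lo, hi = score_interval(pts, n)
        print(f"{label}: {pts}/{n}, interval {lo:.3f} to {hi:.3f}, "
              f"Elo estimate {elo_diff(pts / n):+.0f}")

    print("games for +/-5%:", games_needed(0.05))
```

Under these assumptions the interval for the full match straddles 50%, so 51.0-49.0 does not separate the two programs, while the hand-picked 20-game stretch looks decisive on its own, which is exactly the trap Kurt describes. The same formula suggests a partial answer to "where would it end?": the margin shrinks only with the square root of the number of games, so halving it takes about four times as many games.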
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.