Author: Kurt Utzinger
Date: 23:52:18 01/03/04
On January 03, 2004 at 20:53:39, Rick Rice wrote:

>Person A posts a message saying Ruffian 2.0 is very disappointing, with the
>results to back it up. This is followed by a second post which basically says
>that Ruffian 2.0 rocks, with some results to back it up. Are these programs
>really so time- and hardware-sensitive as to show varying results on
>different CPUs/time controls?
>
>Ideal solution would be for SSDF to have one massive board with one CPU and
>memory for each program (equal CPU and mem for all the progs on its list) and
>some way to automate the play of these programs against each other... on
>different time controls such as regular, blitz etc. Just wishful thinking for
>the future, but it would eliminate the multiple and varying results.
>
>Cheers,
>Rick

It's indeed most important to play enough games to get an objective impression of any program. Below is an example from a 100-game match [40'/40] that I played between Gandalf 4.32g and Program X [I am a beta tester of X], to show what I mean. Scores are given as Gandalf-Program X:

Gandalf 4.32g vs Program X

Games  1-10   3.0-7.0  [win Program X]  Total  3.0-7.0  for Program X
Games 11-20   6.5-3.5  [win Gandalf]    Total  9.5-10.5 for Program X
Games 21-30   5.0-5.0  [draw]           Total 14.5-15.5 for Program X
Games 31-40   3.5-6.5  [win Program X]  Total 18.0-22.0 for Program X
Games 41-50   4.5-5.5  [win Program X]  Total 22.5-27.5 for Program X
Games 51-60   3.0-7.0  [win Program X]  Total 25.5-34.5 for Program X
Games 61-70   5.0-5.0  [draw]           Total 30.5-39.5 for Program X
Games 71-80   8.0-2.0  [win Gandalf]    Total 38.5-41.5 for Program X
Games 81-90   7.0-3.0  [win Gandalf]    Total 45.5-44.5 for Gandalf
Games 91-100  5.5-4.5  [win Gandalf]    Final match result 51.0-49.0 for Gandalf

Can anybody tell me for sure which of the two is the stronger program? And what if I had played only a 20-game match, and those games had been the ones played in rounds 71-90? Then the result would have been 15.0-5.0 in favour of Gandalf 4.32g!!
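The running totals and the 71-90 window can be re-derived from the ten 10-game block scores in the table. The following is a minimal Python sketch; the names BLOCKS, running_totals and window_score are illustrative only, not part of any existing testing tool, and the block scores are copied from the table as (Gandalf, Program X) pairs.

```python
# Block scores from the match table, as (Gandalf, Program X) points per 10 games.
BLOCKS = [(3.0, 7.0), (6.5, 3.5), (5.0, 5.0), (3.5, 6.5), (4.5, 5.5),
          (3.0, 7.0), (5.0, 5.0), (8.0, 2.0), (7.0, 3.0), (5.5, 4.5)]


def running_totals(blocks):
    """Return the cumulative (Gandalf, Program X) score after each block."""
    totals = []
    g = x = 0.0
    for bg, bx in blocks:
        g += bg
        x += bx
        totals.append((g, x))
    return totals


def window_score(blocks, first, last):
    """Sum the scores of blocks first..last (0-based indices, inclusive)."""
    chosen = blocks[first:last + 1]
    return sum(b[0] for b in chosen), sum(b[1] for b in chosen)


if __name__ == "__main__":
    for i, (g, x) in enumerate(running_totals(BLOCKS)):
        print(f"after game {10 * (i + 1):3d}: Gandalf {g:4.1f} - {x:4.1f} Program X")
    # Games 71-90 are the 8th and 9th blocks, i.e. indices 7 and 8.
    print("games 71-90:", window_score(BLOCKS, 7, 8))
```

Because every score is a multiple of 0.5, the floating-point sums here are exact, so the totals match the table digit for digit.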
Imagine what some testers would then have claimed about the strength of Program X! For all these reasons, I think that something concrete about the relative strength of two programs can only be said once at least 100, and better 200-300 or even more, games have been played.

Kurt
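The 100-300 game rule of thumb can be made a little more concrete with a standard-error estimate. The sketch below is not from the post and rests on stated assumptions: each game is treated as independent, the per-game score is given a worst-case standard deviation of 0.5 (as if there were no draws, so real intervals with draws will be narrower), the 95% interval uses the normal approximation (z = 1.96), and scores are converted to rating differences with the usual logistic Elo formula. All function names are hypothetical.

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def elo_from_score(s):
    """Logistic Elo difference implied by a mean score 0 < s < 1."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def score_interval(points, games, sigma=0.5):
    """Approximate 95% interval for the true mean score per game.

    Assumption: sigma = 0.5 is the worst case (no draws). Draws lower the
    real spread, so this interval is conservative.
    """
    s = points / games
    half = Z95 * sigma / math.sqrt(games)
    return s - half, s + half


def games_needed(margin, sigma=0.5):
    """Smallest number of games whose 95% half-width is <= margin."""
    return math.ceil((Z95 * sigma / margin) ** 2)


if __name__ == "__main__":
    lo, hi = score_interval(51.0, 100)  # Gandalf's 51.0 points from 100 games
    print(f"score 0.510, 95% interval {lo:.3f}..{hi:.3f}")
    print(f"Elo {elo_from_score(0.51):+.0f}, interval "
          f"{elo_from_score(lo):+.0f}..{elo_from_score(hi):+.0f}")
    for n in (100, 200, 300):
        print(f"{n} games: half-width {Z95 * 0.5 / math.sqrt(n):.3f}")
    print("games for a half-width of 0.05:", games_needed(0.05))
```

Under these assumptions, the 51.0-49.0 result corresponds to roughly +7 Elo for Gandalf, but the approximate 95% interval runs from about -62 to +76 Elo. It includes zero, so the match alone cannot say which program is stronger. The worst-case half-width on the mean score shrinks only with the square root of the number of games: about 0.098 at 100 games, 0.069 at 200 and 0.057 at 300, and roughly 385 games are needed to bring it down to 0.05.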
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.