Computer Chess Club Archives


Subject: Re: I just don't get this ...

Author: Mark Young

Date: 00:32:28 01/04/04



On January 04, 2004 at 02:52:18, Kurt Utzinger wrote:

>On January 03, 2004 at 20:53:39, Rick Rice wrote:
>
>>Person A posts a message saying Ruffian 2.0 is very disappointing, with the
>>results to back it up. This is followed by a second post which basically says
>>that Ruffian 2.0 rocks with some results to back it up. Are these programs
>>really so time and hardware sensitive, so as to show varying results on
>>different CPUs/time controls?
>>
>>The ideal solution would be for SSDF to have one massive board with one CPU and
>>memory for each program (equal CPU and mem for all the progs on its list) and
>>some way to automate the play of these programs against each other..... on
>>different time controls such as regular, blitz etc. Just wishful thinking for
>>the future, but it would eliminate the multiple and varying results.
>>
>>Cheers,
>>Rick
>
>It's indeed most important to play enough games to get an objective impression
>about any program. I give below an example of a match [40'/40] I have played
>over 100 games between Gandalf 4.32g and Program_X [I am a beta tester of X] to
>show what I mean:
>
>Gandalf 4.32g vs Program X
>
>Games 1-10
>3.0-7.0 [win program X]
>Total 3.0-7.0 for program X
>
>Games 11-20
>6.5-3.5 [win Gandalf]
>Total 9.5-10.5 for program X
>
>Games 21-30
>5.0-5.0 [draw]
>Total 14.5-15.5 for program X
>
>Games 31-40
>3.5-6.5 [win program X]
>Total 18.0-22.0 for program X
>
>Games 41-50
>4.5-5.5 [win program X]
>Total 22.5-27.5 for program X
>
>Games 51-60
>3.0-7.0 [win program X]
>Total 25.5-34.5 for program X
>
>Games 61-70
>5.0-5.0 [draw]
>Total 30.5-39.5 for program X
>
>Games 71-80
>8.0-2.0 [win Gandalf]
>Total 38.5-41.5 for program X
>
>Games 81-90
>7.0-3.0 [win Gandalf]
>Total 45.5-44.5 for Gandalf
>
>Games 91-100
>5.5-4.5 [win Gandalf]
>Final match result 51.0-49.0 for Gandalf
>
>Can anybody tell me for sure which of the above two is the stronger program??

At what percentage of certainty? You can never be 100% sure. Even if you
play your 200 to 300 games, you still may not be able to have high confidence.
More games are always good, but that is not the whole story.

There is a 53% chance that Gandalf is better than Program X. :)
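
The post does not say how that figure was obtained, so the following is only a rough sketch of one common way to put a number on it: a normal-approximation "likelihood of superiority" based on wins and losses, with draws cancelling out. The function name and the draw counts tried are assumptions for illustration (the match report gives only the 51.0-49.0 score, not the number of draws), and this method need not reproduce the 53% quoted above.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Normal-approximation chance that the side with `wins` is the stronger one.

    Draws are ignored because they carry no information about which side is better.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games: nothing to go on
    z = (wins - losses) / math.sqrt(decisive)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A 51.0-49.0 score over 100 games means wins - losses = 2 for Gandalf.
# The draw count is not reported, so a few hypothetical values are tried.
for draws in (0, 20, 40, 60):
    wins = (100 - draws) // 2 + 1
    losses = (100 - draws) // 2 - 1
    los = likelihood_of_superiority(wins, losses)
    print(f"draws={draws:2d}  wins={wins}  losses={losses}  LOS={los:.3f}")
```

Whatever draw count is assumed, the result stays only modestly above 50%, which is the point: a two-point margin over 100 games says very little about which program is stronger.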

>And what if I had played only a 20-game match, and those games had been the
>ones played in rounds 71-90? Then the result would have been 15.0-5.0 in
>favour of Gandalf 4.32g!! Imagine what some testers would have argued about the
>strength of program X?
>
>For all these reasons I think that something concrete about the strength between
>two programs can only be said if 100, better 200-300 games or even more have
>been played.
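
To get a feel for how unusual a 15.0-5.0 result over 20 games would be, here is a minimal sketch that computes the exact score distribution for two programs assumed to be exactly equal in strength. The draw rates are hypothetical (the match report does not give one), and the function names are made up for this example.

```python
def score_distribution(n_games: int, draw_rate: float) -> list[float]:
    """probs[h] = chance that side A has h half-points after n_games.

    Assumes equal programs: win and loss are equally likely, draws occur at draw_rate.
    """
    p_win = p_loss = (1.0 - draw_rate) / 2.0
    probs = [1.0]  # before any game: 0 half-points with certainty
    for _ in range(n_games):
        nxt = [0.0] * (len(probs) + 2)
        for h, p in enumerate(probs):
            nxt[h] += p * p_loss         # loss adds 0 half-points
            nxt[h + 1] += p * draw_rate  # draw adds 1 half-point
            nxt[h + 2] += p * p_win      # win adds 2 half-points
        probs = nxt
    return probs


def prob_score_at_least(n_games: int, draw_rate: float, points: float) -> float:
    """Chance that side A scores at least `points` in n_games."""
    probs = score_distribution(n_games, draw_rate)
    return sum(probs[int(round(2 * points)):])


# Hypothetical draw rates; the real one for this match is not reported.
for draw_rate in (0.0, 0.2, 0.4):
    p = prob_score_at_least(20, draw_rate, 15.0)
    print(f"draw rate {draw_rate:.1f}: P(score >= 15.0 of 20) = {p:.4f}")
```

This gives the chance for one fixed 20-game match. Picking the most lopsided 20-game stretch out of 100 games after the fact, as in the rounds 71-90 example, makes such a streak more likely to turn up than these numbers suggest.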



>
>Kurt




Last modified: Thu, 15 Apr 21 08:11:13 -0700
