Computer Chess Club Archives



Subject: Re: Testing a newer version of a program against a previous one is misleading

Author: George Tsavdaris

Date: 14:23:21 10/16/03



On October 16, 2003 at 15:13:11, Russell Reagan wrote:

>On October 16, 2003 at 14:10:48, Jorge Pichard wrote:
>
>>Testing a newer version of a program against a previous one is useless. Remember
>>when we tested Fritz 7 versus Fritz 8: it gave us the false impression that
>>Fritz 8 was NOT better, but when it was tested by the SSDF against other programs
>>the difference was noticeable.
>
>I once played 8 different copies of the SAME program against each other and they
>had pretty different scores. Obviously the same program is not weaker than
>itself. Maybe most of the tournament results we see are worthless, since they
>do not approach statistical reliability.

 There is no such thing as absolute statistical reliability. For that we would have
to play an infinite number of games. Of course we can define a confidence level, like
0.95 or 0.99, that a result must reach before we are satisfied and say that engine A
is stronger than B.
 So no tournament is worthless; it is only less reliable than another.
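
 To make the threshold idea concrete, here is a rough sketch of one common way to put a number on it: the "likelihood of superiority" under a normal approximation. The function names and the choice to ignore draws are my own assumptions for illustration, not something taken from any particular testing tool, and the approximation is only reasonable once there are a fair number of decisive games.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that engine A is stronger than engine B.

    Assumption: normal approximation to the win/loss difference.
    Draws are left out because they do not favour either side.
    """
    decisive = wins + losses
    if decisive == 0:
        # No decisive games: the result says nothing either way.
        return 0.5
    # Standardised score difference (z-score of wins minus losses).
    z = (wins - losses) / math.sqrt(decisive)
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def is_stronger(wins: int, losses: int, confidence: float = 0.95) -> bool:
    """True if the result reaches the chosen confidence level (e.g. 0.95 or 0.99)."""
    return likelihood_of_superiority(wins, losses) >= confidence


if __name__ == "__main__":
    for wins, losses in [(12, 8), (60, 40), (600, 400)]:
        los = likelihood_of_superiority(wins, losses)
        print(f"+{wins} -{losses}: LOS = {los:.4f}, "
              f"passes 0.95: {is_stronger(wins, losses, 0.95)}, "
              f"passes 0.99: {is_stronger(wins, losses, 0.99)}")
```

 With this sketch the same 60% score passes or fails depending on how many games stand behind it: +12 -8 stays below 0.95, while +60 -40 clears 0.95 but not 0.99. So a short match is not meaningless, it just carries less weight.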




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.