Author: George Tsavdaris
Date: 14:23:21 10/16/03
On October 16, 2003 at 15:13:11, Russell Reagan wrote:

>On October 16, 2003 at 14:10:48, Jorge Pichard wrote:
>
>>Testing a newer version of a program against a previous one is useless. Remember when
>>we tested Fritz7 versus Fritz8 it gave us the false impression that Fritz 8 was
>>NOT better, but when it was tested by the SSDF against other programs the
>>difference was noticeable.
>
>I once played 8 different copies of the SAME program against each other and they
>had pretty different scores. Obviously the same program is not weaker than
>itself. Maybe most of the tournament results we see are worthless, since they
>do not approach statistical reliability.

There is no such thing as absolute statistical reliability. For that we would have to play an infinite number of games. Of course, we can define a threshold such as 0.95 or 0.99 that the statistical confidence of a result must reach before we are satisfied and say that engine A is stronger than engine B. So no tournament is worthless; some are only less reliable than others.
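To make the threshold idea concrete, here is a minimal sketch of one common way to put a number on "A is stronger than B". It is not something anyone in this thread used: it assumes the normal approximation often called "likelihood of superiority", it ignores draws, and the function name is my own.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that engine A is stronger than engine B.

    Assumption: normal approximation to the win/loss difference,
    with draws ignored (they do not favour either engine).
    """
    decisive = wins + losses
    if decisive == 0:
        # No decisive games: the match gives no evidence either way.
        return 0.5
    z = (wins - losses) / math.sqrt(2.0 * decisive)
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    # Same score ratio, different match lengths.
    for wins, losses in [(6, 4), (60, 40), (600, 400)]:
        los = likelihood_of_superiority(wins, losses)
        print(f"+{wins} -{losses}: confidence {los:.4f}, "
              f"passes 0.95: {los >= 0.95}, passes 0.99: {los >= 0.99}")
```

Under these assumptions a 6-4 result stays well below 0.95, while 60-40 gives about 0.977, which clears 0.95 but not 0.99. That is the point above: a short match is not worthless, it just supports a weaker conclusion than a long one.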
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.