Computer Chess Club Archives


Subject: Re: Shredder crushing Chess Tiger.

Author: Alberto Rezza

Date: 01:54:16 12/16/03


On December 15, 2003 at 01:25:40, Christophe Theron wrote:

>I think you are not going to like the answer. :)
>
>It depends on:
>* the reliability you want (do you want a 70% reliability? 80%? 90%? 95%?)
>* the elo difference between the programs
>
>If you want a very good reliability in the result (for example 95%) and the two
>programs are very close in elo, then you might need several thousands games.

There is a fatal flaw in your argument: if you already know the Elo difference,
then there is no point at all in playing a test match...

You should have made it depend on:
* the reliability you want
* the actual result of the match (N points out of M games)

and from this you can draw conclusions like "Elo difference >50 points with 95%
confidence" or (if you are a much better statistician than I am) even compute a
continuous probability distribution for the Elo difference.
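
For what it's worth, here is a minimal sketch of the "continuous distribution" idea. It rests on assumptions that are mine, not part of the discussion above: only decisive games are counted (draws ignored), the prior on the expected score is uniform, and the score is converted to Elo with the usual logistic formula. The function names and the 20-10 result are hypothetical.

```python
import math

def elo_to_score(elo_diff):
    """Expected score for a given Elo difference (standard logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def prob_elo_above(wins, losses, elo_threshold, steps=20000):
    """Posterior probability that the true Elo difference exceeds elo_threshold.

    Assumes decisive games only and a uniform prior on the expected score p,
    so the posterior density is proportional to p**wins * (1-p)**losses.
    The integral is approximated with a midpoint rule on a grid over p.
    """
    p_min = elo_to_score(elo_threshold)
    total = 0.0
    above = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps  # midpoint of the i-th grid cell
        weight = math.exp(wins * math.log(p) + losses * math.log(1.0 - p))
        total += weight
        if p > p_min:
            above += weight
    return above / total

# Hypothetical match: 20 wins, 10 losses, no draws.
print(prob_elo_above(20, 10, 50))  # posterior probability that Elo difference > 50
print(prob_elo_above(20, 10, 0))   # posterior probability that the winner is stronger at all
```

Calling it for a range of thresholds traces out the whole distribution; a different prior or a proper treatment of draws would shift the numbers.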

In other words: 5 games may be too few, but there is a big difference between
16-14 and 30-0. If you get a 30-0 result, the hint is: maybe the programs are
NOT very close in Elo, so you do NOT need thousands of games. :)
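
To put rough numbers on the 16-14 versus 30-0 contrast, here is a sketch that turns a result into a one-sided lower bound on the Elo difference. The assumptions are mine: every game is decisive (no draws), games are independent, the bound on the win probability is the exact binomial (Clopper-Pearson) one, and the score-to-Elo conversion is the usual logistic formula. The function names are hypothetical.

```python
import math

def binom_tail(games, wins, p):
    """P(X >= wins) for X ~ Binomial(games, p)."""
    return sum(math.comb(games, i) * p**i * (1.0 - p) ** (games - i)
               for i in range(wins, games + 1))

def score_lower_bound(wins, games, confidence=0.95):
    """One-sided exact lower bound on the win probability, found by bisection."""
    if wins == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        # The tail grows with p: if it is still below alpha, the bound lies higher.
        if binom_tail(games, wins, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo

def score_to_elo(p):
    """Elo difference implied by an expected score p (logistic Elo model)."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_lower_bound(wins, games, confidence=0.95):
    """Lower bound on the Elo difference at the given one-sided confidence."""
    return score_to_elo(score_lower_bound(wins, games, confidence))

print(elo_lower_bound(16, 30))  # 16-14: the bound is negative, so no conclusion
print(elo_lower_bound(30, 30))  # 30-0: the bound is several hundred Elo
```

Under these assumptions a 16-14 result cannot even establish which program is stronger at 95%, while 30-0 already supports a very large gap after only 30 games.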

Alberto

BTW: if you open Whoisbetter, the default params are 95% and 0 points for the
loser; and the minimum number of games is given as 5 (result: 5-0) for 95%
confidence. What a coincidence :)
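
I don't know how Whoisbetter actually arrives at that figure, but one simple calculation reproduces it: assume the two programs are equal, ignore draws, and ask how long a clean sweep must be before its probability under that assumption drops below 5%. The function name below is hypothetical and this is only a sketch of that one-sided test, not the tool's method.

```python
def min_sweep_length(confidence=0.95):
    """Smallest N such that an N-0 sweep between equal opponents (no draws)
    has probability below 1 - confidence."""
    alpha = 1.0 - confidence
    games = 1
    # Each game is a coin flip under the equal-strength assumption.
    while 0.5 ** games >= alpha:
        games += 1
    return games

print(min_sweep_length(0.95))  # prints 5
```

A 4-0 sweep has probability 0.5**4 = 6.25%, just above 5%, while 5-0 has probability 0.5**5 = 3.125%, which is why 5 is the threshold under these assumptions.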



Last modified: Thu, 15 Apr 21 08:11:13 -0700
