Computer Chess Club Archives


Subject: Re: Dummy Cadaques Tournament (Long)

Author: Amir Ban

Date: 13:36:57 01/28/00


On January 28, 2000 at 04:07:44, Christophe Theron wrote:

<snip>

>I have just run it. My sample is 1000 matches. Each match is made of 200 games.
>My program tells me that with 200 games I can only be sure that one program is
>stronger if the elo difference of the two is above 35 elo points, and this is
>sure with a 93.5% confidence.
>
>If the programs are closer than 35 elo points, 200 games are not enough to be
>sure which is best.
>
>Number of matches: 1000
>Number of games in each match: 200
>Compute probability of error greater than: 5
>
>
>
>    Christophe
>
>

Something is wrong with the numbers here: 200 x 1000 games are enough to
establish a rating with a 95% confidence margin of about 1.5 points. If two
programs are 35 points apart, you would need only about 400 games to tell with
95% confidence which is better.
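
For anyone who wants to check the 1.5 point figure, here is a rough sketch of the arithmetic. It is not necessarily how the number above was obtained. It assumes a normal approximation, a per-game standard deviation of 0.5 (the no-draws worst case), and the slope of the standard logistic Elo curve at a 50% score. The function name elo_margin and its parameters are made up for illustration.

```python
import math
from statistics import NormalDist


def elo_margin(games, confidence=0.95, per_game_sd=0.5):
    """Approximate half-width, in Elo, of a two-sided confidence interval
    for a near-even score after `games` games.

    Assumptions (not from the original post): normal approximation,
    per-game standard deviation of 0.5 (no draws), logistic Elo curve.
    """
    # Two-sided critical value, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Margin expressed as a fraction of the points scored.
    score_margin = z * per_game_sd / math.sqrt(games)
    # Slope of the Elo curve at a 50% score: 1600 / ln(10) Elo per unit of score.
    elo_per_score = 1600 / math.log(10)
    return score_margin * elo_per_score


if __name__ == "__main__":
    # 1000 matches of 200 games each, pooled.
    print(round(elo_margin(200 * 1000), 1))
```

Under these assumptions the pooled 200,000 games give a margin of roughly 1.5 Elo. Draws lower the per-game standard deviation, so real margins would be somewhat tighter.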

This also fails to say something important: the greater the difference in
strength, the fewer games are needed to prove who is better. If players are 100
points apart, only about 50 games are needed. A 200 point difference would show
up almost immediately.
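
The relationship between rating difference and games needed can be sketched as follows. This is only one plausible way to arrive at figures of this size, under the same assumptions as before: normal approximation, two-sided 95% level, per-game standard deviation of 0.5 (no draws), and the standard logistic Elo formula. The names expected_score and games_needed are hypothetical.

```python
import math
from statistics import NormalDist


def expected_score(elo_diff):
    """Expected score of the stronger player under the logistic Elo model."""
    return 1 / (1 + 10 ** (-elo_diff / 400))


def games_needed(elo_diff, confidence=0.95, per_game_sd=0.5):
    """Rough number of games before the expected score of the stronger
    program clears 50% by the two-sided confidence margin.

    Assumptions (not from the original post): normal approximation and
    a per-game standard deviation of 0.5, i.e. no draws.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    edge = expected_score(elo_diff) - 0.5  # expected surplus over an even score
    return math.ceil((z * per_game_sd / edge) ** 2)


if __name__ == "__main__":
    for diff in (35, 100, 200):
        print(diff, games_needed(diff))
```

With these assumptions a 35 point gap needs a little under 400 games, a 100 point gap about 50, and a 200 point gap fewer than 20. The required number falls off roughly with the square of the rating difference.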

I think there's also a logical trap that even the smartest fall into. When
people see, for example, the SSDF list and its 95% confidence intervals, they
jump to the conclusion that if the point spread is within this interval, it
has NO significance, which is not true. I can very well make statements based on
only 80% (gasp!) probability. I expect to be right 80% of the time, and in most
cases I will pass for a very smart person.
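
To put a number on a result that falls short of 95%, here is a minimal sketch that turns a match score into an approximate confidence that the leader is really the stronger program. It assumes a normal approximation, a per-game standard deviation of 0.5 (no draws), and no prior opinion about which program is better. The function name confidence_of_lead and the 110-90 example are invented for illustration.

```python
import math
from statistics import NormalDist


def confidence_of_lead(points, games, per_game_sd=0.5):
    """Approximate probability that the program scoring `points` out of
    `games` is truly the stronger one.

    Assumptions (not from the original post): normal approximation,
    per-game standard deviation of 0.5, no prior preference for either side.
    """
    score = points / games
    standard_error = per_game_sd / math.sqrt(games)
    # One-sided probability that the true score exceeds 50%.
    return NormalDist().cdf((score - 0.5) / standard_error)


if __name__ == "__main__":
    # Hypothetical 200-game match ending 110-90.
    print(round(confidence_of_lead(110, 200), 2))
```

Under these assumptions a 110-90 result in 200 games is well inside a two-sided 95% interval around an even score, yet it still points to the leader with a bit over 90% confidence.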

Amir




