Computer Chess Club Archives

Subject: Re: Dummy Cadaques Tournament (Long)

Author: Christophe Theron

Date: 14:11:17 01/28/00

On January 28, 2000 at 16:36:57, Amir Ban wrote:

>On January 28, 2000 at 04:07:44, Christophe Theron wrote:
>
><snip>
>
>>I have just run it. My sample is 1000 matches. Each match is made of 200 games.
>>My program tells me that with 200 games I can only be sure that one program is
>>stronger if the Elo difference between the two is above 35 points, and this
>>holds with 93.5% confidence.
>>
>>If the programs are closer than 35 Elo points, 200 games are not enough to be
>>sure which is best.
>>
>>Number of matches: 1000
>>Number of games in each match: 200
>>Compute probability of error greater than: 5
>>
>>
>>
>>    Christophe
>>
>>
>
>Something is wrong with the numbers here: 200x1000 games are enough to
>establish a rating with a 95% confidence margin of 1.5 points.
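Amir's 1.5-point figure can be checked with the usual normal approximation. The sketch below assumes the logistic Elo formula and a per-game score standard deviation of 0.5 (the no-draw worst case; draws would shrink it). The function names are illustrative and do not come from either poster's program.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(games, z=1.96, sigma=0.5):
    """Approximate confidence half-width, in Elo, around an even (50%) result.

    sigma is the assumed per-game standard deviation of the score; 0.5 is the
    no-draw worst case, draws make it smaller.
    """
    score_se = sigma / math.sqrt(games)          # standard error of the mean score
    return elo_from_score(0.5 + z * score_se)    # map the score half-width to Elo

if __name__ == "__main__":
    for n in (200, 400, 200 * 1000):
        print(f"{n:>7} games: +/- {elo_margin(n):.1f} Elo at 95%")
```

Under those assumptions, 200 x 1000 = 200,000 games give roughly +/- 1.5 Elo, in line with Amir's figure, while a single 200-game match gives a margin of nearly 50 Elo.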


But this is not what I computed.

I computed the average error margin of a 200-game match by simulating 1000 such
matches.

The experimental result I get with this simple program is that, with 80%
confidence, the error margin is below or equal to 3.5% in 200-game matches.
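A minimal sketch of this kind of simulation in Python. Christophe's program is not shown in the thread, so the game model here is an assumption: two equally strong programs, a fixed draw rate (30%, chosen arbitrarily), and otherwise an even chance for either side to win. The "error" of a match is taken to be the distance between its score and the true 50%. All names are hypothetical.

```python
import random

def simulate_match_scores(num_matches, games, draw_rate, seed=0):
    """Return the score fraction of program A in each simulated match.

    Assumed model: A and B are equally strong; each game is drawn with
    probability draw_rate, otherwise A and B are equally likely to win.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    win_cutoff = draw_rate + (1.0 - draw_rate) / 2.0
    scores = []
    for _ in range(num_matches):
        points = 0.0
        for _ in range(games):
            r = rng.random()
            if r < draw_rate:
                points += 0.5          # draw
            elif r < win_cutoff:
                points += 1.0          # win for A; otherwise a loss
        scores.append(points / games)
    return scores

def fraction_within(scores, true_score, margin):
    """Share of matches whose score lies within `margin` of the true score."""
    eps = 1e-12  # so boundary results such as 107/200 count as within 3.5%
    return sum(abs(s - true_score) <= margin + eps for s in scores) / len(scores)

if __name__ == "__main__":
    scores = simulate_match_scores(num_matches=1000, games=200, draw_rate=0.3)
    for margin in (0.035, 0.05):
        share = fraction_within(scores, 0.5, margin)
        print(f"error <= {margin:.1%}: {share:.1%} of matches")
```

The share of matches within 3.5% depends on the assumed draw rate, since more draws mean a smaller per-game variance, so this sketch will not necessarily reproduce the 80% quoted above.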

Does it fit with your own numbers? I'm interested in this.




> If two programs are
>35 points apart, you would need only about 400 games to tell with 95%
>confidence which is better.
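A rough way to reproduce the 400-game figure: find the number of games at which the expected score edge of the stronger program equals the 95% margin of error (1.96 standard errors). This again assumes the logistic Elo formula and a worst-case per-game standard deviation of 0.5; the function names are hypothetical.

```python
import math

def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96, sigma=0.5):
    """Games at which the expected score edge equals z standard errors.

    sigma is the per-game score standard deviation; 0.5 (no draws) is a
    conservative assumption, draws would lower the count.
    """
    edge = expected_score(elo_diff) - 0.5
    if edge <= 0:
        raise ValueError("elo_diff must be positive")
    return math.ceil((z * sigma / edge) ** 2)

if __name__ == "__main__":
    for diff in (35, 100, 200):
        print(f"{diff:>3} Elo: about {games_needed(diff)} games")
```

With these assumptions a 35-point gap needs a little under 400 games and a 100-point gap about 50. It is only a rule of thumb: at exactly that length, a real match between programs that far apart would clear the 95% bar only about half the time.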


I have not tried to establish the table for 95% confidence, but your numbers
sound OK to me.




>This also fails to say something important: the greater the difference in
>strength, the fewer games are needed to prove who is better. If players are 100
>points apart, only about 50 games are needed. A 200-point difference would show
>up almost immediately.
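The effect of the Elo gap can also be seen by simulation. This sketch uses an assumed game model (fixed 30% draw rate, win probability set so the expected score matches the Elo formula) and counts how often the stronger program finishes a match with more than 50% of the points. The function and its parameters are hypothetical.

```python
import random

def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def stronger_side_ahead(elo_diff, games, draw_rate=0.3, num_matches=2000, seed=1):
    """Fraction of simulated matches in which the stronger program scores over 50%.

    Assumed model: fixed draw rate, win probability chosen so that the
    expected score equals the Elo expectation.
    """
    p_win = expected_score(elo_diff) - draw_rate / 2.0
    if p_win < 0 or p_win + draw_rate > 1:
        raise ValueError("draw_rate is incompatible with this Elo difference")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    ahead = 0
    for _ in range(num_matches):
        half_points = 0
        for _ in range(games):
            r = rng.random()
            if r < p_win:
                half_points += 2       # win
            elif r < p_win + draw_rate:
                half_points += 1       # draw; otherwise a loss
        if half_points > games:        # more than half of the 2*games half-points
            ahead += 1
    return ahead / num_matches

if __name__ == "__main__":
    for diff in (35, 100, 200):
        share = stronger_side_ahead(diff, games=50)
        print(f"{diff:>3} Elo apart, 50 games: stronger side ahead in {share:.1%} of matches")
```

Under these assumptions the 100-point favourite should be ahead after 50 games in the large majority of simulated matches, while a 35-point favourite is much less reliable over the same length.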


If you run my program for a while, it quickly becomes obvious.


    Christophe



>I think there's also a logical trap that even the smartest fall into. When
>people see, for example, the SSDF list and its 95% confidence intervals, they
>jump to the conclusion that if the point spread is within this interval, it
>has NO significance, which is not true. I can very well make statements based on
>only 80% (gasp!) probability. I expect to be right 80% of the time, and in most
>cases I will pass for a very smart person.
>
>Amir
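Amir's point about 80% statements can be made concrete by reporting the confidence level itself instead of a pass/fail verdict at 95%. A sketch, assuming the normal approximation and a per-game standard deviation of 0.5; the function name is hypothetical.

```python
import math

def confidence_stronger(points, games, sigma=0.5):
    """One-sided confidence that the side scoring `points` of `games` is stronger.

    Normal approximation; sigma is the per-game score standard deviation,
    0.5 (no draws) being an assumption.
    """
    score = points / games
    z = (score - 0.5) / (sigma / math.sqrt(games))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z

if __name__ == "__main__":
    for points in (105, 106, 114):
        print(f"{points}/200: {confidence_stronger(points, 200):.1%}")
```

For example, a 106-94 result over 200 games falls well short of the 95% bar but still gives about 80% confidence that the winner is the stronger program under these assumptions.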

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.