Author: Christophe Theron
Date: 09:08:16 12/19/00
On December 19, 2000 at 04:13:17, Severi Salminen wrote:
>>>That begins to sound interesting. A 200-game match still has some margin of error,
>>>but we'll learn a lot from that result. I'm looking forward to the results - it's not
>>>often that someone runs a 200+ game match here in CCC, thanks!
>>>
>>>Severi
>>
>>
>>
>>On 200 games, the margin of error for 80% reliability is +/-3.5%.
>>For 70% reliability it's +/-3.0%.
>>
>>If a program wins the 200-game match with 53.5% (107-93) or more, you can say
>>with 80% reliability that it is stronger than its opponent.
>>
>>If it wins by only 53% (106-94) you can say it is better, but only with 70%
>>reliability.
>>
>>You see that when the programs are very close you need a very large number of
>>games to determine which is the stronger.
>
>Yes, that's what I also stated to Jorge. It would be great to have some table
>online (here at CCC) showing these error margins. Then it would be easier to
>convince people that if Fritz beats Crafty 4-0 it probably doesn't tell the
>whole truth...
Here is a table I have computed myself. I have already published it here and
asked for people to check it, but got no answer.
----------------------------------------------
Reliability of chess matches (confidence: 80%)

Games   Margin   Elo difference
   10   14.0%    105 pts
   20   11.0%     77 pts
   30    9.0%     63 pts
   40    8.0%     56 pts
   50    7.0%     49 pts
  100    5.0%     35 pts
  200    3.5%     25 pts
  400    2.5%     18 pts
  600    2.2%     15 pts
----------------------------------------------
Which reads as follows (I take the 50 games line as an example):
1) if you play 50 games you need a winning percentage of 50+7.0=57.0%
(28.5-21.5) or higher to say that the winner is stronger than its opponent with
80% confidence.
2) if the Elo difference between two opponents is below 49, then 50 games are not
enough to say which one is the stronger with 80% confidence.
But remember that 80% confidence means that you can be wrong one time out of 5
when you interpret results.
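
A minimal sketch of the arithmetic behind the 50-game example, for anyone who wants to check it. The function names are hypothetical, and the conversion from winning percentage to Elo points assumes the usual logistic Elo formula, which may not be the conversion used to build the table. It reproduces the 28.5 points and the 49 Elo points of the 50-game line; other lines may come out slightly different if the table was built from another conversion.

```python
import math


def required_points(games: int, margin_percent: float) -> float:
    """Points needed out of `games` to reach a score of 50% + margin."""
    return games * (50.0 + margin_percent) / 100.0


def elo_diff_from_score(score_fraction: float) -> float:
    """Elo difference implied by a score fraction.

    Assumption: logistic Elo formula, diff = 400 * log10(p / (1 - p)).
    """
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score_fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


# 50-game line of the table: margin 7.0%, so a 57% score is needed.
print(required_points(50, 7.0))          # 28.5
print(round(elo_diff_from_score(0.57)))  # 49
```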
>>On the other hand, if there is a significant difference before you reach 200
>>games, it is possible to say which is the best without playing the 200 games.
>
>I wouldn't stop unless the difference is _very_ big. 20-0 is not enough because
>of the small number of games played.
For 20 games, a 61% winning percentage is enough, with 80% confidence.
That means that a result of 12.5-7.5 is already significant, with 80%
confidence.
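
The same check can be written as a small lookup against the table above. This is only a sketch: the dictionary simply copies the margins from the table, and the function name is hypothetical. It only handles the match lengths listed in the table.

```python
# Margins (in percent) copied from the 80% confidence table above,
# keyed by the number of games in the match.
MARGIN_PERCENT = {
    10: 14.0, 20: 11.0, 30: 9.0, 40: 8.0, 50: 7.0,
    100: 5.0, 200: 3.5, 400: 2.5, 600: 2.2,
}


def is_significant(points: float, games: int) -> bool:
    """True if the winner's score reaches 50% + margin for this match length."""
    margin = MARGIN_PERCENT[games]  # raises KeyError for lengths not in the table
    percentage = 100.0 * points / games
    return percentage >= 50.0 + margin


print(is_significant(12.5, 20))  # True  (62.5% against a 61% threshold)
print(is_significant(12.0, 20))  # False (60.0% against a 61% threshold)
print(is_significant(107, 200))  # True  (53.5% against a 53.5% threshold)
print(is_significant(106, 200))  # False (53.0% against a 53.5% threshold)
```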
> It is hard to say what is the significant
>difference because that depends on number of games (like the reliability of the
>match).
The table above tells you which results are significant depending on the number
of games.
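
For match lengths that are not in the table, a rough cross-check is possible with a normal approximation. This is not necessarily how the table was computed (the method is not given in this post). The sketch assumes two equally strong opponents, independent games, a one-sided test, and a hypothetical `draw_rate` parameter. With these assumptions it gives margins of the same order as the table but somewhat smaller, so treat it as a sanity check only.

```python
import math
from statistics import NormalDist


def margin_percent(games: int, confidence: float = 0.80,
                   draw_rate: float = 0.0) -> float:
    """Approximate one-sided margin above 50%, in percent.

    Assumptions: equal opponents, independent games, normal approximation.
    """
    # Per-game score is 1, 0.5 or 0 with mean 0.5. Wins and losses each
    # deviate by 0.5 and draws by 0, so the variance is (1 - draw_rate) / 4.
    sd_per_game = 0.5 * math.sqrt(1.0 - draw_rate)
    z = NormalDist().inv_cdf(confidence)
    return 100.0 * z * sd_per_game / math.sqrt(games)


for n in (10, 50, 200, 600):
    print(n, round(margin_percent(n), 1))
```

The useful part is the scaling: the margin shrinks with the square root of the number of games, so halving it takes four times as many games. That agrees with the table, where 100 games give 5.0% and 400 games give 2.5%.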
> But of course if one program is leading 150-0 in a 200-game match, one
>could end it saying A won at least 150-50. Here we probably have quite a big
>difference in strength.
Yes.
Christophe