Author: Christophe Theron
Date: 13:43:04 07/05/01
On July 05, 2001 at 07:45:09, Harald Faber wrote:
>On July 05, 2001 at 06:10:27, Torstein Hall wrote:
>
>>On July 05, 2001 at 04:20:40, Harald Faber wrote:
>>
>>>A disappointing result for the GambitTiger fans. Such a clear and justified loss
>>>hasn't happened before. I wouldn't explain the 2.5-7.5 by the statistical margin.
>>>Looking at the games, I'd expect another win for Shredder if the match were
>>>repeated. Maybe not so high, but a 6-4 would fit my forecast. Shredder really
>>>played some fine games, see for yourself. You can find the games at
>>>http://www.geocities.com/Harald1312/HaraldFaberE.html.
>>>
>>>The next match is versus Hiarcs 7.32, which was already very hard for
>>>ChessTiger to beat. The first game ended in a draw; in the second one both programs show
>>>a +2 in favour of Hiarcs, but it is a rook and pawn ending so maybe GambitTiger
>>>can reach a draw.
>>
>>10 games are not enough to say anything for sure about program strength!
>
>I know. But take a look at the games, don't you agree that Shredder played
>convincingly?
Take only the won games of a program and you are always going to think it won
convincingly.
It does not apply only to Tiger or Shredder, it's a general rule.
> And of course I wouldn't dare speak of Shredder being stronger
>than GambitTiger after only 10 games. I would take at least 50 games to be on
>the right track.
>
>>I'm not sure how many games you would need to be 99% sure that one program is
>>stronger than another. Perhaps 200-300?
>>
>>Torstein
>
>Don't know exactly, but 50-100 should be enough to get a good approximation. I'd
>be interested in a statistic that shows the average gain/loss of ELO in the SSDF
>between 100 and 300 or 500 games.
I use the following tables. These tables are not very accurate because they
assume a 1/3 chance each for a win, a draw and a loss, but they give you a good
approximation. At least it is better than only guessing.
Reliability of chess matches
(assuming each opponent has a 1/3 chance to win, 1/3 to lose and 1/3 to draw)

90% confidence
Games  %err+/-  elo+/-
10     20       140pts
20     15       105pts
25     14       98pts
30     12       63pts
40     10       70pts
50     9        56pts
100    6.5      35pts
200    4.72     33pts
400    3.34     23pts
600    2.66     19pts
800    2.39     17pts
1000   2.12     15pts
1200   2.00     14pts
1400   1.81     13pts
1600   1.66     12pts

80% confidence
Games  %err+/-  elo+/-
10     15       105pts
20     11       77pts
25     10       70pts
30     9        63pts
40     8        56pts
50     7        49pts
100    5.0      35pts
200    3.75     26pts
400    2.60     18pts
600    2.15     15pts
800    1.86     13pts
1000   1.66     12pts
1200   1.46     10pts
1400   1.40     10pts
1600   1.34     9pts

70% confidence
Games  %err+/-  elo+/-
10     15       105pts
20     10       70pts
25     8        56pts
30     8        56pts
40     6.3      44pts
50     6.0      42pts
100    4.0      28pts
200    3.0      21pts
400    2.2      15pts
600    1.7      12pts
800    1.5      11pts
1000   1.3      9pts
1200   1.24     9pts
1400   1.14     8pts
1600   1.04     7pts

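
For anyone who wants to recompute rows like these, here is a minimal Python sketch. It assumes a normal approximation and the same 1/3 win, 1/3 draw, 1/3 loss model as the tables, and it converts percent to Elo using the slope of the logistic Elo curve at a 50% score (about 7 points per percent). The function names are mine, and this is not necessarily how the tables were produced, so expect small differences from the rounded figures.

```python
import math
from statistics import NormalDist

# Assumption: each game is a win, draw or loss with probability 1/3 each,
# so the per-game score (1, 0.5, 0) has mean 0.5 and variance 1/6.
PER_GAME_SD = math.sqrt(1.0 / 6.0)

# Slope of the logistic Elo curve at a 50% score, in Elo per percent point.
ELO_PER_PERCENT = 4.0 * 400.0 / math.log(10.0) / 100.0  # about 6.95


def margin_percent(games, confidence):
    """Two-sided margin of error on the match score, in percent points."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return 100.0 * z * PER_GAME_SD / math.sqrt(games)


def margin_elo(games, confidence):
    """Same margin in Elo, using the linear approximation near 50%."""
    return margin_percent(games, confidence) * ELO_PER_PERCENT


if __name__ == "__main__":
    for confidence in (0.90, 0.80, 0.70):
        print(f"{confidence:.0%} confidence")
        for games in (10, 50, 100, 400, 1000):
            print(f"{games:6d} {margin_percent(games, confidence):6.2f}% "
                  f"{margin_elo(games, confidence):6.1f} pts")
```

The margin shrinks with the square root of the number of games, which is why halving it takes four times as many games.
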
So a result of 7.5-2.5 is 75%, and with 90% confidence you can say that it is
significant: 50% plus 20% (the margin of error for 90% confidence on a 10-game
match) gives 70%, and as the actual result is above that, the match has
statistical significance.
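
The same check written as a few lines of Python, as a sketch only. The helper name is hypothetical, and the 20% margin is simply read off the 90% table for 10 games.

```python
def is_significant(points, games, margin_percent):
    """True if the score percentage lies outside 50% +/- margin_percent."""
    score_percent = 100.0 * points / games
    return abs(score_percent - 50.0) > margin_percent


# Shredder scored 7.5 out of 10; the 90% table gives +/-20% for 10 games.
print(is_significant(7.5, 10, 20))  # True: 75% is above the 70% threshold
```
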
However this is only 90% confidence, so by saying that the result clearly shows
that one program is better than the other one, you'll end up being wrong in 10%
of the cases.
If you assume that Tiger and Shredder are 70 elo points apart or less, then you
need to play AT LEAST 40 games to show it with 90% confidence.
If you assume they are 35 elo points apart or less, you'll need AT LEAST 100
games (if you want 90% confidence).
To resolve a difference of 23 elo points, be ready to play 400 games.
And so on... just pick the line you wish in the tables.
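
If you prefer to go the other way round (from an Elo difference to a number of games) without reading the tables, a rough Python sketch follows. It assumes a normal approximation, the same 1/3 win, 1/3 draw, 1/3 loss model, and a linear conversion of about 6.95 Elo per percent near a 50% score. The function name is mine, and the results only land near the table rows, since the tables are rounded.

```python
import math
from statistics import NormalDist

# Assumption: win/draw/loss each with probability 1/3, so the per-game
# score has variance 1/6.
PER_GAME_SD = math.sqrt(1.0 / 6.0)

# Slope of the logistic Elo curve at a 50% score, in Elo per unit of score.
ELO_PER_SCORE_UNIT = 4.0 * 400.0 / math.log(10.0)


def games_needed(elo_margin, confidence):
    """Smallest number of games whose two-sided margin is <= elo_margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    score_margin = elo_margin / ELO_PER_SCORE_UNIT
    return math.ceil((z * PER_GAME_SD / score_margin) ** 2)


if __name__ == "__main__":
    for elo in (70, 23):
        print(elo, "elo at 90% confidence:", games_needed(elo, 0.90), "games")
```
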
Christophe