Computer Chess Club Archives


Subject: Re: Shredder crushing Chess Tiger.

Author: Andrew Dados

Date: 07:24:45 12/15/03



On December 15, 2003 at 01:25:40, Christophe Theron wrote:

>On December 14, 2003 at 19:26:30, J F wrote:
>
>>Christophe, how many games do you recommend playing before you can draw a
>>conclusion?
>
>
>
>I think you are not going to like the answer. :)
>
>It depends on:
>* the reliability you want (do you want a 70% reliability? 80%? 90%? 95%?)
>* the elo difference between the programs
>
>If you want very good reliability in the result (for example 95%) and the two
>programs are very close in elo, then you might need several thousand games.
>
>There is no simple answer to your question. However, I know that there exists a
>program called "whoisbetter" that can, given a match result, tell you if one
>program can be considered better than its opponent.
>
>The very important thing to remember is that in order to know which of the top
>PC chess programs is better, you will definitely need several thousand games,
>believe it or not. So it's always funny to see somebody giving an opinion
>after 5 games.
>
>
>Below are tables that can be used to get an idea of the number of games to
>play to get a given error margin (in winning percentage and in elo difference)
>for a given reliability (percentage of confidence).
>
>The tables say that, for example, if you want to know with 90% reliability
>which opponent is better, you will have to play 1000 games if their elo
>difference is 15 points. If their elo difference is below 10 points, you will
>have to play more than 2000 games...
>
>Reliability of chess matches
>
>90% confidence
>Games    %err+/-    elo+/-
>    10     20        140pts
>    20     15        105pts
>    25     14         98pts
>    30     12         84pts
>    40     10         70pts
>    50      9         56pts
>   100      6.5       46pts
>   200      4.72      33pts
>   400      3.34      23pts
>   600      2.66      19pts
>   800      2.39      17pts
>  1000      2.12      15pts
>  1200      2.00      14pts
>  1400      1.81      13pts
>  1600      1.66      12pts
>  2000     ~1.50      11pts
>
>80% confidence
>Games    %err+/-    elo+/-
>    10     15        105pts
>    20     11         77pts
>    25     10         70pts
>    30      9         63pts
>    40      8         56pts
>    50      7         49pts
>   100      5.0       35pts
>   200      3.75      26pts
>   400      2.60      18pts
>   600      2.15      15pts
>   800      1.86      13pts
>  1000      1.66      12pts
>  1200      1.46      10pts
>  1400      1.40      10pts
>  1600      1.34       9pts
>
>70% confidence
>Games    %err+/-    elo+/-
>    10     15         105pts
>    20     10          70pts
>    25      8          56pts
>    30      8          56pts
>    40      6.3        44pts
>    50      6.0        42pts
>   100      4.0        28pts
>   200      3.0        21pts
>   400      2.2        15pts
>   600      1.7        12pts
>   800      1.5        11pts
>  1000      1.3         9pts
>  1200      1.24        9pts
>  1400      1.14        8pts
>  1600      1.04        7pts
>
>
>
>    Christophe

I always wondered how those tables are calculated. Since we have no model that
includes draws and draw probabilities in any satisfactory way, all those tables
are just guesses (or, most likely, the possibility of draws is simply ignored).
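
For what it's worth, here is a plausible reconstruction in Python, assuming a
draw-free binomial model in which each game is a coin toss worth 0 or 1 point
(an assumption on my part; the post does not say how the numbers were
produced). The score of n such games has standard deviation 0.5/sqrt(n), so
the error margin at a normal quantile z is z*50%/sqrt(n); quantiles around
1.3, 1.0 and 0.8 roughly reproduce the quoted 90%, 80% and 70% rows.

import math

def margin_pct(games, z):
    """Error margin of the winning percentage, in +/- percent."""
    return z * 50.0 / math.sqrt(games)

def margin_elo(games, z):
    """The same margin in elo, using the ~7 points per percent that the
    tables themselves appear to use (the small-margin linearization of
    the logistic formula 400*log10(p/(1-p)))."""
    return 7.0 * margin_pct(games, z)

# The z per confidence level is my guess, fitted by eye to the tables above.
for label, z in (("90%", 1.3), ("80%", 1.0), ("70%", 0.8)):
    print(label)
    for n in (10, 50, 100, 400, 1000):
        print(f"{n:6d}  {margin_pct(n, z):5.2f}%  {margin_elo(n, z):4.0f}pts")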

If draws and their chances are ignored, dividing the games column by 2 is the
best guess: each chess game has 3 outcomes, not 2, so every game counts roughly
as 2 coin tosses instead of one (only roughly, because the draw percentage
depends on the opponents' strength, and that is exactly the problem here: we
don't know the expected percentage of drawn games).
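
Here is a minimal sketch of that variance argument (my own illustration, not
from any of the posts above): for two equally strong opponents with draw
probability d, the per-game score takes the values 1, 1/2 and 0, and its
variance works out to 0.25*(1-d) instead of the coin-toss value of 0.25. The
number of games needed for a given error margin scales with that variance, so
at d = 0.5 half as many games suffice, which is the divide-by-2 heuristic.

def score_variance(draw_prob):
    """Variance of one game's score between two equally strong opponents."""
    p_win = (1.0 - draw_prob) / 2.0       # wins and losses split the rest
    mean = 0.5                            # equal opponents score 50%
    second_moment = p_win * 1.0 ** 2 + draw_prob * 0.5 ** 2
    return second_moment - mean ** 2      # = 0.25 * (1 - draw_prob)

print(score_variance(0.0))   # 0.25   -> the pure coin-toss model
print(score_variance(0.5))   # 0.125  -> half the games give the same margin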

whoisbetter is one example of a statistic that ignores one of the 3 possible
outcomes (it takes this to the extreme), and thus produces incorrect
probabilities.
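
A draw-aware test is not hard to write down. Here is a sketch (my own
construction, not how whoisbetter actually works): estimate the mean score and
the empirical per-game variance from the win/draw/loss counts, then read the
confidence that the true score exceeds 50% off a normal approximation.

import math

def confidence_better(wins, draws, losses):
    """Confidence that the first program is better, draws included."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # empirical per-game variance over the outcomes 1, 1/2, 0
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    if var == 0.0:
        return 1.0 if mean > 0.5 else 0.0
    z = (mean - 0.5) / math.sqrt(var / n)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # Phi(z)

# The same +10 score out of 50 games is more conclusive with many draws,
# because draws shrink the per-game variance:
print(confidence_better(30, 0, 20))    # about 0.93
print(confidence_better(15, 30, 5))    # about 0.99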

-Andrew-


