Computer Chess Club Archives



Subject: Re: Shredder crushing Chess Tiger.

Author: Christophe Theron

Date: 09:39:26 12/15/03



On December 15, 2003 at 10:24:45, Andrew Dados wrote:

>On December 15, 2003 at 01:25:40, Christophe Theron wrote:
>
>>On December 14, 2003 at 19:26:30, J F wrote:
>>
>>>Christophe, how many games do you recommend playing before you can draw a
>>>conclusion?
>>
>>
>>
>>I think you are not going to like the answer. :)
>>
>>It depends on:
>>* the reliability you want (do you want a 70% reliability? 80%? 90%? 95%?)
>>* the elo difference between the programs
>>
>>If you want very good reliability in the result (for example 95%) and the two
>>programs are very close in elo, then you might need several thousand games.
>>
>>There is no simple answer to your question. However, I know that there exists a
>>program called "whoisbetter" that can, given a match result, tell you whether
>>one program can be considered better than its opponent.
>>
>>The very important thing to remember is that in order to know which of the top
>>PC chess programs is better, you will definitely need several thousand games,
>>believe it or not. So it's always funny to see somebody giving an opinion
>>after 5 games.
>>
>>
>>Below are tables that can be used to get an idea of the number of games to play
>>to get a given error margin (in winning percentage and in elo difference) for a
>>given reliability (percentage of confidence).
>>
>>The tables say, for example, that if you want to know with 90% reliability which
>>opponent is better, you will have to play 1000 games if their elo difference is
>>15 points. If their elo difference is below 10 points, you will have to play
>>more than 2000 games...
>>
>>Reliability of chess matches
>>
>>90% confidence
>>Games    %err+/-    elo+/-
>>    10     20        140pts
>>    20     15        105pts
>>    25     14         98pts
>>    30     12         84pts
>>    40     10         70pts
>>    50      9         63pts
>>   100      6.5       46pts
>>   200      4.72      33pts
>>   400      3.34      23pts
>>   600      2.66      19pts
>>   800      2.39      17pts
>>  1000      2.12      15pts
>>  1200      2.00      14pts
>>  1400      1.81      13pts
>>  1600      1.66      12pts
>>  2000     ~1.50      11pts
>>
>>80% confidence
>>Games    %err+/-    elo+/-
>>    10     15        105pts
>>    20     11         77pts
>>    25     10         70pts
>>    30      9         63pts
>>    40      8         56pts
>>    50      7         49pts
>>   100      5.0       35pts
>>   200      3.75      26pts
>>   400      2.60      18pts
>>   600      2.15      15pts
>>   800      1.86      13pts
>>  1000      1.66      12pts
>>  1200      1.46      10pts
>>  1400      1.40      10pts
>>  1600      1.34       9pts
>>
>>70% confidence
>>Games    %err+/-    elo+/-
>>    10     15         105pts
>>    20     10          70pts
>>    25      8          56pts
>>    30      8          56pts
>>    40      6.3        44pts
>>    50      6.0        42pts
>>   100      4.0        28pts
>>   200      3.0        21pts
>>   400      2.2        15pts
>>   600      1.7        12pts
>>   800      1.5        11pts
>>  1000      1.3         9pts
>>  1200      1.24        9pts
>>  1400      1.14        8pts
>>  1600      1.04        7pts
>>
>>
>>
>>    Christophe
>
>I have always wondered how those tables are calculated. Since we have no model
>that includes draw scores and draw probabilities in any satisfactory way, all
>those tables are just guessed (or, most likely, draw probabilities are simply
>ignored).
>
>If draws and their chances are ignored, dividing the games column by 2 is the
>best guess - each chess game has 3 outcomes, not 2, so every game equals 2 coin
>tosses, not one (roughly; the draw percentage depends on opponent strength, and
>that is the problem here: we don't know the expected percentage of drawn games).
>
>whoisbetter is one example of a statistic that ignores one of the 3 possible
>scores (it goes to the extreme), and thus produces incorrect probabilities.
>
>-Andrew-
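
Andrew's variance point can be sketched numerically (an editorial illustration,
not part of the thread, with illustrative probabilities): a draw pulls the
per-game score toward the mean, shrinking its standard deviation and hence the
number of games needed for a given error margin.

```python
# Sketch: how the draw rate changes the spread of a single game's score
# (1, 0.5 or 0). Probabilities are illustrative, not measured values.

def score_stddev(p_win, p_draw):
    """Standard deviation of one game's score for given win/draw chances."""
    p_loss = 1.0 - p_win - p_draw
    mean = p_win + 0.5 * p_draw
    var = (p_win * (1.0 - mean) ** 2
           + p_draw * (0.5 - mean) ** 2
           + p_loss * (0.0 - mean) ** 2)
    return var ** 0.5

print(score_stddev(0.50, 0.0))   # equal opponents, no draws: 0.5 (a coin toss)
print(score_stddev(0.25, 0.5))   # equal opponents, 50% draws: ~0.354
```

Since the error margin of a match scales with sigma/sqrt(games), a 50% draw
rate cuts the games needed roughly in half (0.354^2 / 0.5^2 = 0.5) compared to
the pure coin-toss model, which is the factor of 2 Andrew mentions.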



I'm not good enough at statistics to have produced these tables from a formula.

I built these tables empirically, with a program producing random outcomes of
chess games, the chances to win, draw or lose being equal. This is where my
logic is biased: the chances for white to win seem to be higher than just 1/3.

The tables have been produced by generating a very high number of simulated
matches and then crunching the numbers.
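
A minimal sketch of that procedure, as I understand it from the description
above (the function and parameter names are mine; it assumes equal 1/3
win/draw/loss chances per game, as stated):

```python
import random

def error_margin(games, confidence, trials=20000):
    """Half-width, in winning percentage, of the interval around 50%
    that contains `confidence` of the simulated match results."""
    deviations = []
    for _ in range(trials):
        # One simulated match: each game scores 1, 0.5 or 0 with equal chances
        score = sum(random.choice((1.0, 0.5, 0.0)) for _ in range(games))
        deviations.append(abs(100.0 * score / games - 50.0))
    deviations.sort()
    return deviations[int(confidence * trials) - 1]

# error_margin(1000, 0.90) lands near the 2.12% figure in the 90% table
```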

I expect my results to be close to the theoretical results. I have published
these tables several times and have always asked for somebody to give me better
estimates. I'm still waiting.
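
For what it's worth, the tables appear to track the standard
normal-approximation estimate closely; a sketch, assuming equal 1/3
win/draw/loss chances (per-game standard deviation sqrt(1/6) ~ 0.408) and
roughly 7 elo per winning-percentage point near an even score. Both of those
assumptions are mine, not claims from the post.

```python
import math

# Two-sided normal quantiles for the confidence levels used in the tables
Z = {0.70: 1.036, 0.80: 1.282, 0.90: 1.645}

def margins(games, confidence):
    """Error margin in winning % and approximate elo for a match of `games`."""
    pct = 100.0 * Z[confidence] * math.sqrt((1.0 / 6.0) / games)
    elo = pct * 7.0  # ~7 elo per percentage point near a 50% score
    return pct, elo

# margins(1000, 0.90) gives about (2.12, 14.9), matching the table row
```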



    Christophe




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.