Computer Chess Club Archives


Subject: Re: Dummy Cadaques Tournament (Long)

Author: Enrique Irazoqui

Date: 13:10:53 01/28/00


On January 28, 2000 at 14:40:35, Christophe Theron wrote:

>On January 28, 2000 at 07:27:54, Enrique Irazoqui wrote:
>
>>There is a degree of uncertainty, but I don't think you need 1000 matches of 200
>>games each to have an idea of who is best.
>>
>>Fischer became a chess legend for the games he played between his comeback in
>>1970 and the Spassky match of 1972. In this period he played 157 games
>>that proved to all of us beyond any doubt that he was the very best
>>chess player of those times.
>>
>>Kasparov has been the undisputed best for many years. From 1984 until now, he
>>has played a total of 772 rated games. He needed fewer than half of these games
>>to convince everyone that he is the best chess player.
>>
>>This makes more sense to me than the probability stuff of your QBasic program.
>>Otherwise we would reach the absurd conclusion that all the rankings in the
>>history of chess are meaningless, and that Capablanca, Fischer and Kasparov
>>had long streaks of luck.
>>
>>You must have thought along these lines too when you proposed the matches
>>Tiger-Diep and Tiger-Crafty as being meaningful, in spite of not being 200,000
>>games long.
>>
>>Enrique
>
>
>Enrique, I'm not sure you understand me.
>
>What my little QBasic program will tell you, if you try it, is that when the two
>programs are very close in strength you need an incredible number of games in
>order to determine which one is best.
>
>And when the elo difference between the programs is high enough, a small number
>of games is enough.
>
>From my RNDMATCH program, I have derived the following table:
>
>Reliability of chess matches (this table is reliable with 80% confidence)
>
> 10 games: 14.0% (105 pts)
> 20 games: 11.0% ( 77 pts)
> 30 games:  9.0% ( 63 pts)
> 40 games:  8.0% ( 56 pts)
> 50 games:  7.0% ( 49 pts)
>100 games:  5.0% ( 35 pts)
>200 games:  3.5% ( 25 pts)
>400 games:  2.5% ( 18 pts)
>600 games:  2.2% ( 15 pts)
>
>I hope others will have a critical look at my table and correct my maths if
>needed.
>
>What this table tells you is that with a 10-game match you can say that one
>program is better ONLY if it gets a result above 64% (50+14.0). In this case you
>can say, with an 80% chance of being right, that this program is at least 105
>elo points better than its opponent.
>
>Note that you still have a 20% chance of being wrong. But for practical use I
>think it's enough.
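
The RNDMATCH program itself is not shown in the thread, so the following is only a guess at the kind of experiment behind the table: simulate many matches between two equally strong programs and see how far above 50% the score of one side strays by luck alone. The function name, the draw rate, the one-sided reading of "80% confidence" and the seed are all assumptions, and the figures it produces need not match the table.

```python
import random


def match_margin(n_games, confidence=0.80, n_matches=5000, draw_rate=0.0, seed=1):
    """Margin over 50% that one side of an EQUAL match stays at or below
    in `confidence` of the simulated matches (one-sided, an assumption)."""
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    win_rate = (1.0 - draw_rate) / 2.0     # equal strength: wins = losses
    margins = []
    for _ in range(n_matches):
        score = 0.0
        for _ in range(n_games):
            r = rng.random()
            if r < win_rate:
                score += 1.0               # win
            elif r < win_rate + draw_rate:
                score += 0.5               # draw
        margins.append(score / n_games - 0.5)
    margins.sort()
    # Pick the margin at the requested percentile of the sorted outcomes.
    idx = min(int(confidence * n_matches), n_matches - 1)
    return margins[idx]


if __name__ == "__main__":
    for n in (10, 20, 40, 100, 200):
        print(f"{n:4d} games: {100 * match_margin(n):.1f}%")
```

Whatever the exact definition, the margin shrinks as the number of games grows, which is the shape the table shows.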

I'm still not sure that I agree with you. If after 10 games the result is
6.5-3.5, I wouldn't dare to say that the winner is better, not even with a
probability of 80%. In the next 10 games it could be the other way round. I
don't think that a match between 2 opponents can decide which one is better
anyway, unless the result is truly lopsided, because of the relative lack of
transitivity between chess programs. A can beat B 7-3, B can beat C 6-4
and C can beat A 6-4, in which case A, B and C end up quite even in spite of the
initial 7-3. We have seen things like this a number of times.
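
The A, B, C example can be tallied in a few lines. This is just the arithmetic of the three hypothetical 10-game matches above; the function name and data layout are made up for illustration.

```python
from collections import defaultdict


def tally(matches):
    """Sum points per program over a list of (winner, loser, w_pts, l_pts)."""
    points = defaultdict(float)
    for winner, loser, w_pts, l_pts in matches:
        points[winner] += w_pts
        points[loser] += l_pts
    return dict(points)


# The three matches from the example: A-B 7-3, B-C 6-4, C-A 6-4.
matches = [("A", "B", 7, 3), ("B", "C", 6, 4), ("C", "A", 6, 4)]
totals = tally(matches)

if __name__ == "__main__":
    for name in sorted(totals):
        # Each program played 20 games in total.
        print(f"{name}: {totals[name]:.0f}/20 ({100 * totals[name] / 20:.0f}%)")
```

A finishes on 11/20, C on 10/20 and B on 9/20, so the 7-3 of the first match shrinks to a 55% overall score.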

I think that what you say may work as a general guideline, but I wouldn't feel
very safe using it.

Also, intuitively, I don't find a difference smaller than, say, 20 points very
relevant, even if they play thousands of games.
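
To put 20 points in terms of match score, here is a minimal sketch using the standard logistic Elo expectation formula. The thread does not say which conversion was used for the table, so treat the formula choice and the function names as assumptions.

```python
import math


def expected_score(elo_diff):
    """Expected score (0..1) of the stronger side for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Inverse mapping: Elo difference implied by a score fraction (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    print(f"20 points  -> {100 * expected_score(20):.1f}% expected score")
    print(f"58% score  -> {elo_from_score(0.58):.0f} points")
```

Under this formula a 20-point edge is worth roughly 52.9%, that is, less than 3 extra points per 100 games, and a 58% score corresponds to about 56 points, in line with the 40-game row of the quoted table.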

But I understand your point.

Enrique

>I don't think this result sounds counterintuitive to most of us here.
>
>Now if you play 20 games you can detect, with 80% confidence, if one program
>is 77 elo points better than its opponent. No revolution here I think.
>
>Play 40 games and you can, with 80% confidence, be sure that one program is 56
>elo points better.
>
>What's very important and, I think, overlooked by most testers, is that when the
>elo difference between two programs is tiny, the number of games to play becomes
>tremendous.
>
>For example, if the programs are separated by only 18 elo points, you need to
>play 400 GAMES! If you don't, you CANNOT DRAW ANY CONCLUSION.
>
>The right methodology when you play a match between two programs is this: you
>must play on until the winning percentage of one of the programs becomes
>decisive.
>
>After  10 games, if no program wins by 64.0% or more => play on
>After  20 games, if no program wins by 61.0% or more => play on
>After  40 games, if no program wins by 58.0% or more => play on
>After 100 games, if no program wins by 55.0% or more => play on
>After 200 games, if no program wins by 53.5% or more => play on
>
>And so on.
>
>If you play two identical programs, you are likely to play on forever. That
>sounds strange, but it's only logical.
>
>And to answer your question, I thought that playing 40 games between Tiger and
>Diep and 40 games between Tiger and Crafty would be enough, because I think the
>difference between Tiger and these programs is above 56 elo points.
>
>
>
>    Christophe
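
The "play on" rule quoted above can be written down as a small lookup. The checkpoints and percentages are copied from the post; the function name and the choice to reject game counts that are not listed are assumptions, since the post only says "and so on" for longer matches.

```python
# (games played, winning fraction needed), as listed in the quoted post.
THRESHOLDS = {10: 0.640, 20: 0.610, 40: 0.580, 100: 0.550, 200: 0.535}


def verdict(games_played, score):
    """Return 'decisive' or 'play on' at one of the listed checkpoints.

    `score` is the point total of either program (win = 1, draw = 0.5);
    the better of the two totals is compared with the threshold.
    """
    if games_played not in THRESHOLDS:
        raise ValueError("no threshold listed for this number of games")
    best = max(score, games_played - score)
    fraction = best / games_played
    return "decisive" if fraction >= THRESHOLDS[games_played] else "play on"


if __name__ == "__main__":
    print(verdict(10, 6.5))   # the 6.5-3.5 result discussed in the reply
    print(verdict(40, 22.0))  # 55% after 40 games
```

By this rule a 6.5-3.5 result after 10 games already counts as decisive (65% against a 64% threshold), which is exactly the case the reply is uneasy about, while 22-18 after 40 games means playing on.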


