Computer Chess Club Archives



Subject: Re: Dummy Cadaques Tournament (Long)

Author: Christophe Theron

Date: 21:28:59 01/28/00



On January 28, 2000 at 16:10:53, Enrique Irazoqui wrote:

>On January 28, 2000 at 14:40:35, Christophe Theron wrote:
>
>>On January 28, 2000 at 07:27:54, Enrique Irazoqui wrote:
>>
>>>There is a degree of uncertainty, but I don't think you need 1000 matches of 200
>>>games each to have an idea of who is best.
>>>
>>>Fischer became a chess legend for the games he played between his comeback in
>>>1970 to the Spassky match of 1972. In this period of time he played 157 games
>>>that proved to all of us without the hint of a doubt that he was the very best
>>>chess player of those times.
>>>
>>>Kasparov has been the undisputed best for many years. From 1984 until now, he
>>>played a total of 772 rated games. He needed less than half these games to
>>>convince everyone about who is the best chess player.
>>>
>>>This makes more sense to me than the probability stuff of your Qbasic program.
>>>Otherwise we would reach the absurd of believing that all the rankings in the
>>>history of chess are meaningless, and Capablanca, Fischer and Kasparov had long
>>>streaks of luck.
>>>
>>>You must have thought along these lines too when you proposed the matches
>>>Tiger-Diep and Tiger-Crafty as being meaningful, in spite of not being 200,000
>>>games long.
>>>
>>>Enrique
>>
>>
>>Enrique, I'm not sure you understand me.
>>
>>What my little QBasic program will tell you, if you try it, is that when the two
>>programs are very close in strength you need an incredible number of games in
>>order to determine which one is best.
>>
>>And when the elo difference between the programs is high enough, a small number
>>of games is enough.
>>
>>From my RNDMATCH program, I have derived the following table:
>>
>>Reliability of chess matches (this table is reliable with an 80% confidence)
>>
>> 10 games: 14.0% (105 pts)
>> 20 games: 11.0% ( 77 pts)
>> 30 games:  9.0% ( 63 pts)
>> 40 games:  8.0% ( 56 pts)
>> 50 games:  7.0% ( 49 pts)
>>100 games:  5.0% ( 35 pts)
>>200 games:  3.5% ( 25 pts)
>>400 games:  2.5% ( 18 pts)
>>600 games:  2.2% ( 15 pts)
>>
>>I hope others will have a critical look at my table and correct my maths if
>>needed.
>>
>>What this table tells you is that with a 10-game match you can say that one
>>program is better ONLY if it scores above 64% (50 + 14.0). In that case you
>>can say, with an 80% chance of being right, that this program is at least 105
>>elo points better than its opponent.
>>
>>Note that you still have a 20% chance of being wrong. But for practical use I
>>think it's enough.
>
>I'm still not sure that I agree with you. If after 10 games the result is
>6.5-3.5, I wouldn't dare to say that the winner is better, not even with a
>probability of 80%.


If you do the experiment and run my QBasic program, you will see that such a
result happens less than 20% of the time when the two programs are equal.

20% might be too high a risk for you, and you might be right. Once in every 5
matches you'll be wrong...
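If you don't have QBasic handy, the kind of simulation RNDMATCH performs can be sketched in Python. This is my reconstruction, not the original program; the draw rate and the way the cutoff is picked are assumptions, so the exact percentages will differ a little from my table:

```python
import random

def match_margin(n_games, n_matches=20000, draw_rate=0.0, confidence=0.80, seed=1):
    """Play n_matches simulated matches of n_games between two EQUAL
    programs, and return the score margin (in % above 50) that is
    exceeded only (1 - confidence) of the time."""
    rng = random.Random(seed)
    win_rate = (1.0 - draw_rate) / 2        # equal programs: wins split evenly
    scores = []
    for _ in range(n_matches):
        points = 0.0
        for _ in range(n_games):
            r = rng.random()
            if r < win_rate:
                points += 1.0               # program A wins this game
            elif r < win_rate + draw_rate:
                points += 0.5               # draw
        scores.append(points / n_games)
    scores.sort()
    # score fraction exceeded in only (1 - confidence) of the matches
    return (scores[int(confidence * n_matches)] - 0.5) * 100

for n in (10, 20, 50, 100, 200, 400):
    print(f"{n:4d} games: decisive margin ~ {match_margin(n):.1f}%")
```

The point the table makes shows up immediately: the decisive margin shrinks roughly with the square root of the number of games, so telling close programs apart gets expensive very fast.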



> In the next 10 games it could be the other way round and I
>don't think that a match between 2 opponents can decide which one is better
>anyway, unless the result is a real scandal. And this because of the relative
>lack of transitivity between chess programs. A can beat B 7-3, B can beat C 6-4
>and C can beat A 6-4, in which case A, B and C end up quite even in spite of the
>initial 7-3. We have seen things like these a number of times.


With such a low number of games, you indeed cannot deduce anything about
transitivity.

The only way to be sure about this non-transitivity phenomenon is to take 3
programs A, B and C.

Play enough games to make sure that A is better than B (use the table above).
Then play B against C until you are sure which one is better.

Finally, play A against C until you get a reliable result and check if
non-transitivity applies.
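The bookkeeping for such an experiment could look like this, using the decisive margins from my table above (the function name and the decision rule are just an illustration):

```python
# Decisive margins (in % above 50) from the table above, at ~80% confidence.
MARGIN = {10: 14.0, 20: 11.0, 30: 9.0, 40: 8.0, 50: 7.0,
          100: 5.0, 200: 3.5, 400: 2.5, 600: 2.2}

def verdict(n_games, points_for_a):
    """Return 'A' or 'B' if the match is decisive at ~80% confidence,
    or None if it is inconclusive."""
    threshold = MARGIN[n_games] / 100 * n_games     # margin in match points
    if points_for_a >= n_games / 2 + threshold:
        return 'A'
    if points_for_a <= n_games / 2 - threshold:
        return 'B'
    return None

# With these thresholds, the 7-3 result quoted above is decisive
# (it needs at least 6.4 out of 10), but 6-4 is not:
print(verdict(10, 7))   # 'A'
print(verdict(10, 6))   # None (inconclusive)
```

Only when all three pairings return a verdict can you say anything about a cycle like A > B, B > C, C > A.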

Who has made the experiment already?

Nobody.

Who will make the experiment?

Nobody. We love to believe in such things.

I myself tend to believe in non-transitivity; however, it has never been
demonstrated by a practical experiment...



>I think that what you say may work as a general guideline, but I wouldn't feel
>very safe using it.


Why? My numbers don't come out of some obscure statistical theory. I got them
by simulating chess matches with a program I have published.

What's wrong? Why would chess programs behave differently?



>Something else is that, intuitively, I don't find very relevant a difference
>smaller than say 20 points, even if they play thousands of games.


A difference of 20 elo points needs about 400 games to be noticed. So it's
natural that intuition does not work for such small differences. However, they
exist.
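For reference, the elo figures in the table are close to what the standard logistic expectation formula gives. A quick check in Python (the function names are mine):

```python
import math

def score_to_elo(score):
    """Elo difference implied by an expected score (0 < score < 1),
    using the standard logistic Elo expectation formula."""
    return -400 * math.log10(1 / score - 1)

def elo_to_score(elo_diff):
    """Expected score of the stronger side, given its Elo advantage."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

# A 64% match score (the 10-game threshold above) corresponds to roughly
# a 100-point elo advantage, close to the 105 in the table:
print(round(score_to_elo(0.64)))          # 100
# A 20-point advantage yields barely a 52.9% expected score, which is
# why it takes hundreds of games to rise above the noise:
print(round(elo_to_score(20) * 100, 1))   # 52.9
```

The small gap between the closed formula (100) and the table (105) is expected: the table values come out of the simulation, not the formula.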



    Christophe




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.