Author: Christophe Theron
Date: 21:28:59 01/28/00
On January 28, 2000 at 16:10:53, Enrique Irazoqui wrote:
>On January 28, 2000 at 14:40:35, Christophe Theron wrote:
>
>>On January 28, 2000 at 07:27:54, Enrique Irazoqui wrote:
>>
>>>There is a degree of uncertainty, but I don't think you need 1000 matches of 200
>>>games each to have an idea of who is best.
>>>
>>>Fischer became a chess legend for the games he played between his comeback in
>>>1970 and the Spassky match of 1972. In that period he played 157 games,
>>>which proved to all of us without a hint of doubt that he was the very best
>>>chess player of his time.
>>>
>>>Kasparov has been the undisputed best for many years. From 1984 until now, he
>>>has played a total of 772 rated games. He needed fewer than half of these
>>>games to convince everyone of who the best chess player is.
>>>
>>>This makes more sense to me than the probability stuff of your QBasic program.
>>>Otherwise we would reach the absurd conclusion that all the rankings in the
>>>history of chess are meaningless, and that Capablanca, Fischer and Kasparov
>>>just had long streaks of luck.
>>>
>>>You must have thought along these lines too when you proposed the matches
>>>Tiger-Diep and Tiger-Crafty as being meaningful, in spite of not being 200,000
>>>games long.
>>>
>>>Enrique
>>
>>
>>Enrique, I'm not sure you understand me.
>>
>>What my little QBasic program will tell you, if you try it, is that when the two
>>programs are very close in strength you need an incredible number of games in
>>order to determine which one is best.
>>
>>And when the elo difference between the programs is high enough, a small number
>>of games is enough.
>>
>>From my RNDMATCH program, I have derived the following table:
>>
>>Reliability of chess matches (this table is reliable with 80% confidence)
>>
>> 10 games: 14.0% (105 pts)
>> 20 games: 11.0% ( 77 pts)
>> 30 games: 9.0% ( 63 pts)
>> 40 games: 8.0% ( 56 pts)
>> 50 games: 7.0% ( 49 pts)
>>100 games: 5.0% ( 35 pts)
>>200 games: 3.5% ( 25 pts)
>>400 games: 2.5% ( 18 pts)
>>600 games: 2.2% ( 15 pts)
>>
>>I hope others will have a critical look at my table and correct my maths if
>>needed.
>>
>>What this table tells you is that with a 10-game match you can say that one
>>program is better ONLY if it scores above 64% (50 + 14.0). In that case you
>>can say, with an 80% chance of being right, that this program is at least 105
>>elo points better than its opponent.
>>
>>Note that you still have a 20% chance of being wrong. But for practical use I
>>think it's enough.
>
>I'm still not sure that I agree with you. If after 10 games the result is
>6.5-3.5, I wouldn't dare to say that the winner is better, not even with a
>probability of 80%.
If you run the experiment with my QBasic program, you will see that such a
result happens less than 20% of the time when the two programs are actually
equal in strength.

20% might be too high a risk for you, and you might be right: once in five
matches you'll be wrong...
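
If you don't have QBasic at hand, here is a rough sketch in Python of the same
kind of simulation (a re-creation, not the original RNDMATCH, and the 35% draw
rate is only an assumption):

import random

GAMES = 10
MATCHES = 100_000
DRAW_RATE = 0.35   # assumption: a plausible computer-chess draw rate

def match_score(games, draw_rate):
    # Score of program A over one match, A and B being equally strong.
    score = 0.0
    for _ in range(games):
        r = random.random()
        if r < draw_rate:
            score += 0.5                           # draw
        elif r < draw_rate + (1 - draw_rate) / 2:
            score += 1.0                           # A wins (B wins otherwise)
    return score

lucky = sum(match_score(GAMES, DRAW_RATE) >= 6.5 for _ in range(MATCHES))
print(f"A scores 6.5-3.5 or better vs an equal opponent: "
      f"{100 * lucky / MATCHES:.1f}%")

The printed frequency lands below 20%: with no draws at all the exact figure
would be 176/1024, about 17.2%, and draws only pull it down further.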
> In the next 10 games it could be the other way round, and I
>don't think that a match between 2 opponents can decide which one is better
>anyway, unless the result is a real scandal. And this is because of the relative
>lack of transitivity between chess programs: A can beat B 7-3, B can beat C 6-4
>and C can beat A 6-4, in which case A, B and C end up quite even in spite of the
>initial 7-3. We have seen things like this a number of times.
With such a small number of games, you can indeed deduce nothing about
transitivity.

The only way to be sure about this non-transitivity phenomenon is to take 3
programs A, B and C:

Play enough games to make sure that A is better than B (use the table above).
Then play B against C until you are sure which one is better.
Finally, play A against C until you get a reliable result and check whether
non-transitivity applies (a sketch of this check follows below).

Who has made the experiment already? Nobody.

Who will make the experiment? Nobody. We love to believe in such things.

I tend to believe in non-transitivity myself; however, it has never been
demonstrated by a practical experiment...
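
To make the check concrete, here is how I would score such an experiment in
Python, using the margins from my table. The three results below are made-up
placeholders, not data from real matches:

# Margins from the table above: games -> score margin over 50%, in percent.
MARGIN = {10: 14.0, 20: 11.0, 30: 9.0, 40: 8.0, 50: 7.0,
          100: 5.0, 200: 3.5, 400: 2.5, 600: 2.2}

def verdict(games, score):
    # +1 if the first program is better, -1 if the second one is,
    # 0 if the match is still too close to call (80% confidence).
    pct = 100.0 * score / games
    m = MARGIN[games]        # assumes a game count listed in the table
    if pct > 50 + m:
        return +1
    if pct < 50 - m:
        return -1
    return 0

# Placeholder results: (number of games, score of the first-named program).
ab = verdict(400, 230)   # A vs B: 57.5%
bc = verdict(400, 225)   # B vs C: 56.25%
ac = verdict(400, 185)   # A vs C: 46.25%

if ab == +1 and bc == +1 and ac == -1:
    print("Non-transitivity supported: A > B and B > C, yet C > A")
elif 0 in (ab, bc, ac):
    print("At least one match is still too close to call")
else:
    print("Results are consistent with a transitive ordering")

With those made-up numbers it reports non-transitivity; the point is only to
show the decision rule, not to claim any real result.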
>I think that what you say may work as a general guideline, but I wouldn't feel
>very safe using it.
Why? My numbers don't come out of some obscure statistical theory. I got them
by simulating chess matches with a program I have published.

What's wrong with that? Why would chess programs behave differently?
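
Here is also an independent back-of-the-envelope check, with no simulation at
all: in a pure win/loss model (ignoring draws), the score of an n-game match
has a standard deviation of 0.5/sqrt(n) around 50%, and an 80% one-sided margin
is that times the normal quantile. It is a simplified model, so it comes out a
bit tighter than my table, but in the same ballpark:

from math import log10, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.80)        # one-sided 80% quantile, about 0.84

for n in (10, 20, 30, 40, 50, 100, 200, 400, 600):
    margin = z * 0.5 / sqrt(n)        # margin over a 50% score
    s = 0.5 + margin                  # score needed to declare a winner
    elo = 400 * log10(s / (1 - s))    # elo equivalent of that score
    print(f"{n:4d} games: {100 * margin:4.1f}% ({elo:3.0f} pts)")

For 400 games this gives about 2.1% and 15 elo against my table's 2.5% and 18;
close enough to tell me the simulation is not crazy.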
>Something else is that, intuitively, I don't find very relevant a difference
>smaller than say 20 points, even if they play thousands of games.
A difference of 20 elo points needs about 400 games to be noticed, so it's
natural that intuition does not work for such small differences. They exist
nonetheless.
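
For reference, this is the standard logistic formula relating an elo
difference to an expected score (as far as I can tell it is also what the
"pts" column of my table assumes):

from math import log10

def expected_score(d):
    # Expected score for a player with a d-point elo advantage.
    return 1 / (1 + 10 ** (-d / 400))

def elo_diff(score):
    # Inverse: elo advantage implied by an expected score (0 < score < 1).
    return 400 * log10(score / (1 - score))

print(f"20 elo -> {100 * expected_score(20):.1f}% expected score")  # 52.9%
print(f"52.5% score -> {elo_diff(0.525):.0f} elo")                  # about 17

A 20-point advantage is only a 52.9% expected score, which barely clears the
2.5% margin (52.5%) you get at 400 games; with fewer games it just disappears
into the noise.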
Christophe