Computer Chess Club Archives



Subject: Re: Dummy Cadaques Tournament (Long)

Author: Christophe Theron

Date: 10:00:43 01/29/00



On January 29, 2000 at 08:54:47, Enrique Irazoqui wrote:

>On January 29, 2000 at 00:28:59, Christophe Theron wrote:
>
>>On January 28, 2000 at 16:10:53, Enrique Irazoqui wrote:
>>
>>>On January 28, 2000 at 14:40:35, Christophe Theron wrote:
>>>
>>>>On January 28, 2000 at 07:27:54, Enrique Irazoqui wrote:
>>>>
>>>>>There is a degree of uncertainty, but I don't think you need 1000 matches of 200
>>>>>games each to have an idea of who is best.
>>>>>
>>>>>Fischer became a chess legend for the games he played between his comeback in
>>>>>1970 to the Spassky match of 1972. In this period of time he played 157 games
>>>>>that proved to all of us without the hint of a doubt that he was the very best
>>>>>chess player of those times.
>>>>>
>>>>>Kasparov has been the undisputed best for many years. From 1984 until now, he
>>>>>played a total of 772 rated games. He needed less than half these games to
>>>>>convince everyone about who is the best chess player.
>>>>>
>>>>>This makes more sense to me than the probability stuff of your Qbasic program.
>>>>>Otherwise we would reach the absurd of believing that all the rankings in the
>>>>>history of chess are meaningless, and Capablanca, Fischer and Kasparov had long
>>>>>streaks of luck.
>>>>>
>>>>>You must have thought along these lines too when you proposed the matches
>>>>>Tiger-Diep and Tiger-Crafty as being meaningful, in spite of not being 200,000
>>>>>games long.
>>>>>
>>>>>Enrique
>>>>
>>>>
>>>>Enrique, I'm not sure you understand me.
>>>>
>>>>What my little QBasic program will tell you, if you try it, is that when the two
>>>>programs are very close in strength you need an incredible number of games in
>>>>order to determine which one is best.
>>>>
>>>>And when the elo difference between the programs is high enough, a small number
>>>>of games is enough.
>>>>
>>>>From my RNDMATCH program, I have derived the following table:
>>>>
>>>>Reliability of chess matches (this table is reliable with a 80% confidence)
>>>>
>>>> 10 games: 14.0% (105 pts)
>>>> 20 games: 11.0% ( 77 pts)
>>>> 30 games:  9.0% ( 63 pts)
>>>> 40 games:  8.0% ( 56 pts)
>>>> 50 games:  7.0% ( 49 pts)
>>>>100 games:  5.0% ( 35 pts)
>>>>200 games:  3.5% ( 25 pts)
>>>>400 games:  2.5% ( 18 pts)
>>>>600 games:  2.2% ( 15 pts)
>>>>
>>>>I hope others will have a critical look at my table and correct my maths if
>>>>needed.
>>>>
>>>>What this table tells you, is that with a 10 games match you can say that one
>>>>program is better ONLY if it gets a result above 64% (50+14.0). In this case you
>>>>can, with 80% chances to be right, say that this program is at least 105 elo
>>>>points better than its opponent.
>>>>
>>>>Note that you still have a 20% chance to be wrong. But for practical use I
>>>>think it's enough.
>>>
>>>I'm still not sure that I agree with you. If after 10 games the result is
>>>6.5-3.5, I wouldn't dare to say that the winner is better, not even with a
>>>probability of 80%.
>>
>>
>>If you make the experiment and run my QBasic program you will see that such a
>>result happens less than 20% of the time.
>>
>>20% might be high for you, and you might be right. Once in 5 matches you'll be
>>wrong...
>>
>>
>>
>>> In the next 10 games it could be the other way round and I
>>>don't think that a match between 2 opponents can decide which one is better
>>>anyway, unless the result is a real scandal. And this because of the relative
>>>lack of transitivity between chess programs. A can beat B 7-3, B can beat C 6-4
>>>and C can beat A 6-4, in which case A, B and C end up quite even in spite of the
>>>initial 7-3. We have seen things like these a number of times.
>>
>>
>>With such a low number of games, you can indeed not deduce anything about
>>transitivity.
>>
>>The only way to be sure about this non-transitivity phenomenon is to take 3
>>programs A, B and C.
>>
>>Play enough games to make sure that A is better than B (use the table above).
>>Then play B against C until you are sure which one is best.
>>
>>Finally, play A against C until you get a reliable result and check if
>>non-transitivity applies.
>
>From my own 1999 tournament, still with short and few matches:
>
>Junior 5-Genius 6        3.5-6.5
>Junior 5-Tiger 11.75     6.5-3.5
>Tiger 11.75-Genius 6       6-4
>
>Junior 5    = 10
>Genius 6    = 10.5
>Tiger 11.75 = 9.5
>
>I don't know precisely how often this happens, but I have seen non-transitivity
>a number of times, also in much longer matches.


So show me evidence from longer matches, because the example you give has little
statistical significance.
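For what it's worth, the 6.5-3.5 case can be checked with a small Monte Carlo in
the spirit of my RNDMATCH program (a sketch in Python rather than QBasic, and it
assumes no draws, which is the worst case for variance):

```python
import random

def match_score(n_games, rng):
    """Points scored by program A in a match between two EQUAL
    programs, counting each game as a win (1) or a loss (0)."""
    return sum(rng.random() < 0.5 for _ in range(n_games))

def false_positive_rate(n_games, threshold, trials=100_000, seed=1):
    """Fraction of equal-strength matches in which A still reaches
    `threshold` points purely by luck."""
    rng = random.Random(seed)
    hits = sum(match_score(n_games, rng) >= threshold
               for _ in range(trials))
    return hits / trials

print(false_positive_rate(10, 7))   # around 0.17
```

So two perfectly equal programs produce a 7-3 (or better) result in roughly one
match out of six, which is exactly why a single short match proves so little.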




>In this example there are 2 cases of > 80% probability that ends up being wrong.
>With this I mean to say that I don't trust 80% probabilities for a penny.


I have never advocated an 80% confidence level.

It was just an example; the table I gave holds at 80% confidence.

You can build such a table for 95% confidence, or for 99.9% confidence.

Still, my point is valid. With a small number of games you cannot tell which
program is stronger, unless there is a big difference between them.

For 95% confidence it is even worse: you can deduce almost nothing from a
10-game match if you want to be 95% sure.
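The margins in my table can also be approximated in closed form: under a
one-sided normal approximation with no draws (worst-case variance), the required
margin over 50% is roughly z * 0.5 / sqrt(n). A Python sketch of that
approximation (mine, not the RNDMATCH simulation itself; real games with draws
need slightly smaller margins):

```python
import math
from statistics import NormalDist

def margin(n_games, confidence=0.80):
    """Score margin above 50% needed before naming a winner, under a
    one-sided normal approximation with no draws (worst-case variance)."""
    z = NormalDist().inv_cdf(confidence)   # one-sided z for this confidence
    return z * 0.5 / math.sqrt(n_games)

def elo_from_score(score):
    """Elo difference implied by an expected score (logistic Elo model)."""
    return 400 * math.log10(score / (1 - score))

for conf in (0.80, 0.95):
    for n in (10, 50, 200, 600):
        m = margin(n, conf)
        print(f"{conf:.0%} conf, {n:3d} games: "
              f"margin {m:5.1%}, elo {elo_from_score(0.5 + m):5.0f}")
```

At 95% confidence a 10-game match needs roughly a 76% score before the winner
can be named, which is why a short match says almost nothing at that level.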




>I still don't think that a match can determine which one of 2 programs is the
>strongest, unless, of course, the end result is a real smash of the order of 90%
>or so after a long series of games.


I'm surprised that you think so.

A match that is long enough can tell which program is the best. The length of
the match that is needed is a function of the elo difference between the
programs.

With a big difference, a relatively small number of games will be enough.

If the difference is tiny, a very high number of games is needed.
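The same approximation can be turned around to estimate how many games a given
elo gap requires before it becomes detectable (again my own no-draw sketch at
80% confidence; draws reduce the variance, so real matches need somewhat fewer
games):

```python
import math
from statistics import NormalDist

def games_needed(elo_diff, confidence=0.80):
    """Rough match length before the stronger program's expected edge
    exceeds the detection margin (one-sided, no-draw approximation)."""
    z = NormalDist().inv_cdf(confidence)
    expected = 1 / (1 + 10 ** (-elo_diff / 400))   # logistic Elo expectancy
    return math.ceil((z * 0.5 / (expected - 0.5)) ** 2)

for d in (200, 100, 50, 10):
    print(f"{d:3d} elo gap: about {games_needed(d)} games")
```

A 200-point gap shows up after a handful of games, while a 10-point gap needs on
the order of 850 games even at only 80% confidence.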




>>Who has made the experiment already?
>>
>>Nobody.
>>
>>Who will make the experiment?
>>
>>Nobody. We love to believe in such things.
>>
>>I tend to believe myself in non-transitivity, however it has never been
>>demonstrated with a practical experiment...
>
>It would take too long to prove it, and you can always argue that transitivity
>has never been proven and we still love to believe in these things. But as far
>as believes go, I do believe that there is such a thing as non-transitivity. I
>have seen it, I suspect it exists...


I thought you were not somebody who likes to live without knowing.

For example, you have built a test suite that predicts the SSDF results very
accurately.

By doing so you implicitly admit that you believe in statistics.

If you don't believe in statistics, you should stop making any reference to elo
ratings.

But if you buy the "statistics package", you buy several things with it: for
example, the fact that a match can be relevant or not, and that with a long
enough match you don't need a 90% winning percentage to be sure which program is
best, and so on...

When you start to study chess matches from a statistical point of view, you
discover numbers that are surprising to common sense. It takes time to get used
to them, but if you want to understand things better, and I'm sure you do, these
statistical results are a great help.

I did not want to hurt your feelings, but sometimes a little bit of mathematical
help is welcome. You know how feelings can be misleading.

Further, I think our feelings can be better "tuned" if we train them on really
good examples (I mean significant experiments). Then they will serve us better
in unclear cases.

I mean, if you don't know the rules of chess and you are watching a game, you
can have the feeling that one player is winning when he is not (perhaps because
you are only watching the players' attitudes). An experienced player will not be
fooled by the players' attitudes.



    Christophe


