Author: Enrique Irazoqui
Date: 05:54:47 01/29/00
On January 29, 2000 at 00:28:59, Christophe Theron wrote:

>On January 28, 2000 at 16:10:53, Enrique Irazoqui wrote:
>
>>On January 28, 2000 at 14:40:35, Christophe Theron wrote:
>>
>>>On January 28, 2000 at 07:27:54, Enrique Irazoqui wrote:
>>>
>>>>There is a degree of uncertainty, but I don't think you need 1000 matches of
>>>>200 games each to have an idea of who is best.
>>>>
>>>>Fischer became a chess legend for the games he played between his comeback in
>>>>1970 and the Spassky match of 1972. In this period he played 157 games, and
>>>>they proved to all of us without a hint of doubt that he was the very best
>>>>chess player of his time.
>>>>
>>>>Kasparov has been the undisputed best for many years. From 1984 until now he
>>>>has played a total of 772 rated games. He needed fewer than half of these
>>>>games to convince everyone of who the best chess player is.
>>>>
>>>>This makes more sense to me than the probability stuff of your QBasic program.
>>>>Otherwise we would reach the absurd conclusion that all the rankings in the
>>>>history of chess are meaningless, and that Capablanca, Fischer and Kasparov
>>>>just had long streaks of luck.
>>>>
>>>>You must have thought along these lines too when you proposed the matches
>>>>Tiger-Diep and Tiger-Crafty as being meaningful, in spite of their not being
>>>>200,000 games long.
>>>>
>>>>Enrique
>>>
>>>Enrique, I'm not sure you understand me.
>>>
>>>What my little QBasic program will tell you, if you try it, is that when the
>>>two programs are very close in strength you need an incredible number of games
>>>in order to determine which one is best.
>>>
>>>And when the Elo difference between the programs is high enough, a small
>>>number of games is enough.
>>>
>>>From my RNDMATCH program, I have derived the following table:
>>>
>>>Reliability of chess matches (this table is reliable with 80% confidence):
>>>
>>> 10 games: 14.0% (105 pts)
>>> 20 games: 11.0% ( 77 pts)
>>> 30 games:  9.0% ( 63 pts)
>>> 40 games:  8.0% ( 56 pts)
>>> 50 games:  7.0% ( 49 pts)
>>>100 games:  5.0% ( 35 pts)
>>>200 games:  3.5% ( 25 pts)
>>>400 games:  2.5% ( 18 pts)
>>>600 games:  2.2% ( 15 pts)
>>>
>>>I hope others will take a critical look at my table and correct my maths if
>>>needed.
>>>
>>>What this table tells you is that with a 10-game match you can say that one
>>>program is better ONLY if it gets a result above 64% (50 + 14.0). In that
>>>case you can say, with an 80% chance of being right, that this program is at
>>>least 105 Elo points better than its opponent.
>>>
>>>Note that you still have a 20% chance of being wrong. But for practical use I
>>>think it's enough.
>>
>>I'm still not sure that I agree with you. If after 10 games the result is
>>6.5-3.5, I wouldn't dare to say that the winner is better, not even with a
>>probability of 80%.
>
>If you make the experiment and run my QBasic program, you will see that such a
>result happens less than 20% of the time.
>
>20% might be too high for you, and you might be right. Once in 5 matches
>you'll be wrong...
>
>>In the next 10 games it could be the other way round, and I don't think that
>>a match between 2 opponents can decide which one is better anyway, unless the
>>result is a real scandal. And this is because of the relative lack of
>>transitivity between chess programs. A can beat B 7-3, B can beat C 6-4 and C
>>can beat A 6-4, in which case A, B and C end up quite even in spite of the
>>initial 7-3. We have seen things like this a number of times.
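Christophe's table is easy to probe with a short simulation. The Python sketch
below is only in the spirit of RNDMATCH, whose actual QBasic source is not in
this thread: the 35% draw rate and the percentile convention are assumptions,
so its margins will only roughly track the table. The Elo conversion, however,
is the standard formula, and applied to the percentage column above it
reproduces the "pts" column to within a few points.

import random
import math

def match_margin(games, trials=20_000, draw_rate=0.35, confidence=0.80):
    """Simulate many matches between two programs of exactly equal
    strength and return the score margin above 50% that an equal
    program exceeds only (1 - confidence) of the time by pure luck.
    A real result above this margin hints at a genuine strength
    difference. The draw rate is an assumption; RNDMATCH's model
    is not shown in this thread."""
    margins = []
    for _ in range(trials):
        score = 0.0
        for _ in range(games):
            r = random.random()
            if r < draw_rate:
                score += 0.5                           # draw
            elif r < draw_rate + (1.0 - draw_rate) / 2.0:
                score += 1.0                           # win
            # else: loss, 0 points
        margins.append(score / games - 0.5)
    margins.sort()
    # ~80% of lucky margins fall at or below this value, so a result
    # above it happens only about 20% of the time between equals.
    return margins[int(confidence * trials)]

def elo_from_score(s):
    """Standard Elo relation: an expected score s corresponds to a
    rating difference of 400 * log10(s / (1 - s))."""
    return 400.0 * math.log10(s / (1.0 - s))

for n in (10, 20, 30, 40, 50, 100, 200, 400, 600):
    m = match_margin(n)
    print(f"{n:4d} games: {100 * m:5.1f}% ({elo_from_score(0.5 + m):4.0f} pts)")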
>With such a low number of games, you indeed cannot deduce anything about
>transitivity.
>
>The only way to be sure about this non-transitivity phenomenon is to take 3
>programs A, B and C.
>
>Play enough games to make sure that A is better than B (use the table above).
>Then play B against C until you are sure which one is best.
>
>Finally, play A against C until you get a reliable result, and check if
>non-transitivity applies.

From my own 1999 tournament, still with short and few matches:

Junior 5 - Genius 6       3.5-6.5
Junior 5 - Tiger 11.75    6.5-3.5
Tiger 11.75 - Genius 6    6-4

Junior 5 = 10, Genius 6 = 10.5, Tiger 11.75 = 9.5

I don't know precisely how often this happens (a sketch at the end of this
post estimates how often luck alone produces such a cycle), but I have seen
non-transitivity a number of times, also in much longer matches. In this
example there are 2 cases of >80% probability that end up being wrong. With
this I mean to say that I don't trust 80% probabilities for a penny. I still
don't think that a match can determine which of 2 programs is the strongest,
unless, of course, the end result is a real smash of the order of 90% or so
after a long series of games.

>Who has made the experiment already?
>
>Nobody.
>
>Who will make the experiment?
>
>Nobody. We love to believe in such things.
>
>I tend to believe in non-transitivity myself; however, it has never been
>demonstrated with a practical experiment...

It would take too long to prove it, and you can always argue that transitivity
has never been proven either, and that we still love to believe in these
things. But as far as beliefs go, I do believe that there is such a thing as
non-transitivity. I have seen it, I suspect it exists...

Enrique

>>I think that what you say may work as a general guideline, but I wouldn't
>>feel very safe using it.
>
>Why? My numbers don't come out of an obscure statistical theory. I got them
>by simulating chess matches with a program I have published.
>
>What's wrong? Why would chess programs behave differently?
>
>>Something else is that, intuitively, I don't find a difference smaller than,
>>say, 20 points very relevant, even if they play thousands of games.
>
>A difference of 20 Elo points needs 400 games to be noticed. So it's natural
>that intuition does not work for such small differences. However, they exist.
>
>    Christophe
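Enrique's "I don't know precisely how often this happens" is easy to estimate
for the pure-luck case, at least. The Python sketch below (equal strength and
a 35% draw rate are assumptions, not measurements of Junior, Genius or Tiger)
counts how often three exactly equal programs produce a full cycle like the
one above over 10-game matches. Under these assumptions a cycle shows up in a
sizable fraction of trials, so with matches this short, seeing one is not by
itself proof of non-transitivity.

import random

def match_score(games=10, draw_rate=0.35):
    """First program's score in a match between two programs of exactly
    equal strength (equal strength and the draw rate are assumptions)."""
    score = 0.0
    for _ in range(games):
        r = random.random()
        if r < draw_rate:
            score += 0.5                           # draw
        elif r < draw_rate + (1.0 - draw_rate) / 2.0:
            score += 1.0                           # win
        # else: loss, 0 points
    return score

def cycle_frequency(trials=100_000, games=10):
    """How often three equally strong programs produce a non-transitive
    cycle (A beats B, B beats C, C beats A, or the mirror image) by
    luck alone."""
    half = games / 2.0
    cycles = 0
    for _ in range(trials):
        ab = match_score(games)   # A's score against B
        bc = match_score(games)   # B's score against C
        ca = match_score(games)   # C's score against A
        if (ab > half and bc > half and ca > half) or \
           (ab < half and bc < half and ca < half):
            cycles += 1
    return cycles / trials

print(f"Cycles among three equal programs, 10-game matches: "
      f"{cycle_frequency():.1%}")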