Author: Enrique Irazoqui
Date: 13:10:53 01/28/00
On January 28, 2000 at 14:40:35, Christophe Theron wrote:

>On January 28, 2000 at 07:27:54, Enrique Irazoqui wrote:
>
>>There is a degree of uncertainty, but I don't think you need 1000 matches of 200
>>games each to have an idea of who is best.
>>
>>Fischer became a chess legend for the games he played between his comeback in
>>1970 and the Spassky match of 1972. In this period he played 157 games
>>that proved to all of us, without a hint of doubt, that he was the very best
>>chess player of his time.
>>
>>Kasparov has been the undisputed best for many years. From 1984 until now, he
>>has played a total of 772 rated games. He needed less than half of these games to
>>convince everyone about who is the best chess player.
>>
>>This makes more sense to me than the probability stuff of your QBasic program.
>>Otherwise we would reach the absurdity of believing that all the rankings in the
>>history of chess are meaningless, and that Capablanca, Fischer and Kasparov had
>>long streaks of luck.
>>
>>You must have thought along these lines too when you proposed the matches
>>Tiger-Diep and Tiger-Crafty as being meaningful, in spite of not being 200,000
>>games long.
>>
>>Enrique
>
>
>Enrique, I'm not sure you understand me.
>
>What my little QBasic program will tell you, if you try it, is that when the two
>programs are very close in strength you need an incredible number of games in
>order to determine which one is best.
>
>And when the elo difference between the programs is high enough, a small number
>of games is enough.
>
>From my RNDMATCH program, I have derived the following table:
>
>Reliability of chess matches (this table is reliable with 80% confidence)
>
> 10 games: 14.0% (105 pts)
> 20 games: 11.0% ( 77 pts)
> 30 games:  9.0% ( 63 pts)
> 40 games:  8.0% ( 56 pts)
> 50 games:  7.0% ( 49 pts)
>100 games:  5.0% ( 35 pts)
>200 games:  3.5% ( 25 pts)
>400 games:  2.5% ( 18 pts)
>600 games:  2.2% ( 15 pts)
>
>I hope others will have a critical look at my table and correct my maths if
>needed.
>
>What this table tells you is that with a 10-game match you can say that one
>program is better ONLY if it gets a result above 64% (50+14.0). In this case you
>can, with an 80% chance of being right, say that this program is at least 105 elo
>points better than its opponent.
>
>Note that you still have a 20% chance of being wrong. But for practical use I think
>it's enough.

I'm still not sure that I agree with you. If after 10 games the result is 6.5-3.5,
I wouldn't dare to say that the winner is better, not even with a probability of
80%. In the next 10 games it could be the other way round, and I don't think that
a match between two opponents can decide which one is better anyway, unless the
result is a real scandal.

And this is because of the relative lack of transitivity between chess programs:
A can beat B 7-3, B can beat C 6-4, and C can beat A 6-4, in which case A, B and C
end up quite even in spite of the initial 7-3. We have seen things like this a
number of times.

I think that what you say may work as a general guideline, but I wouldn't feel
very safe using it. Something else: intuitively, I don't find a difference smaller
than, say, 20 points very relevant, even if they play thousands of games.

But I understand your point.

Enrique

>I don't think this result sounds counterintuitive to most of us here.
>
>Now if you play 20 games you can detect, with 80% confidence, if one program
>is 77 elo points better than its opponent. No revolution here I think.
>
>Play 40 games and you can, with 80% confidence, be sure that one program is 56
>elo points better.
>
>What's very important and, I think, overlooked by most testers, is that when the
>elo difference between two programs is tiny, the number of games to play becomes
>tremendous.
>
>For example, if the programs are separated by only 18 elo points, you need to
>play 400 GAMES! If you don't, you CANNOT DRAW ANY CONCLUSION.
>
>The right methodology when you do a match between two programs is this: you must
>play on until the winning percentage of one of the programs gets decisive.
>
>After 10 games, if no program wins by 64.0% or more => play on
>After 20 games, if no program wins by 61.0% or more => play on
>After 40 games, if no program wins by 58.0% or more => play on
>After 100 games, if no program wins by 55.0% or more => play on
>After 200 games, if no program wins by 53.5% or more => play on
>
>And so on.
>
>If you play two identical programs, you are likely to play on forever. That
>sounds strange, but it's only logical.
>
>And to answer your question, I thought that playing 40 games between Tiger and
>Diep and 40 games between Tiger and Crafty would be enough, because I think the
>difference between Tiger and these programs is above 56 elo points.
>
>
>
>   Christophe
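
For anyone who wants to check Christophe's table without his QBasic source, here is a rough sketch in Python of the kind of simulation I assume RNDMATCH runs. The 30% draw rate, the number of simulated matches, and the use of the standard elo conversion formula are my assumptions, not his; the point is only to show how such a table can be derived: simulate many matches between two equal programs and look for the score margin that one side exceeds only 20% of the time.

# Monte Carlo sketch of what a program like RNDMATCH might do (not Theron's
# actual code). Two equally strong programs play N-game matches; we look for
# the score margin that equal programs exceed only 20% of the time, i.e. the
# result you must beat before claiming superiority with 80% confidence.
# The 30% draw rate is an assumption; the QBasic original may differ.
import math
import random

DRAW_RATE = 0.30     # assumed draw probability per game
MATCHES   = 20000    # simulated matches per match length

def play_match(games):
    """Score (in points) of program A against an equally strong program B."""
    score = 0.0
    for _ in range(games):
        r = random.random()
        if r < DRAW_RATE:
            score += 0.5                                   # draw
        elif r < DRAW_RATE + (1.0 - DRAW_RATE) / 2.0:
            score += 1.0                                   # win for A
    return score

def margin_for_80_percent(games):
    """Smallest margin above 50% that equal programs reach only 20% of the time."""
    results = sorted(100.0 * play_match(games) / games for _ in range(MATCHES))
    return results[int(0.80 * MATCHES)] - 50.0             # 80th percentile

def elo_from_percentage(p):
    """Usual logistic conversion between score percentage and elo difference."""
    return 400.0 * math.log10(p / (100.0 - p))

for n in (10, 20, 30, 40, 50, 100, 200, 400, 600):
    m = margin_for_80_percent(n)
    print(f"{n:4d} games: {m:5.1f}%  ({elo_from_percentage(50.0 + m):4.0f} pts)")

The numbers it prints depend on the draw model, so they will not reproduce the table exactly, but they show the same trend: the closer the programs, the more games you need.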
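
The "play on until decisive" rule can be written down the same way. The thresholds below are the ones listed in the post (plus 52.5% for 400 games, read off the table); the helper itself is only an illustration, not anything Christophe posted.

# Sketch of the stopping rule described above: at certain checkpoints, stop
# only if the leader's percentage has reached the table's threshold.
# Thresholds come from the post; everything else is my own framing.
THRESHOLDS = {10: 64.0, 20: 61.0, 40: 58.0, 100: 55.0, 200: 53.5, 400: 52.5}

def verdict(games_played, leader_points):
    """Return a short verdict string for the current state of the match."""
    if games_played not in THRESHOLDS:
        return "no checkpoint here - keep playing"
    pct = 100.0 * leader_points / games_played
    if pct >= THRESHOLDS[games_played]:
        return f"stop: {pct:.1f}% reaches the {THRESHOLDS[games_played]}% threshold"
    return f"play on: {pct:.1f}% is below the {THRESHOLDS[games_played]}% threshold"

print(verdict(10, 6.5))   # 65.0% just clears the 64% line
print(verdict(20, 12.0))  # 60.0% is not enough after 20 games

Read this way, a 6.5-3.5 score after 10 games only barely clears the 64% line, which is perhaps exactly why it still feels too thin to settle anything.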