Author: Dann Corbit
Date: 14:42:59 01/30/01
On January 30, 2001 at 15:57:25, Bruce Moreland wrote:

>On January 30, 2001 at 09:06:09, Jorge Pichard wrote:
>
>>Ever since I matched Nimzo 8 vs Junior 6 using my AMD K6-2 500 MHz and also
>>matched them using my Athlon 800 MHz at G/60 and got different scores; some
>>people argued that those games were not statistically significant enough to
>>prove anything at all. Then we must disregard the SSDF rating list, since each
>>chess program plays only 40 games against each other and not 200 games.
>>
>>PS: I am still convinced that Nimzo 8 is one of the few programs, just like
>>Gandalf 4.32, that benefit the most from using the best hardware available. And
>>they are not programmed specifically to outperform Fritz 6 on a particular
>>hardware such as the AMD K6-2 450 MHz.
>>
>>Pichard.
>
>When you play a match you get some information. You can see how the programs
>played against each other, you know who won the match, and you know the score.
>
>It's perfectly valid to say that A beat B, if A won the match. That's a simple
>fact. And you can look at the match and say that A played better than B.
>That's more subjective, but perhaps you are expert enough that what you say is
>true.
>
>You also have the score of the match. If A beats B in a three-game match by a
>score of 2-1, you have some data that you can use to make a judgement about how
>A compares with B. That A won the match is beyond question, but it is still an
>open issue as to whether A is better than B in the completely true sense.
>
>You can assert that A is better than B, but you can also assert that B is better
>than A. In this particular case, there is not a lot of difference between these
>two assertions. If two programs are equal, and they are known to draw 30% of
>the time, the odds that one would beat the other by a score of 2-1 are almost
>40%. So it is more likely than not that A is better than B, assuming that A is
>at least the tiniest bit better than B, but there's also a 40% chance that B is
>at least a tiny bit better than A.
>
>As you play more games, it is possible that you can make your "A is better than
>B" conclusion with a higher chance of accuracy, but this is not necessarily
>true.
>
>If you play some games, and A completely wipes out B in terms of match score,
>you can assert that A is better than B, with fairly good reliability, but if you
>play a lot more games, and the score is close, your chance of accuracy may
>actually be less.
>
>In order to make a good claim that A is better than B, A needs to beat B by such
>a score that the odds of the score being due to chance are quite low. In
>practice, this takes a lot of games, unless the match is a blowout.
>
>The closer two programs are to each other in terms of strength, the more games
>will probably be necessary in order to prove with reasonable accuracy that one
>is at least a little bit better than the other.
>
>There is no rule of thumb about how many games is enough, it depends completely
>upon the score of the match.
>
>When you talk about the SSDF list, that's a different thing. The games are
>played as a series of matches, but your score in the match doesn't determine
>your position on the list, your total score against all opponents does. It
>would probably actually be better if there were many more opponents and the
>matches were shorter, in terms of figuring out who deserves the top spot on the
>list, if what you are trying to do is measure general chess strength.
>

Additional measurements will not (in general) make the answer less accurate (unless something is wrong with the measurements). However, if two programs are about equal, you will [basically] never determine which is stronger by playing them against each other.
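
As a side check on the quoted three-game example, here is a minimal sketch that enumerates every outcome of a short match. It assumes the games are independent and that "equal programs drawing 30% of the time" means each side wins 35% of the games; the function name score_distribution is made up for this illustration. Under those assumptions a given side wins 2-1 about 22% of the time, and one side or the other wins 2-1 about 45% of the time, so a 2-1 result is very common between equal programs.

```python
from itertools import product

def score_distribution(n_games, p_win, p_draw):
    """Exact distribution of side A's total score over n_games.

    Assumption for this sketch: games are independent and every game
    has the same win/draw/loss probabilities.
    """
    p_loss = 1.0 - p_win - p_draw
    # (points for A, probability) for a single game
    outcomes = [(1.0, p_win), (0.5, p_draw), (0.0, p_loss)]
    dist = {}
    for combo in product(outcomes, repeat=n_games):
        score = sum(points for points, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        dist[score] = dist.get(score, 0.0) + prob
    return dist

# Equal programs, 30% draws: each side wins 35% of the games (assumed).
dist = score_distribution(3, p_win=0.35, p_draw=0.30)
p_a_wins_2_1 = dist[2.0]                  # A scores 2 out of 3
p_either_2_1 = dist[2.0] + dist[1.0]      # A wins 2-1 or B wins 2-1

print(f"A wins 2-1:         {p_a_wins_2_1:.4f}")
print(f"either side by 2-1: {p_either_2_1:.4f}")
```

The same function works for longer matches, although full enumeration grows as 3^n and is only practical for a handful of games.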
For anyone who would like to prove this to themselves, just play a program against itself 10 times, 50 times, 100 times and 1000 times. The figure *should* [obviously] hover around 50% points scored for each side. It is very unlikely that the ten game match will be close to 50%. The 100 game match will probably be fairly close. It is rather unlikely that the 1000 game match will be far from 50%, but it is very unlikely it will be exactly 50%. In fact, if it should be exactly 50%, the Chi-Squared Test will reject it! It throws out both things that don't seem to fit the model and also things that fit so perfectly something looks fishy. ;-)
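
If you do not have the patience to run the real matches, a rough simulation gives the flavor. This is only a sketch under stated assumptions: the games are independent, the program draws against itself 30% of the time (the figure from the quoted post, not a measured one), and wins are split evenly. The names simulate_match and spread are invented for this example.

```python
import random

def simulate_match(n_games, rng, p_draw=0.30):
    """Return the fraction of points scored by one side in a self-play match.

    Assumption: independent games, equal win chances, fixed draw rate.
    """
    p_win = (1.0 - p_draw) / 2.0
    score = 0.0
    for _ in range(n_games):
        r = rng.random()
        if r < p_win:
            score += 1.0          # win
        elif r < p_win + p_draw:
            score += 0.5          # draw
        # otherwise a loss: no points
    return score / n_games

def spread(n_games, trials=1000, seed=1):
    """Mean and standard deviation of the match result over many matches."""
    rng = random.Random(seed)
    results = [simulate_match(n_games, rng) for _ in range(trials)]
    mean = sum(results) / trials
    var = sum((x - mean) ** 2 for x in results) / trials
    return mean, var ** 0.5

for n in (10, 50, 100, 1000):
    mean, sd = spread(n)
    print(f"{n:5d} games: mean {mean:.3f}, std dev {sd:.3f}")
```

Under these assumptions the standard deviation of the result is sqrt(0.175 / n), so roughly 13 percentage points for a 10 game match, about 4 for 100 games, and a little over 1 for 1000 games.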
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.