Author: Dave Gomboc
Date: 23:59:26 09/02/99
On September 02, 1999 at 20:02:55, Heiko Mikala wrote:

>On September 02, 1999 at 18:58:24, Dave Gomboc wrote:
>
>>On September 02, 1999 at 18:20:22, Heiko Mikala wrote:
>>
>>>On September 02, 1999 at 15:52:17, Dave Gomboc wrote:
>>>
>>>
>>>>I don't think that you addressed my point, namely:
>>>>
>>>> Fewer games are required to conclude with a certain confidence
>>>> that one program is better than another when the results are
>>>> lopsided than when they are not.
>>>
>>>Oh yes, I think I did exactly address your point.
>>>
>>>Look at these two match-fragments:
>>>
>>>  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
>>>A ½ 0 ½ 1 ½ 1 0 1 ½ 1 1 1 1 1 1   11.0/15
>>>B ½ 1 ½ 0 ½ 0 1 0 ½ 0 0 0 0 0 0    4.0/15
>>>
>>>
>>>  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
>>>A 1 1 ½ 1 0 0 1 0 0 1 1 1 1 1 1   10.5/15
>>>B 0 0 ½ 0 1 1 0 1 1 0 0 0 0 0 0    4.5/15
>>>
>>>They both look very similar, don't they? So, if I understand you correctly, you
>>>would conclude from both matches that engine A will most definitely be stronger
>>>than engine B, because the results are *very* "lopsided".
>>
>>You don't understand me correctly.
>>
>>I am saying that a match where program A scores 60% versus program B must be
>>much longer than a match where program A scores 80% versus program B before one
>>can conclude with the same confidence level (e.g. 0.95) that A is a stronger
>>player than B.
>>
>>I hope that was precise enough.
>
>And I say what you're saying is clearly wrong. Believe me, I learned this the
>hard way during the last ten years of work on my own chess program. I often had
>the case that in a first test match of about 30-40 games my program convincingly
>won a match; I then let it play another, longer match overnight and during the
>next day, which it then lost. You always need the same number of games, no
>matter what the score is after a first, short match. My experience after hundreds
>of test matches shows that you need at least 70-80 games to be able to come to
>a conclusion. And you need several hundred games to be sure, even if the first 15
>games end in a 15-0 score, because the next 15 games may end 0-15. This is a
>frustrating fact, but it is *a fact*. It's frustrating because for us as
>programmers it means that we have to do much more time-consuming testing than
>we would like to do.
>
>Ask other programmers how long their test matches are. You won't find many who
>will tell you that their matches are only 20 or 30 games. Guess why.
>
>To say it in your words (to be precise ;-):
>
>
> I am saying that a match where program A scores 60% versus program B must be
> *as long as* a match where program A scores 80% versus program B before one
> can conclude with the same confidence level (e.g. 0.95) that A is a stronger
> player than B.
>
>
>Try it yourself. Play some long matches between programs. Don't stop after 20 or
>40 games but play on.
>
>Or, if you don't believe me, look at the published match results from other
>people here in this forum, for example. You will easily find examples where one
>tester has an 80% score for program A, while another tester has an 80% score for
>program B. That's one of the reasons for many fights that have taken place here
>at the CCC! And because many people still don't believe that they need hundreds
>of games to be absolutely sure, they accuse each other of cheating.
>
>Coming back to the examples of my last post: in both matches program A reached a
>75% score (very close to your 80%). So would you conclude that you could end
>these matches earlier than if program A had only scored 60%? I wouldn't. And
>even after 73 games in the second match, I still wasn't convinced that program
>A was really stronger than B.
>
>
>Heiko.

Crack open an intro to statistics book.

Dave
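Dave's claim (a lopsided score reaches a given confidence level in fewer games) can be checked with a short sketch. It assumes, beyond anything stated in the thread, that games are independent and that draws are set aside so only decisive games count, i.e. a one-sided exact sign test against "A and B are equally strong". The helper names `sign_test_p_value` and `min_decisive_games` are hypothetical, chosen for illustration.

```python
from fractions import Fraction
from math import comb


def sign_test_p_value(wins, losses):
    """One-sided exact sign-test p-value for "A is stronger than B".

    Assumption: draws are ignored. Under the null hypothesis each decisive
    game is a fair coin flip, so the p-value is P(X >= wins) with
    X ~ Binomial(wins + losses, 1/2).
    """
    n = wins + losses
    tail = sum(comb(n, k) for k in range(wins, n + 1))
    return Fraction(tail, 2 ** n)


def min_decisive_games(win_fraction, alpha=Fraction(1, 20), limit=10_000):
    """Smallest number of decisive games n at which winning exactly
    win_fraction of them gives a one-sided p-value <= alpha (0.05 default).

    Only match lengths where win_fraction * n is a whole number are checked.
    """
    win_fraction = Fraction(win_fraction)
    for n in range(1, limit + 1):
        wins = win_fraction * n
        if wins.denominator != 1:
            continue  # this exact score cannot occur after n games
        if sign_test_p_value(int(wins), n - int(wins)) <= alpha:
            return n
    raise ValueError("no match length up to the limit reaches significance")


if __name__ == "__main__":
    for score in (Fraction(4, 5), Fraction(3, 5)):
        print(f"{float(score):.0%} of decisive games won: "
              f"{min_decisive_games(score)} games needed")
```

Under these assumptions an 80% winning rate becomes significant at the 5% level after 15 decisive games (12-3; at 8-2 the p-value is still about 0.055), while a 60% rate needs several times as many games. The standard error of a match score shrinks only with the square root of the number of games, so a smaller margin over 50% has to be paid for with a much longer match.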
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.