Author: Heiko Mikala
Date: 17:02:55 09/02/99
On September 02, 1999 at 18:58:24, Dave Gomboc wrote:

>On September 02, 1999 at 18:20:22, Heiko Mikala wrote:
>
>>On September 02, 1999 at 15:52:17, Dave Gomboc wrote:
>>
>>
>>>I don't think that you addressed my point, namely:
>>>
>>>    Less games are required to conclude with a certain confidence
>>>    that one program is better than another when the results are
>>>    lopsided than when they are not.
>>
>>Oh yes, I think I did exactly address your point.
>>
>>Look at these two match-fragments:
>>
>>   1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
>>A  ½ 0 ½ 1 ½ 1 0 1 ½ 1 1 1 1 1 1  11.0/15
>>B  ½ 1 ½ 0 ½ 0 1 0 ½ 0 0 0 0 0 0   4.0/15
>>
>>
>>   1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
>>A  1 1 ½ 1 0 0 1 0 0 1 1 1 1 1 1  10.5/15
>>B  0 0 ½ 0 1 1 0 1 1 0 0 0 0 0 0   4.5/15
>>
>>They both look very similar, don't they? So, if I understand you correctly, you
>>would conclude from both matches, that engine A will most definitely be stronger
>>than engine B, because the results are *very* "lopsided".
>
>You don't understand me correctly.
>
>I am saying that a match where program A scores 60% versus program B must be
>much longer than a match where program A scores 80% versus program B before one
>can conclude with the same confidence level (e.g. 0.95) that A is a stronger
>player than B.
>
>I hope that was precise enough.

And I say that what you're saying is clearly wrong. Believe me, I learned this the hard way during the last ten years of work on my own chess program. I often had the case that my program convincingly won a first test match of about 30-40 games, and then lost another, longer match that I let it play overnight and during the next day.

You always need the same number of games, no matter what the score is after a first, short match. My experience after hundreds of test matches shows that you need at least 70-80 games to be able to come to a conclusion, and you need some hundred games to be sure. Even if the first 15 games end in a 15-0 score. Because the next 15 games may end 0-15.
This is a frustrating fact, but it is *a fact*. It's frustrating because, for us as programmers, it means that we have to do much more time-consuming testing than we would like to do. Ask other programmers how long their test matches are. You won't find many who will tell you that their matches are only 20 or 30 games. Guess why.

To say it in your words (to be precise ;-):

I am saying that a match where program A scores 60% versus program B must be *as long as* a match where program A scores 80% versus program B before one can conclude with the same confidence level (e.g. 0.95) that A is a stronger player than B.

Try it yourself. Play some long matches between programs. Don't stop after 20 or 40 games, but play on. Or, if you don't believe me, look at the match results other people have published here in this forum, for example. You will easily find cases where one tester has an 80% score for program A, while another tester has an 80% score for program B. That's one of the reasons for many of the fights that have taken place here at the CCC! And because many people still don't believe that they need hundreds of games to be absolutely sure, they accuse each other of cheating.

Coming back to the examples of my last post: in both matches program A reached a 75% score (very close to your 80%). So would you conclude that you could end these matches earlier than if program A had only scored 60%? I wouldn't. And even after 73 games in the second match, I still wasn't convinced that program A was really stronger than B.

Heiko.
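
For anyone who wants to try the "play on" experiment without tying up a machine for days, it can be roughly approximated in software. The Python sketch below is not part of the original post. It assumes a deliberately simple model in which every game is independent and has fixed win and draw probabilities for program A (real engine matches, with opening books and learning, may behave differently). The function names `simulate_match` and `lopsided_fraction`, and the 35% win / 30% draw figures for two equally strong programs, are made up for illustration. It counts how often a given match length produces a score of 70% or more for A purely by chance.

```python
import random


def simulate_match(n_games, p_win, p_draw, rng):
    """Return A's total score over n_games (win = 1, draw = 0.5, loss = 0).

    Assumption: games are independent with fixed probabilities.
    """
    score = 0.0
    for _ in range(n_games):
        r = rng.random()
        if r < p_win:
            score += 1.0
        elif r < p_win + p_draw:
            score += 0.5
    return score


def lopsided_fraction(n_games, threshold, p_win, p_draw, trials, seed=0):
    """Fraction of simulated matches in which A scores at least `threshold`."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        score = simulate_match(n_games, p_win, p_draw, rng)
        if score / n_games >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical setup: two equally strong programs, 30% draws.
    for n in (15, 40, 80, 300):
        frac = lopsided_fraction(n, 0.70, p_win=0.35, p_draw=0.30, trials=2000)
        print(f"{n:4d} games: A scored >= 70% in {frac:.1%} of matches")
```

Changing `p_win` and `p_draw` to model a genuinely stronger program A, or changing `threshold`, shows how the picture shifts with match length under these assumptions.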
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.