Author: Martin Schubert
Date: 00:13:56 05/09/01
On May 08, 2001 at 21:35:09, Dann Corbit wrote:

>On May 08, 2001 at 14:05:39, Gian-Carlo Pascutto wrote:
>
>>On May 08, 2001 at 10:43:41, Martin Schubert wrote:
>>
>>>How should this be possible? First you need a null hypothesis, e.g. Fritz is as
>>>good as Junior. Okay, that's not the problem. But such statistics are only valid
>>>when the results are independent. When you're using book learning, they're not
>>>independent. So you can't calculate a degree of confidence.
>>
>>I can try to offer 2 solutions, but I don't know if they are good enough:
>>
>>a) assume book learning has no influence on the null hypothesis, i.e.
>>that the learning of Junior and Fritz is equally good. This sounds
>>reasonable, but may not be correct.
>>
>>b) assume book learning is part of the null hypothesis, so that
>>the strength of a program is also determined by its book learning
>>abilities.
>>
>>If either of these fails, I would appreciate it if you could point
>>out why. This is not my area, but I'd like to learn more.
>
>My approach would be to make assumptions and then test them. For instance:
>
>Fritz with book learning plays stronger than Fritz without book learning.
>
>Play 500 games of Fritz against itself: one engine with book learning and one without.
>
>You might be able to measure the Elo change as a function of time and make a
>prediction based on the measurements. I'm quite sure that would take more than
>500 games, though.
>
>Repeat the same experiment with Junior.
>
>Now, you could make a hypothesis that Junior with book learning is
>stronger/weaker/the-same-as Fritz with book learning. Run an experiment and
>measure it.

That doesn't help. The only thing you can do is play Fritz against Junior in a very long match and examine whether the probability of winning is the same in every part of the match, which it should be if every game is independent. If it is not, maybe you can at least say something about how strong the dependence is.
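The check suggested above can be sketched as a permutation test. The question is whether the winning probability stays the same across the whole match. This is a minimal sketch under several assumptions. Results are recorded from Fritz's point of view as 1, 0.5 or 0, in playing order, and colors are ignored. The function names and the segment count `k` are hypothetical.

If the games are independent and identically distributed, every reordering of the results is equally likely. The test therefore compares the observed spread between segment scores with the spread seen after random shuffles.

```python
import random
from statistics import mean


def split_segments(results, k):
    """Split the match, in playing order, into k consecutive segments of nearly equal size."""
    n = len(results)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of games")
    # Integer bounds; consecutive bounds differ by at least 1 because n >= k.
    bounds = [i * n // k for i in range(k + 1)]
    return [results[bounds[i]:bounds[i + 1]] for i in range(k)]


def between_segment_spread(segments, overall):
    """Size-weighted sum of squared deviations of each segment's mean score from the overall mean."""
    return sum(len(s) * (mean(s) - overall) ** 2 for s in segments)


def drift_p_value(results, k=4, n_perm=2000, seed=0):
    """Permutation p-value for 'the score per game does not drift between segments'.

    Hypothetical helper: under independent, identically distributed games any
    shuffle of the results is as likely as the observed order, so we count how
    often a shuffle spreads the segment scores at least as much as observed.
    """
    overall = mean(results)
    observed = between_segment_spread(split_segments(results, k), overall)
    rng = random.Random(seed)
    shuffled = list(results)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        spread = between_segment_spread(split_segments(shuffled, k), overall)
        # Small tolerance so float rounding does not miss ties with the observed value.
        if spread >= observed - 1e-12:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)
```

A small p-value suggests the score drifted during the match, which is the kind of effect book learning could produce. A large one does not prove independence, because this sketch only looks for drift between segments and not for every form of dependence.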
By the way: if you do one match A-B (Fritz with book learning against Fritz without book learning) and one match C-D (the same with Junior), you can never say anything about A-C (Fritz against Junior, both with book learning).

Martin

>If there is a bias, you should be able to measure that also. The big problem
>with all of these experiments is that it will take an enormous number of trials
>to find the answer.
>
>Suppose one program is 500 Elo stronger than another program. You can detect
>this fairly reliably with only a few dozen games.
>
>Suppose that it is only 50 Elo stronger... Now it will take many thousands of
>games to get a reliable answer.
>
>Suppose that it is only 5 Elo stronger... Now you will probably never find out,
>especially with things like book learning that alter the experiment while it
>is running.
>
>In short, with two very strong programs of about the same strength, we don't
>know which one is stronger and probably can't really even find out. If there is
>book learning involved, it makes our predictions even less reliable.
>
>So what are we left with?
>A. We could name a champion.
>B. We could toss a coin.
>C. We could let them play a series of games and choose the winner as champion.
>D. We could do something completely different.
>
>Most of the time, we tend to opt for choice "C.", even though it's not much
>more reliable than the other methods.
>;-)
>
>Of course, it may be that one is clearly stronger than the others, in
>which case we will probably pick the stronger program with high certainty. Not
>very many contests turn out like that. Certainly the Fritz/Junior affair did
>not.
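Dann's point that the number of games grows rapidly as the Elo gap shrinks can be made concrete with a back-of-the-envelope calculation. This is a rough sketch, not a rigorous power analysis. It rests on these assumptions:

- the standard logistic Elo curve;
- independent games, which is exactly what book learning violates;
- a normal approximation to the match score;
- a 5% significance level with 80% power.

The function names and the `draw_rate` parameter are hypothetical.

```python
import math


def expected_score(elo_diff):
    """Logistic Elo model: expected score per game of the side that is elo_diff stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, draw_rate=0.0, z_alpha=1.96, z_beta=0.8416):
    """Approximate number of independent games needed to detect an Elo gap.

    Hypothetical helper using the normal approximation
    n = ((z_alpha + z_beta) * sigma / delta) ** 2,
    where delta is the expected score margin over 0.5 and sigma is the per-game
    standard deviation of the score.
    """
    if elo_diff <= 0:
        raise ValueError("elo_diff must be positive")
    e = expected_score(elo_diff)
    # With win/draw/loss probabilities w, d, l and mean score e = w + d/2,
    # the per-game variance is w + d/4 - e**2 = e*(1 - e) - d/4.
    # w and l must both be non-negative, which bounds d by 2*min(e, 1 - e).
    if not 0.0 <= draw_rate <= 2.0 * min(e, 1.0 - e):
        raise ValueError("draw_rate inconsistent with this expected score")
    sigma = math.sqrt(e * (1.0 - e) - draw_rate / 4.0)
    delta = e - 0.5
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


if __name__ == "__main__":
    for gap in (500, 50, 5):
        print(gap, games_needed(gap), games_needed(gap, draw_rate=0.5))
```

The required count scales roughly with the inverse square of the expected-score margin, so a tenfold smaller Elo gap needs about a hundred times as many games. Draws lower the per-game variance and reduce the count somewhat. The normal approximation is poor when the answer is only a handful of games, as in the 500 Elo case.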
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.