Author: Dann Corbit
Date: 13:10:10 07/30/02
On July 30, 2002 at 15:34:38, Steve Maughan wrote:

>Daniel,
>
>>>What sort of comments? About statistical significance?
>>
>>Yup :)
>
>It's good that people here are aware that a few games are not *usually* enough
>to establish the best program. However I fear that there is a danger of
>computer chess nihilism where any series of games is pooh-poohed as statistically
>insignificant. Note that if the score had been 37 v 23 one could say with 95%
>accuracy that one program was better than another (no doubt 95% is not good
>enough for some people!). Also note that a score of say 35 v 25 shows that one
>program is *probably* better than the other. The point of the series of games
>was to quickly establish if Tao or Pepito had improved significantly - the
>conclusion being that they were still *about* the same strength.

I think that conclusion is not warranted. They are about the same strength in relation to each other. I suspect that their strength has changed.

It [accepting 95% confidence] also means that in 5% of the trials you will reach a wrong conclusion. IOW, 37 heads and 25 tails is odd but not astonishing.

If you want to be really sure that something is an improvement, the question is "At what level of risk am I willing to accept something as proven?"

Obviously, the lower the risk, the more sound the decision. Of course, there is a trade-off. If you wait until you are 99.99999999% certain, you will never make any choice. Obviously, that is not a good strategy either. So it calls for balance.
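For anyone who wants to check the coin-flip arithmetic, here is a rough sketch in Python. It is my own simplification, not something either program's author ran: it assumes every game is decisive (draws ignored) and that two equal programs each win with probability 0.5. The function name `upper_tail` is made up for this example. Under those assumptions the one-sided tail for 37 v 23 should come out a little under 5%, and the tail for 35 v 25 noticeably higher.

```python
from math import comb


def upper_tail(wins: int, games: int) -> float:
    """Chance of at least `wins` wins in `games` games between equal programs.

    Assumption: every game is decisive and each side wins with
    probability 0.5, i.e. X ~ Binomial(games, 0.5). Draws are ignored.
    """
    # Count the outcomes with `wins` or more wins, then divide by all 2**games.
    favourable = sum(comb(games, k) for k in range(wins, games + 1))
    return favourable / 2 ** games


# The two scores mentioned in the quoted post.
for wins, losses in [(37, 23), (35, 25)]:
    p = upper_tail(wins, wins + losses)
    print(f"{wins} v {losses}: one-sided p = {p:.3f}")
```

The smaller the tail probability you insist on before accepting a result, the more games you need, which is the trade-off described above.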
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.