Computer Chess Club Archives


Subject: Re: Computer Chess Nihilism

Author: Dann Corbit

Date: 13:10:10 07/30/02


On July 30, 2002 at 15:34:38, Steve Maughan wrote:

>Daniel,
>
>>>What sort of comments?  About statistical significance?
>>
>>Yup :)
>
>It's good that people here are aware that a few games are not *usually* enough
>to establish the best program.  However I fear that there is a danger of
>computer chess nihilism where any series of games are pooh-poohed as statistically
>insignificant.  Note that if the score had been 37 v 23 one could say with 95%
>accuracy that one program was better than another (no doubt 95% is not good
>enough for some people!).  Also note that a score of say 35 v 25 shows that one
>program is *probably* better than the other.  The point of the series of games
>was to quickly establish if Tao or Pepito had improved significantly - the
>conclusion being that they were still *about* the same strength.

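I think that conclusion is not warranted.  They are about the same strength in
relation to each other.  I suspect that the strength of each has changed.  A
match between the two can only measure relative strength; if both programs
improved by a similar amount, the score would still come out about even.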

It [accepting 95% confidence] also means that in about 5% of the trials where
there is no real difference, you will reach the wrong conclusion.
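To see what that 5% means in practice, here is a small simulation sketch. It assumes a simplified coin-flip model (every game is decisive, each side wins with probability 0.5, draws are ignored) and uses "37 or more points out of 60" from the 37 v 23 example quoted above as the rule for declaring program A stronger. The function name and its parameters are hypothetical, chosen only for illustration.

```python
import random


def false_positive_rate(n_games: int = 60, threshold: int = 37,
                        trials: int = 10_000, seed: int = 2002) -> float:
    """Fraction of simulated matches between two EQUAL programs in which
    program A reaches `threshold` points and so would wrongly be declared
    the stronger program.

    Simplifying assumption: every game is decisive and each side wins
    with probability 0.5 (a coin flip); draws are ignored.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    wrong = 0
    for _ in range(trials):
        # Count program A's wins in one simulated match of n_games games.
        score_a = sum(1 for _ in range(n_games) if rng.random() < 0.5)
        if score_a >= threshold:
            wrong += 1  # equal programs, yet the rule picks a "winner"
    return wrong / trials


if __name__ == "__main__":
    # Expect a value a little under 0.05.
    print(f"false-positive rate: {false_positive_rate():.3f}")
```

Even though the two simulated programs are identical by construction, the decision rule names program A the stronger one in roughly one match in twenty.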

In other words, 37 heads and 23 tails is odd but not astonishing.  If you want
to be really sure that something is an improvement, the question is "At what
level of risk am I willing to accept something as proven?"
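The coin-flip comparison can be made exact. The sketch below, again assuming decisive games and a fair coin, computes how often 60 flips give 37 or more heads. The helper name `upper_tail` is hypothetical.

```python
from math import comb


def upper_tail(n: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 1/2): the chance that a fair coin
    gives at least k heads in n flips (exact integer arithmetic)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


n, heads = 60, 37
one_sided = upper_tail(n, heads)
# Two-sided: a result at least this lopsided in EITHER direction.
# For a fair coin the two tails are mirror images, so this is twice
# the one-sided tail (the tails do not overlap because heads > n / 2).
two_sided = 2 * one_sided

print(f"P(>= {heads} heads in {n} flips) = {one_sided:.4f}")
print(f"two-sided p-value            = {two_sided:.4f}")
```

Under these assumptions the one-sided figure comes out at about 0.046 (roughly 1 in 22), and the two-sided figure, which also counts an equally lopsided result in the other program's favor, at about 0.09. So a 37 v 23 score clears a one-sided 5% test but not a two-sided one.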

Obviously, the lower the risk, the more sound the decision.  Of course, there
is a trade-off: a lower risk requires more games.  If you wait until you are
99.99999999% certain, you will never make any choice, and that is not a good
strategy either.  So it calls for balance.
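The trade-off can be made concrete with the standard normal-approximation sample-size formula for a one-sided test of a proportion. This is a rough sketch that assumes decisive games only and a hypothetical true score of 55% for the stronger program; `games_needed` and its parameters are illustrative and do not come from any chess tool. A risk of 1e-10 corresponds to the 99.99999999% figure above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def games_needed(p_true: float, alpha: float, power: float = 0.80) -> int:
    """Approximate number of decisive games needed for a one-sided test of
    H0: p = 0.5 to detect a program whose true winning fraction is p_true,
    with false-positive risk `alpha` and the given power.

    Uses the usual normal approximation to the binomial; draws are ignored.
    """
    p0 = 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value under H0
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    numerator = (z_alpha * sqrt(p0 * (1 - p0))
                 + z_beta * sqrt(p_true * (1 - p_true)))
    return ceil((numerator / (p_true - p0)) ** 2)


if __name__ == "__main__":
    # Hypothetical true edge: the better program scores 55%.
    for alpha in (0.05, 0.01, 0.001, 1e-10):
        print(f"risk {alpha:g}: about {games_needed(0.55, alpha)} games")
```

The required number of games climbs steadily as the accepted risk shrinks, and it climbs much faster still if the real difference between the programs is smaller than the assumed 55%.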



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.