Computer Chess Club Archives


Subject: Re: If 75 Games are not considered a Statistical proof, neither is the SSDF.

Author: Dann Corbit

Date: 17:08:28 01/31/01


On January 31, 2001 at 19:29:55, Bruce Moreland wrote:

>On January 31, 2001 at 15:37:21, Dann Corbit wrote:
>
>>On January 31, 2001 at 14:17:48, Bruce Moreland wrote:
>>[snip]
>>>If you start a match and get 10-0 right away, it proves that p is bigger, by any
>>>reasonable standard of proof.
>>
>>And yet the SSDF has had matches start out that way which went to the other
>>opponent in the end (or something fairly close to that -- I forget the exact
>>figures for an O-fer reversal).
>>
>>With chess, the odds of a 0/10 result between evenly matched engines are harder
>>to figure, but with a coin toss it is easy:
>>
>>1/(2^10){all heads} + 1/(2^10){all tails} = 1/(2^9) = about 0.2%
>>
>>Hence, if you had one thousand people flip ten pennies, (on average) two of them
>>would get either all heads or all tails.  The question is -- are you one of
>>those people when you run an experiment?
>>
>>Improbable events do happen.  That's why we buy fire insurance.
>>;-)
>
>You will have improbable cases occur.  That is why they are called improbable
>and not impossible.
>
>I don't know why people have a hard time dealing with this issue.  Every match
>result has associated with it a probability that the result could be achieved by
>two equal programs, purely by chance.
>
>If you get a result, and declare that the winner of the match is the better of
>the two programs, there is a chance that you will be wrong.
>
>If you play more games it is not guaranteed that this chance is reduced.
>
>What I can't understand is why people look at a 10-0 result and say, "That is
>due to chance!", and look at a 60-40 result and say, "One of the programs is
>better!", when the probability that the second result is due to chance is
>greater than the probability that the first result is due to chance.
>
>A bogus 40-60 result is more probable than a bogus 0-10 result, unless your test
>setup has a rock in it.
>
>This is a case where I am arguing with people who think that their common sense
>must be accurate, and that due to this, the world is flat.  The world is not
>flat, and it is possible to prove it.

I agree with every word of this.  However, I still think that more tests = more
reliable results.  Or (at least) the programs play better/worse when the math
says that they play better/worse.  I think one standard deviation is probably
acceptable for seat-of-the-pants programming, and two if you want to be really
sure.
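
For anyone who wants to check the numbers in this thread, here is a rough Python sketch.  It rests on my own simplifying assumptions, not on anything the SSDF does: draws are ignored, games are independent, and "two equal programs" is modelled as a fair coin.  The function names two_sided_tail and sigmas are made up for illustration.

```python
from math import comb, sqrt

def two_sided_tail(wins, games):
    """Chance that a fair coin gives a split at least as lopsided as
    wins vs. (games - wins), counting both directions.
    Assumes no draws and independent games."""
    k = max(wins, games - wins)
    if 2 * k == games:
        return 1.0  # a dead-even split is the least lopsided result possible
    upper = sum(comb(games, i) for i in range(k, games + 1))
    return 2 * upper / 2 ** games

def sigmas(wins, games):
    """How many standard deviations the win count lies from an even score,
    using the binomial standard deviation sqrt(games) / 2."""
    return (wins - games / 2) / (sqrt(games) / 2)

if __name__ == "__main__":
    for wins, games in [(10, 10), (60, 100)]:
        p = two_sided_tail(wins, games)
        print(f"{wins}-{games - wins}: chance = {p:.4%}, "
              f"{sigmas(wins, games):.2f} standard deviations")
```

Under these assumptions the 10-0 line reproduces the 1/(2^9) figure from the coin-toss calculation quoted above, and the 60-40 line comes out as the more likely fluke of the two, which is Bruce's point.  A 60-40 score sits exactly two standard deviations from even in this model.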

In that file that I sent you, I saw all sorts of humorous things.  Program X had
beautiful pawn chains and its opponent had isolated pawns -- one every 3 files,
floating like icky little islands.  King safety was completely ignored, and there
were both blunders and brilliancies.

I bring this up because someone who looked at the games to decide which program
was smarter [especially within the first 20 games] would be especially shocked
when the full disclosure appeared.



Last modified: Thu, 15 Apr 21 08:11:13 -0700
