Author: Dann Corbit
Date: 17:08:28 01/31/01
On January 31, 2001 at 19:29:55, Bruce Moreland wrote:
>On January 31, 2001 at 15:37:21, Dann Corbit wrote:
>
>>On January 31, 2001 at 14:17:48, Bruce Moreland wrote:
>>[snip]
>>>If you start a match and get 10-0 right away, it proves that p is bigger, by any
>>>reasonable standard of proof.
>>
>>And yet the SSDF has had matches start out that way which went to the other
>>opponent in the end (or something fairly close to that -- I forget the exact
>>figures for an O-fer reversal).
>>
>>With chess, the odds of 0/10 for evenly matched chess engines is harder to
>>figure, but with a coin toss it is easy:
>>
>>1/(2^10){all heads} + 1/(2^10){all tails} = 1/(2^9) = .2%
>>
>>Hence, if you had one thousand people flip ten pennies, (on average) two of them
>>would get either all heads or all tails. The question is -- are you one of
>>those people when you run an experiment?
>>
>>Improbable events do happen. That's why we buy fire insurance.
>>;-)
>
>You will have improbable cases occur. That is why they are called improbable
>and not impossible.
>
>I don't know why people have a hard time dealing with this issue. Every match
>result has associated with it a probability that the result could be achieved by
>two equal programs, purely by chance.
>
>If you get a result, and declare that the winner of the match is the better of
>the two programs, there is a chance that you will be wrong.
>
>If you play more games it is not guaranteed that this chance is reduced.
>
>What I can't understand is why people look at a 10-0 result and say, "That is
>due to chance!", and look at a 60-40 result and say, "One of the programs is
>better!", when the probability that the second result is due to chance is
>greater than the probability that the first result is due to chance.
>
>A bogus 40-60 result is more probable than a bogus 0-10 result, unless your test
>setup has a rock in it.
>
>This is a case where I am arguing with people who think that their common sense
>must be accurate, and that due to this, the world is flat. The world is not
>flat, and it is possible to prove it.
I agree with every word of this. However, I still think that more tests = more
reliable results. Or (at least) the programs play better/worse when the math
says that they play better/worse. I think one standard deviation is probably
acceptable for seat-of-the-pants programming, and two if you want to be really
sure.
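As a rough check of the comparison quoted above, here is a minimal sketch that
treats each game as a fair coin flip. Ignoring draws is an assumption (real
chess matches have plenty of them), and the function names are only
illustrative. It computes the chance that two equal programs produce a score at
least as lopsided as the one observed, and how many standard deviations the
score sits from an even split.

```python
from math import comb, sqrt


def two_sided_tail(wins, games):
    """Chance that either side scores at least as lopsidedly as wins/games
    when every game is a fair coin flip (assumption: no draws)."""
    hi = max(wins, games - wins)
    upper = sum(comb(games, k) for k in range(hi, games + 1)) / 2 ** games
    # The distribution is symmetric, so double the upper tail; cap at 1
    # for the case of an exactly even score.
    return min(1.0, 2 * upper)


def sigmas_from_even(wins, games):
    """Distance of the score from an even split, in standard deviations
    of a fair-coin binomial (sd = sqrt(games) / 2)."""
    return abs(wins - games / 2) / (sqrt(games) / 2)


for wins, games in [(10, 10), (60, 100)]:
    print(f"{wins}-{games - wins}: "
          f"chance = {two_sided_tail(wins, games):.4f}, "
          f"sigmas = {sigmas_from_even(wins, games):.2f}")
```

Under this model the 10-0 result matches the 1/(2^9) figure from the coin toss
earlier in the thread, while 60-40 comes out between 5% and 6%, so the bogus
60-40 is indeed the more likely of the two. In standard deviations, 60-40 sits
at two and 10-0 at a little over three.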
In that file that I sent you, I saw all sorts of humorous things. Program X had
beautiful pawn chains and its opponent had isolated pawns -- one every 3 files,
floating like icky little islands. King safety was completely ignored, and
there were both blunders and brilliancies.
I bring this up because someone who looked at the games to decide which program
was smarter [especially within the first 20 games] would be shocked when the
full results appeared.