Computer Chess Club Archives


Subject: Re: How convincing is convincing?

Author: Vasik Rajlich

Date: 03:04:32 01/31/04


On January 31, 2004 at 02:55:07, George Sobala wrote:

>On January 30, 2004 at 18:50:23, Christophe Theron wrote:
>
>>On January 30, 2004 at 15:00:00, Rex wrote:
>>
>>>I believe in his tourney YES S8 won convincingly and YES S8 can be called the
>>>better program and strongest program in his tourney that he set up, in which
>>>both sides had equal rules and time.
>>>
>>>I too am tired of people saying "not enough games."  Hey if you set up a tourney
>>>and there is a winner, then that person has the right to post the winner as
>>>the_best_program and the winner for that particular tourney PERIOD with a dot.
>>
>>
>>
>>Saying that it is convincing will not fool the people who know what they are
>>talking about.
>>
>>You can run a 20-game match again and this time, surprise surprise, you get the
>>opposite result.
>>
>>You can call it a "convincing" result as well.
>>
>>As soon as this has happened to you, you realize that there is a problem. You
>>can contradict yourself extremely easily.
>>
>>So you can either continue to call the results "convincing" and claim a
>>different winner every day, just for fun, or think a little bit about it and
>>realize that... you have not played enough games to convince anybody with a
>>brain.
>>
>>I know the "not enough games" is going to get on people's nerves over time.
>>However I think it's better to get on people's nerves than to stop fighting
>>ignorance.
>>
>
>Depends on what one means by "convincing".
>
>A +6 score in a 20-game human world championship chess final would be called
>"convincing" by everyone. Imagine it: e.g. Kasparov v Leko. No-one would be
>going around saying "this proves nothing, as far as I am concerned the two
>contenders are equal. Let's make them play another 100 games."
>
>A win of this magnitude means that the loser has a less than 1 in 5 chance of
>actually being better, and in day-to-day life in non-critical situations, those
>sort of odds are good enough for most people to make value judgements on. I
>don't cancel my weekend outdoor trip because a forecast is made of an 18% chance
>of rain.
>
>The "95% significance level" beloved in this forum was not handed down onto the
>mountain written in tablets of stone. It, too, is only a level of relative
>truth. Now perhaps it is the level of truth YOU want to use when tinkering with
>the settings of your engines. Fine. It is certainly the level of truth used in
>much scientific literature. There has been critical questioning of that,
>however. It is certainly not the level of probability I would want to use in a
>really dangerous situation: e.g. the risk of being run over when trying to cross
>a busy dual carriageway. I would be going for p<0.001 at the very least.
>
>So, yesterday, in my fun match, done for my pleasure, I found a +6 score with a
>p-value of <0.18, convincing. You, as the author of one of the engines, for whom
>the result could mean more work to be done, did not. Relative truths.

Actually the problem with everybody posting their results isn't statistical
significance. As you pointed out, there would be no problem accepting a
certified twenty-game match for the world championship as absolutely valid. The
problem is that everybody can run these programs on their machines and basically
generate any result they see fit - usually it's enough to keep generating
results until one of them is seen fit. If Christophe likes I am sure I could
"show" with 95% certainty that Rybka is better than Tiger :-) That's why there
are tournaments.

Vas
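
The point above about generating results until one of them fits can be put into rough numbers. The Python sketch below is only an illustration under stated assumptions: the two engines are taken to be exactly equal, games are independent, and the draw rate of 0.4 is a made-up figure, not something measured in any match discussed in this thread. The function names are hypothetical. It computes the exact chance that one 20-game match ends with a chosen engine at least +6 (wins minus losses), and then the chance that at least one match out of several reruns does so. It does not try to reproduce the p-value quoted earlier, which depends on how draws are treated.

```python
from math import comb


def prob_margin_at_least(games, margin, draw_rate):
    """Exact probability that engine A finishes at least `margin` wins ahead
    of engine B over `games` games, assuming both engines are equally strong."""
    # Assumption: equal engines, so a decisive game is a coin flip.
    p_win = (1.0 - draw_rate) / 2.0
    total = 0.0
    for wins in range(games + 1):
        for losses in range(games - wins + 1):
            if wins - losses < margin:
                continue
            draws = games - wins - losses
            # Number of ways to arrange this win/loss/draw split.
            ways = comb(games, wins) * comb(games - wins, losses)
            total += ways * p_win ** (wins + losses) * draw_rate ** draws
    return total


def prob_after_retries(single, attempts):
    """Chance that at least one of `attempts` independent matches hits the margin."""
    return 1.0 - (1.0 - single) ** attempts


if __name__ == "__main__":
    # draw_rate=0.4 is an assumed value chosen only for illustration.
    single = prob_margin_at_least(games=20, margin=6, draw_rate=0.4)
    for attempts in (1, 5, 10, 20):
        print(attempts, round(prob_after_retries(single, attempts), 3))
```

Whatever the single-match probability turns out to be for a given draw rate, the second function shows why an uncertified result is weak evidence: the chance of producing the desired score at least once grows with every rerun that nobody else sees.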



Last modified: Thu, 15 Apr 21 08:11:13 -0700
