Computer Chess Club Archives



Subject: How convincing is convincing?

Author: George Sobala

Date: 23:55:07 01/30/04



On January 30, 2004 at 18:50:23, Christophe Theron wrote:

>On January 30, 2004 at 15:00:00, Rex wrote:
>
>>I believe in his tourney YES S8 won convincingly and YES S8 can be called the
>>better program and strongest program in his tourney that he set up, in which
>>both sides had equal rules and time.
>>
>>I too am tired of people saying "not enough games."  Hey if you set up a tourney
>>and there is a winner, then that person has the right to post the winner as
>>the_best_program and the winner for that particular tourney PERIOD with a dot.
>
>
>
>Saying that it is convincing will not fool the people who know what they are
>talking about.
>
>You can run a 20-game match again and this time, surprise surprise, you get the
>opposite result.
>
>You can call it a "convincing" result as well.
>
>As soon as this has happened to you, you realize that there is a problem. You
>can contradict yourself extremely easily.
>
>So you can either continue to call the results "convincing" and claim a
>different winner every day, just for fun, or think a little bit about it and
>realize that... you have not played enough games to convince anybody with a
>brain.
>
>I know the "not enough games" is going to get on people's nerves over time.
>However I think it's better to get on people's nerves than to stop fighting
>ignorance.
>

Depends on what one means by "convincing".

A +6 score in a 20-game human world championship chess final would be called
"convincing" by everyone. Imagine it: e.g. Kasparov v Leko. No-one would be
going around saying "this proves nothing, as far as I am concerned the two
contenders are equal. Let's make them play another 100 games."

A win of this magnitude means that the loser has a less than 1 in 5 chance of
actually being better, and in day-to-day life in non-critical situations, that
sort of odds is good enough for most people to make value judgements on. I
don't cancel my weekend outdoor trip because a forecast is made of an 18% chance
of rain.
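
For anyone who wants to check figures like this themselves, here is a minimal sketch of one common way to do it: a one-sided sign test on the decisive games, with draws ignored. This is an assumption about the method, not necessarily how the figure above was obtained, and the win/loss splits in the usage lines are hypothetical, since the actual breakdown of the match is not given here. The function name `sign_test_p_value` is made up for this illustration. Strictly, the number it returns is the chance that two equally strong programs would produce a margin at least this large.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """One-sided sign test on decisive games (draws are ignored).

    Returns the probability that one side scores at least `wins` wins
    out of `wins + losses` decisive games if both sides are equally
    likely to win each decisive game.
    """
    decisive = wins + losses
    # Count the outcomes with `wins` or more wins, out of 2**decisive
    # equally likely win/loss sequences.
    favourable = sum(comb(decisive, k) for k in range(wins, decisive + 1))
    return favourable / 2 ** decisive


# Hypothetical +6 results in a 20-game match:
print(sign_test_p_value(13, 7))  # no draws: about 0.13
print(sign_test_p_value(8, 2))   # ten draws: about 0.055
```

The same +6 margin gives different p-values depending on how many games were drawn, which is one reason a bare "+6" does not pin down a single number.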

The "95% significance level" beloved in this forum was not handed down on the
mountain, written on tablets of stone. It, too, is only a level of relative
truth. Now perhaps it is the level of truth YOU want to use when tinkering with
the settings of your engines. Fine. It is certainly the level of truth used in
much scientific literature. There has been critical questioning of that,
however. It is certainly not the level of probability I would want to use in a
really dangerous situation: e.g. the risk of being run over when trying to cross
a busy dual carriageway. I would be going for p<0.001 at the very least.
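
To put the different thresholds side by side, here is a rough sketch that asks how many wins a 20-game match needs before the result clears a chosen significance level. It assumes a one-sided sign test with every game decisive (no draws), which is a simplification; the function name `min_wins_for_significance` is hypothetical.

```python
from math import comb
from typing import Optional


def min_wins_for_significance(games: int, alpha: float) -> Optional[int]:
    """Smallest win count whose one-sided sign-test p-value is below alpha.

    Assumes every game is decisive and that equal engines each win a
    game with probability 1/2. Returns None if even a perfect score
    does not reach the requested level.
    """
    total = 2 ** games
    tail = 0  # number of outcomes with at least `wins` wins
    for wins in range(games, -1, -1):
        tail += comb(games, wins)
        if tail / total >= alpha:
            # This win count no longer clears alpha; the previous one did.
            return wins + 1 if wins < games else None
    return 0


for alpha in (0.18, 0.05, 0.001):
    print(alpha, min_wins_for_significance(20, alpha))
# 0.18  -> 13 wins (a 13-7 score)
# 0.05  -> 15 wins (15-5)
# 0.001 -> 18 wins (18-2)
```

Under these assumptions, moving from "good enough for a weekend trip" to "good enough to cross the road" is the difference between a 13-7 and an 18-2 result in the same 20 games.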

So, yesterday, in my fun match, done for my pleasure, I found a +6 score, with a
p-value of <0.18, convincing. You, as the author of one of the engines, for whom
the result could mean more work to be done, did not. Relative truths.


