Computer Chess Club Archives


Subject: Re: Tragedy

Author: Christophe Theron

Date: 19:30:35 11/04/00


On November 04, 2000 at 21:56:28, stuart taylor wrote:

>On November 04, 2000 at 21:14:38, Christophe Theron wrote:
>
>>On November 04, 2000 at 15:05:54, Uri Blass wrote:
>>
>>>On November 04, 2000 at 14:10:47, walter irvin wrote:
>>>
>>>>On November 04, 2000 at 13:43:44, Bruce Moreland wrote:
>>>>
>>>>>On November 04, 2000 at 12:03:29, Daniel Chancey wrote:
>>>>>
>>>>>>I was trying to find out how CMSilver fares against the best of the best.
>>>>>>Clearly it isn't doing well.
>>>>>>
>>>>>>Castle2000
>>>>>
>>>>>It might not be doing well, but it could have been an accident.  Your matches
>>>>>are short enough that if it had won like two more games in the "blowout" match
>>>>>you wouldn't be so sure.
>>>>>
>>>>>You have another blowout match going on now though, so it's looking a little
>>>>>more likely that the version isn't as good as the others in self-play.
>>>>>
>>>>>The way you are doing matches you can probably score three ways - draw, win,
>>>>>blowout.  If you start making decisions based upon this you can make a mistake
>>>>>if the matches are too short to prove that the score is real.  Even a long match
>>>>>can't prove that the score is real, if the score is close.
>>>>>
>>>>>It's possible to take the score of a match, and turn it into a statement such as
>>>>>"There is an 85% chance that version A is at least 20 Elo points better than
>>>>>version B."
>>>>>
>>>>>If that appeals to you, you may want to learn something about statistics.  I
>>>>>would tell you how to do it, but I don't know how.  If chess didn't have any
>>>>>draws it would be easier to do.
>>>>>
>>>>>bruce
>>>>That's easy: just don't count draws. Play until someone wins a certain number of
>>>>games. Then you can say, well, A wins 75 games, B wins 25, etc.
>>>
>>>It is not so simple, for several reasons:
>>>
>>>1)If you want to say that version A is at least 20 Elo better than version B
>>>then you have to count draws, because 20-0 with no draws suggests that A is at
>>>least 20 Elo better than B, while 20-0 with 1000 draws suggests that A is not
>>>at least 20 Elo better than B.
>>>
>>>2)The probability of winning with White is not the same as the probability of
>>>winning with Black.
>>>
>>>3)Learning can change things: it is possible that version A is at least 20
>>>Elo better than B after 10000 games even though it is worse than B before playing.
>>>
>>>Uri
>>
>>
>>
>>All fine and to the point, but playing a 10-game match to decide which
>>version is better is still plain bullshit. Sorry, I had to say it...
>>
>>That's what Daniel should learn from statistics, even if we use rough
>>approximations.
>>
>>Daniel, you could check this yourself. Try it, you will see that the result
>>is shocking. I have made the experiment myself, and it has changed my point of
>>view about chess matches (and I would even say it changed my point of view about
>>chess in general, and also about soccer, tennis and many other things).
>>
>>Here is what you should do: take the SAME program (or same PERSONALITY in your
>>case), and let it play a 10-game match against itself. The time controls
>>don't matter. Take blitz or 40/120, or anything you like.
>>
>>Write down the result after 10 games, or better: publish it here. We can all
>>learn from your experiment, so I think it is a good idea to publish it.
>>
>>Then run the match again. Without changing anything. Just the same match with
>>the same engines. And tell us what happens.
>>
>>You could think this is a stupid experiment. A program against itself should
>>always score 50%, so what are we going to learn from the experiment?
>>
>>Do it, report about it, and tell us what you can learn from it.
>>
>>
>>
>>    Christophe
>
>I wish a few people had read my post from about a week ago, or more,
>about a great idea for a chess championship/tournament.
>  I think it would test many things in a most economical and accurate way, and
>give room for great insights into the programs.
>   I had very little response. Was everyone on vacation? And I see it WAS a
>subject that interests people.
>I could summarize it again.
>S.Taylor



I think I have read it, and it would be an improvement, but anyway the results
would still be highly unreliable.
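
The self-play experiment quoted above can be approximated on paper before
running any engines. The sketch below is only an illustration: it assumes every
game is independent and that each side wins with the same fixed probability.
The 30% win / 40% draw / 30% loss split and the names play_match and
match_spread are made up for the example. Real engines with opening books or
learning will not behave exactly like this.

```python
import random

def play_match(rng, games=10, p_win=0.3, p_draw=0.4):
    """Return side A's score in one match between two identical engines.

    p_win is the assumed chance that A wins a game; B wins with the same
    chance, and the remaining games are draws (hypothetical numbers).
    """
    score = 0.0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            score += 1.0          # A wins
        elif r < p_win + p_draw:
            score += 0.5          # draw
        # otherwise B wins and A scores nothing
    return score

def match_spread(matches=1000, games=10, seed=0):
    """Replay the same match many times; return (lowest, highest, mean) score for A."""
    rng = random.Random(seed)
    results = [play_match(rng, games) for _ in range(matches)]
    return min(results), max(results), sum(results) / matches

if __name__ == "__main__":
    lo, hi, mean = match_spread()
    print(f"10-game matches, identical engines: worst {lo}, best {hi}, mean {mean:.2f}")
```

Under these assumptions the mean stays close to 5 out of 10, but individual
matches wander far from it, which is the point of the experiment.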

The number of games needed to get a good idea of the relative rankings of
computers or humans of relatively close strengths is too large to be practical
for tournaments anyway...
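
To put rough numbers on this, here is a minimal sketch using the usual logistic
Elo formula and a normal approximation for the match score. The function names,
the 95% level (z = 1.96) and the per-game standard deviation of 0.4 are
assumptions chosen for illustration, not figures from this thread.

```python
import math

def expected_score(elo):
    """Expected score (fraction of points) for a player who is `elo` points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def elo_diff(score):
    """Elo difference implied by a score fraction; requires 0 < score < 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_interval(wins, draws, losses, z=1.96):
    """Observed score and an approximate interval around it (normal approximation)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game is worth 1, 0.5 or 0 points.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    half = z * math.sqrt(var / n)
    return mean, max(mean - half, 0.0), min(mean + half, 1.0)

def games_needed(elo, sd_per_game=0.4, z=1.96):
    """Rough number of games before an `elo` edge exceeds the noise.

    sd_per_game is an assumed spread of single-game results; it depends
    on the draw rate and is not taken from any real match data.
    """
    margin = expected_score(elo) - 0.5
    return math.ceil((z * sd_per_game / margin) ** 2)

if __name__ == "__main__":
    # A 5-5 result over 10 games, with no draws.
    print(score_interval(5, 0, 5))
    # Uri's example: 20 wins, no losses, 1000 draws.
    print(elo_diff((20 + 0.5 * 1000) / 1020))
    # Games needed to separate two versions 20 Elo apart.
    print(games_needed(20))
```

With these assumptions a 20 Elo difference takes several hundred games to
separate from chance, and a 10-game match leaves an interval wide enough to
cover almost any result.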


    Christophe



Last modified: Thu, 15 Apr 21 08:11:13 -0700
