Author: Christophe Theron
Date: 18:14:38 11/04/00
On November 04, 2000 at 15:05:54, Uri Blass wrote:
>On November 04, 2000 at 14:10:47, walter irvin wrote:
>
>>On November 04, 2000 at 13:43:44, Bruce Moreland wrote:
>>
>>>On November 04, 2000 at 12:03:29, Daniel Chancey wrote:
>>>
>>>>I was trying to find out how CMSilver fares against the best of the best.
>>>>Clearly it isn't doing well.
>>>>
>>>>Castle2000
>>>
>>>It might not be doing well, but it could have been an accident. Your matches
>>>are short enough that if it had won like two more games in the "blowout" match
>>>you wouldn't be so sure.
>>>
>>>You have another blowout match going on now though, so it's looking a little
>>>more likely that the version isn't as good as the others in self-play.
>>>
>>>The way you are doing matches you can probably score three ways - draw, win,
>>>blowout. If you start making decisions based upon this you can make a mistake
>>>if the matches are too short to prove that the score is real. Even a long match
>>>can't prove that the score is real, if the score is close.
>>>
>>>It's possible to take the score of a match, and turn it into a statement such as
>>>"There is an 85% chance that version A is at least 20 Elo points better than
>>>version B."
>>>
>>>If that appeals to you, you may want to learn something about statistics. I
>>>would tell you how to do it, but I don't know how. If chess didn't have any
>>>draws it would be easier to do.
>>>
>>>bruce
>>That's easy: just don't count draws. Play until someone wins a certain number
>>of games. Then you can say, well, A wins 75 games, B wins 25, etc.
>
>It is not so simple, for several reasons:
>
>1) If you want to say that version A is at least 20 Elo better than version B,
>then you have to count draws, because 20-0 with no draws suggests that A is at
>least 20 Elo better than B, whereas 20-0 with 1000 draws suggests that A is not
>at least 20 Elo better than B.
>
>2) The probability of winning with white is not the same as the probability of
>winning with black.
>
>3) Learning can change things: it is possible that version A is at least 20
>Elo better than B after 10000 games, even though it is worse than B before any
>games have been played.
>
>Uri
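Uri's first point can be checked with a few lines of arithmetic. The sketch below is not from the thread: it assumes the standard logistic Elo model (a 400-point scale, with a draw counted as half a point), and the function name `elo_diff` is made up for illustration.

```python
import math

def elo_diff(wins, draws, losses):
    """Elo difference implied by a match result.

    Assumes the usual logistic Elo model; a draw counts as half a point.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score in (0.0, 1.0):
        # A 100% or 0% score has no finite Elo equivalent.
        return math.copysign(math.inf, score - 0.5)
    return 400.0 * math.log10(score / (1.0 - score))

# Same 20-0 win/loss record, without and with 1000 draws.
print(elo_diff(20, 0, 0))
print(elo_diff(20, 1000, 0))
```

Under this model the 20-0 result with 1000 draws comes out well below 20 Elo, while the same result with no draws has no finite upper bound, which is why the draws cannot simply be thrown away.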
All fine and to the point, but playing a 10-game match to decide which
version is better is still plain bullshit. Sorry, I had to say it...
That's what Daniel should learn from statistics, even if we use only rough
approximations.
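As one such rough approximation (an editorial sketch, not something from the thread), the error margin of a match between two equal engines can be estimated with a normal approximation. The 30% draw rate, the 1.96 factor for a roughly 95% interval, and the function names are all assumptions chosen for illustration, and the normal approximation itself is crude for a match as short as 10 games.

```python
import math

def score_margin(games, draw_rate=0.3, z=1.96):
    """Approximate half-width of a ~95% interval on the score fraction.

    Assumes two equal engines, independent games, and a fixed draw rate
    (0.3 is an arbitrary illustrative value). For equal engines the
    per-game variance of the score is 0.25 * (1 - draw_rate).
    """
    variance_per_game = 0.25 * (1.0 - draw_rate)
    return z * math.sqrt(variance_per_game / games)

def score_to_elo(score):
    """Elo difference for a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

for games in (10, 100, 1000):
    margin = score_margin(games)
    print(games, "games: score 50% +/-", round(100 * margin, 1), "points,",
          "about +/-", round(score_to_elo(0.5 + margin)), "Elo")
```

With these assumptions the margin for a 10-game match is on the order of 200 Elo in either direction, and it shrinks only with the square root of the number of games.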
Daniel, you can check this for yourself. Try it, and you will see that the
result is shocking. I have done the experiment myself, and it changed my point of
view about chess matches (and I would even say it changed my point of view about
chess in general, and also about soccer, tennis and many other things).
Here is what you should do: take the SAME program (or the same PERSONALITY in
your case), and let it play a 10-game match against itself. The time control
doesn't matter. Take blitz or 40/120, or anything you like.
Write down the result after 10 games, or better: publish it here. We can all
learn from your experiment, so I think it is a good idea to publish it.
Then run the match again. Without changing anything. Just the same match with
the same engines. And tell us what happens.
You might think this is a stupid experiment. A program playing against itself
should always score 50%, so what are we going to learn from the experiment?
Do it, report on it, and tell us what you learn from it.
Christophe
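The experiment described above can also be imitated on paper before running real engines. The following is only a sketch under loud assumptions that are not in the post: each game is treated as an independent random event, both sides are exactly equal, and the draw rate is fixed at an arbitrary 30%. Real engines with opening books, learning, or timing effects will behave differently. The names `play_match` and `simulate` are hypothetical.

```python
import random

def play_match(games=10, draw_rate=0.3, rng=random):
    """Score (from side A's view) of one match between two equal engines.

    Assumption: games are independent; win and loss are equally likely.
    """
    p_win = (1.0 - draw_rate) / 2.0
    score = 0.0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            score += 1.0          # A wins
        elif r < p_win + draw_rate:
            score += 0.5          # draw
        # otherwise A loses and scores nothing
    return score

def simulate(matches=10000, games=10, draw_rate=0.3, seed=2000):
    """Repeat the same match many times and summarise the spread."""
    rng = random.Random(seed)
    results = [play_match(games, draw_rate, rng) for _ in range(matches)]
    # "Lopsided" here means one side scored at least 70% of the points.
    lopsided = sum(1 for s in results
                   if s >= 0.7 * games or s <= 0.3 * games)
    return {
        "mean_score": sum(results) / matches,
        "min_score": min(results),
        "max_score": max(results),
        "lopsided_fraction": lopsided / matches,
    }

print(simulate())
```

The average over many matches sits near 50%, as expected, but the individual 10-game results are spread widely around it, and a noticeable share of them look like a clear win for one side even though the two "versions" are identical.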
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.