Computer Chess Club Archives

Subject: Re: Maybe a stupid experiment...

Author: Robert Hyatt
Date: 07:13:56 01/04/01
On January 04, 2001 at 02:38:42, Uri Blass wrote:

>On January 04, 2001 at 00:00:39, Robert Hyatt wrote:
>
>>On January 03, 2001 at 17:50:38, José Carlos wrote:
>>
>>>On January 03, 2001 at 16:26:19, Robert Hyatt wrote:
>>>
>>>>On January 03, 2001 at 09:52:06, José Carlos wrote:
>>>>
>>>>>  Lately, people have been talking here about significant results. I'm not
>>>>>really sure if probabilistic calculus is appropriate here, because chess games
>>>>>are not stochastic events.
>>>>>  So, I suggest an experiment to measure the probabilistic noise:
>>>>>
>>>>>  -choose a random program and make it play itself.
>>>>>  -write down the result after 10 games, 50 games, 100 games...
>>>>>
>>>>>  It should tend to be an even result, and it would be possible to know how many
>>>>>games are needed to get a result with a certain degree of confidence.
>>>>>  If we try this for several programs, and the results are similar, we can draw
>>>>>a conclusion, in comparison with pure probabilistic calculus.
>>>>>
>>>>>  Does this idea make sense, or am I still sleeping? :)
>>>>>
>>>>>  José C.
>>>>
>>>>It is statistically invalid.  IE if you flip a coin 500 times do you _really_
>>>>expect to get 250 heads and 250 tails?  Probability distribution says you
>>>>won't get that very often at all.  In fact, if you flip long enough, you will
>>>>either get 500 straight heads or tails, or else prove the coin is _not_ actually
>>>>perfectly random with  50-50 probability of getting a head or tail.
>>>
>>>  But don't you think the more times you flip the coin, the closer the number of
>>>head and tails (in %) will be? Maybe the coin is not the best comparison, as
>>>it is a random event, and a chess game is not, but I still think it should work.
>>>But I expect a different rate of "closeness" (is this word correct?) for the
>>>same number of tries with the coin (random event) and the games (partially
>>>random -book, pondering, ... and partially not -eval function, search algos...),
>>>and that difference is what I want to measure.
>>>
>>>  José C.
>>
>>
>>No I don't.  Suppose that 500-0 run comes _first_.  How long will you have to
>>flip to get back to even?  You may _never_ get back to even.  Remember this is
>>a bell-curve shaped probability distribution.  Not a single spike on the curve
>>at the mid-point of the distribution.  You probably need to play 40 forty-game
>>matches to get the beginning of an idea of who is better.
>
>You replied to the sentence:
>"But don't you think the more times you flip the coin, the closer the number of
>head and tails (in %) will be?"
>
>I think that you missed the in %
>
>Uri


Actually, my answer doesn't particularly change for percentage or raw count.
If you flip a coin a million times, there is a very definite probability that
the result will end up 60%-40%.
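
To put rough numbers on how far a heads percentage can drift, here is a
minimal sketch assuming a perfectly fair coin and independent flips.  The
function names are made up for illustration.  It prints the standard
deviation of the heads fraction, how many standard deviations a 60%-40%
split lies from even, and log10 of Hoeffding's upper bound on the chance of
a split at least that lopsided.

```python
import math

def heads_fraction_sigma(n_flips, p=0.5):
    """Standard deviation of the observed heads fraction after n_flips
    independent flips of a coin that lands heads with probability p."""
    return math.sqrt(p * (1.0 - p) / n_flips)

def sigmas_from_even(n_flips, observed_fraction, p=0.5):
    """How many standard deviations the observed heads fraction lies from p."""
    return abs(observed_fraction - p) / heads_fraction_sigma(n_flips, p)

def log10_hoeffding_bound(n_flips, deviation):
    """log10 of Hoeffding's bound 2*exp(-2*n*t^2) on P(|fraction - p| >= t).
    Returned as a logarithm because the bound itself underflows for large n."""
    return (math.log(2.0) - 2.0 * n_flips * deviation ** 2) / math.log(10.0)

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n,
              heads_fraction_sigma(n),
              sigmas_from_even(n, 0.60),
              log10_hoeffding_bound(n, 0.10))
```

Under these assumptions a 60%-40% split is about two standard deviations
from even after 100 flips, but about two hundred after a million, so its
probability is nonzero yet shrinks very fast as the number of flips grows.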

The best way to handle this sort of question is via sampling.  Take the
overall population, and then extract 'samples' from that population.  You
_must_ do this with chess as the overall "population" is simply too large to
compute in reasonable time.  The Deep Fritz vs. Junior match in the SSDF is a
case in point.  Will Fritz catch up?  I believe it is the better program.  But in _this_
match, it has a lot of work to do to even the score.  If they started the
match over, it might well win quickly.  But it is down several games and making
them up, if the two programs are close in strength, is going to be hard.

As someone pointed out, _if_ you are pretty sure about which program is
significantly stronger than the other, then a small sample might be enough to
prove that.  If you don't know, or if the programs are very close to each
other, then it might require thousands of games to conclude who is better.
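
The small-sample versus thousands-of-games contrast can be sketched with a
normal approximation.  This is only an illustration under strong
assumptions: every game is treated as an independent win/loss trial, draws
are ignored, and z = 1.96 (roughly 95% two-sided) is an arbitrary choice of
confidence level.  The function name is hypothetical.

```python
import math

def games_needed(true_score, z=1.96):
    """Approximate number of games after which the expected score of the
    stronger program sits z standard errors above 50%.

    true_score is the assumed long-run fraction of games the stronger
    program wins (draws ignored); it must be strictly between 0.5 and 1.
    """
    if not 0.5 < true_score < 1.0:
        raise ValueError("true_score must be strictly between 0.5 and 1.0")
    sigma_one_game = math.sqrt(true_score * (1.0 - true_score))
    edge = true_score - 0.5
    return math.ceil((z * sigma_one_game / edge) ** 2)

if __name__ == "__main__":
    for score in (0.75, 0.60, 0.55, 0.52, 0.51):
        print(score, games_needed(score))
```

With these assumptions, a program that truly scores 75% stands out after
roughly a dozen games, while one that truly scores 52% needs a couple of
thousand.
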
Last modified: Thu, 15 Apr 21 08:11:13 -0700