Computer Chess Club Archives

Subject: Re: Who is better? Some statistics...

Author: Peter Fendrich
Date: 01:21:14 06/13/01
On June 13, 2001 at 03:17:01, Martin Schubert wrote:

>On June 12, 2001 at 19:09:26, Dann Corbit wrote:
>
>>On June 12, 2001 at 10:26:57, Martin Schubert wrote:
>>
>>>On June 12, 2001 at 07:54:34, Peter Fendrich wrote:
>>>
>>>>On June 12, 2001 at 07:17:06, Martin Schubert wrote:
>>>>
>>>>>On June 12, 2001 at 06:08:03, Peter Fendrich wrote:
>>>>>
>>>>>>On June 11, 2001 at 17:46:13, Martin Schubert wrote:
>>>>>>
>>>>>>>On June 11, 2001 at 13:55:31, Gian-Carlo Pascutto wrote:
>>>>>>>
>>>>>>>>On June 11, 2001 at 13:36:21, Leen Ammeraal wrote:
>>>>>>>>
>>>>>>>>>Although Peter's program can in many ways be better
>>>>>>>>>than mine, I don't see how it can be more accurate,
>>>>>>>>>that is, as long as we regard, for example,
>>>>>>>>>10-5-0 as equivalent to 8-3-4. As you see, I simply
>>>>>>>>>divide the number of draws by 2 and add the result
>>>>>>>>>to either side.
>>>>>>>>
>>>>>>>>It is more accurate simply because it does not have
>>>>>>>>to do that simplification at all!
>>>>>>>>
>>>>>>>>10 - 5 -  0 -> 89,4% chance that A is better
>>>>>>>>8  - 3 -  4 -> 92,7% chance
>>>>>>>
>>>>>>>Why do you get different probabilities for the same score?
>>>>>>
>>>>>>They really are different probabilities.
>>>>>
>>>>>Depends on the assumptions. What do you assume? I would assume all three
>>>>>probabilities as 1/3.
>>>>>But usually you make a test like: if A reaches more than x points, say that A is
>>>>>better than B. If A doesn't reach more than x points, you can't draw any
>>>>>conclusion. So the same score should lead to the same results.
>>>>>In statistics you have an "area" (don't know the English word) of possible
>>>>>results where you say the hypothesis isn't true when a result in this "area"
>>>>>happens. And usually this "area" has a form like "points>x". You don't have to
>>>>>do this in this form, but what does your area look like?
>>>>>Do you understand what I want to say (sorry for my English)?
>>>>>
>>>>>Regards, Martin
>>>>
>>>>I think we are talking about different things here. What I am trying to say is
>>>>that the two scores above will get the same probability with a binomial
>>>>distribution but not with the trinomial one. p=1/3 or not doesn't matter. It
>>>>will generate other "A better than B" probabilities but the number of draws will
>>>>still give the two game scores different reliability.
>>>>
>>>>Your hypothesis "area" with the trinomial distribution isn't 2-dimensional as in
>>>>the binomial case but 3-dimensional. Read my text about this.
>>>>I'll be glad to send it to you. Just tell me!
>>>
>>>Okay, maybe we're talking about different things.
>>>I thought we were talking about different probabilities for different results
>>>(10-5-0, 8-3-4). So where is a binomial distribution? The distribution doesn't
>>>change because of the result.
>>>Of course the result 10-5-0 has a different probability than 8-3-4. But when we
>>>discuss "A stronger than B", this probability doesn't matter.
>>>Okay, maybe it's a good idea that you send me your text, and after that we can
>>>continue discussing.
>>
>>I would like a copy too.
>>
>>I think maybe the biggest problem with this whole experimental model is the
>>model itself.
>>
>>1.  White wins more than black.
>>2.  With increasing strength, does the ratio of draws increase for opponents of
>>approximately equal strength?  We see this with people.
>>3.  When programs learn, the trials are not independent.  How can we alter the
>>model to take this into consideration?
>
>Number 3 is the problem. I think it's nearly impossible to use this in a
>statistical model.
>
>Martin

1) This can be fed into the program if you have some figures. Maybe the
W/L/D statistics from all of Dann's collected games can be used.
2) As in 1), but divided into rating groups.
3) The model assumes independent observations, which is of course not true when
the programs have learning capabilities. This will undoubtedly affect and bias
the result. With the learning approaches used in programs today, I am not too
worried, however, if both programs have about the same level of learning. This
is statistically even better than programs playing the same stupid variation
over and over again. Some random behaviour is necessary.
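
As a rough illustration of the two scores quoted above (10-5-0 and 8-3-4,
written as wins-losses-draws), here is a minimal Python sketch. It assumes a
uniform prior over the win/draw/loss probabilities, in which case the draws drop
out of the probability that A's win rate exceeds its loss rate. The function
name prob_a_better is made up for this example, and this is not necessarily the
calculation used in the program discussed in the thread, although the figures
come out close to the ones quoted.

```python
from math import comb


def prob_a_better(wins: int, losses: int) -> float:
    """P(win rate > loss rate | result), assuming a uniform prior.

    Under that assumption the share of decisive games won by A follows a
    Beta(wins + 1, losses + 1) distribution, whatever the number of draws.
    Its tail above 1/2 equals a Binomial(n, 1/2) CDF evaluated at `wins`,
    with n = wins + losses + 1, so no special functions are needed.
    """
    n = wins + losses + 1
    return sum(comb(n, k) for k in range(wins + 1)) / 2 ** n


if __name__ == "__main__":
    print(f"10-5-0: {prob_a_better(10, 5):.4f}")  # prints 0.8949
    print(f" 8-3-4: {prob_a_better(8, 3):.4f}")   # prints 0.9270
```

Halving the draws and adding them to both sides would turn 8-3-4 into 10-5-0
and give both results the first figure.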

I think you are both beyond the limit of what is possible to cover with a few
games between only two players. All possible sources of errors and biases will
override all this fine tuning...
There is a lot of information from the games and the match conditions that is
not covered by statistical methods but that must be used in situations like
this. Did something peculiar take place in the games? Bugs in the programs?
Other circumstances that should be taken into account?
In my own experience I only use the stat results as an indication of something I
have to take a closer look at.
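
To give a feeling for how little a short match pins down, the sketch below
computes the score fraction and its approximate standard error from a
win/loss/draw result. It assumes independent games and a simple per-game
variance estimate (the very assumptions questioned above), and the function
name score_and_stderr is hypothetical.

```python
from math import sqrt


def score_and_stderr(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Return (score fraction, approximate standard error of that fraction).

    Each game scores 1, 0.5 or 0 for player A. Games are assumed independent,
    and the per-game variance is estimated from the observed frequencies.
    """
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    second_moment = (wins + 0.25 * draws) / n  # mean of the squared game scores
    variance = second_moment - mean ** 2
    return mean, sqrt(variance / n)


if __name__ == "__main__":
    for result in [(10, 5, 0), (8, 3, 4)]:
        mean, stderr = score_and_stderr(*result)
        print(result, f"score={mean:.3f}", f"stderr={stderr:.3f}")
```

Both results score two thirds, but the standard error is about 0.12 for 10-5-0
and about 0.10 for 8-3-4. Either way, an error of roughly ten percentage points
after 15 games is large compared with the differences one usually wants to
detect.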

For instance: I have some ideas for changes in my program. I implement them
individually and in combinations. I now have, say, 10 different program versions
to test. I run some test sets, I play matches against my previous program
version, I play matches against some other program, I look at the games and so
on. The statistical outcome of the matches is just one piece of information that
has to be backed up by other results.
There is a better way: play millions of games against all possible opponents,
but...

//Peter
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.