Computer Chess Club Archives




Subject: Re: A question about statistics...

Author: Rolf Tueschen

Date: 04:32:05 01/07/04


On January 07, 2004 at 02:07:08, Sandro Necchi wrote:


>Sorry, I was not precise enough, so to let everybody understand I will try to be
>more clear:
>the percentage of a program to be stronger than another in a single match score
>55 to 45 is about 20% and not 81%
>I do not care about statistic, but about real figure based on many tests and

This is a real problem. If you don't care about statistics but still want to say
something that IS statistics, or is based on it, then you have a real problem.
BTW in PART II I agree with you totally. You are 100% right. But here in PART I
you are very inexact again. You separate statistics and REAL figures. And that
is almost a philosophical problem then. It's science. With logic involved.

Let's see what you are saying.

You say that a prog is 20% "stronger" when it has just beaten another prog
55 to 45. Are you sure? By what method can you show how sure you are?

There you are directly in statistics again. In PART II you give the correct
answer. You have to play MANY games to know something for sure. Probably a
thousand games. And here? You think that you can conclude that a prog is 20%
stronger?

You suddenly use completely different wording. We simply don't use the
expression "being stronger by so and so many %". You misunderstand the
statistical expression of being this or that with a percentage of so and so. The
percentage of significance (or of certitude, if you prefer) is something totally
different than your "20% stronger". So again, I ask you to describe with what
certitude you can say that the one prog is 20% stronger than the other. That
is what we were talking about, and you misunderstood.

80% significance is simply too low for a good statistical result. Know what I
mean? We want to be 95% sure that our result is NOT by chance, and that is why
we need the "thousand" games and more.
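
To put a number on that "certitude", here is a minimal sketch in Python. It
assumes that every game is a plain win or loss (draws are ignored) and asks how
often two exactly equal programs would produce a score at least as lopsided as
the one observed. The function name certainty_not_chance is made up for this
illustration; it is not anyone's official method.

```python
from math import comb


def certainty_not_chance(wins: int, games: int) -> float:
    """Return 1 minus the one-sided binomial p-value against equal strength.

    Assumption: no draws; between equal programs each game is an
    independent coin flip.
    """
    # Probability that two equal programs yield `wins` or more wins for one side.
    tail = sum(comb(games, k) for k in range(wins, games + 1)) / 2 ** games
    return 1.0 - tail


if __name__ == "__main__":
    print(round(certainty_not_chance(55, 100), 3))    # 55 to 45
    print(round(certainty_not_chance(550, 1000), 3))  # same ratio, 1000 games
```

Under these assumptions a 55 to 45 result comes out a little above 80%, short of
the 95% mark, while the same ratio held over 1000 games clears it comfortably.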

Look at this.

JUNIOR led against FRITZ 5-0 and still wasn't the better program. Statistically
I can say that 5 games is just too small a sample to conclude anything about
the whole population. Here the population is a huge mass of games.
FRITZ already equalized the score in the next 5 or 6 games. In other words,
your statistical certitude was back down to almost 50%, and that is total chance
with zero advantage in strength. All you know is that one game can be won by A,
the next by B. Or 5 in a row by A and the next 5 by B. This is all possible.
Only with thousands of games could you get higher certitude. Up to 95%. That is
the so-called statistical formula.
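
How many games "thousands" really means depends on how close the two programs
are. A rough sketch under stated assumptions: draws are ignored, the binomial is
replaced by its normal approximation, and the test is one-sided at 95%. The
helper games_needed is hypothetical and only meant to show the order of
magnitude.

```python
from math import ceil
from statistics import NormalDist


def games_needed(score: float, confidence: float = 0.95) -> int:
    """Games needed before a score fraction `score` (> 0.5) can be told
    apart from 50% at the given one-sided confidence level.

    Assumption: no draws, normal approximation to the binomial.
    """
    if not 0.5 < score < 1.0:
        raise ValueError("score must be strictly between 0.5 and 1.0")
    z = NormalDist().inv_cdf(confidence)
    # Under equal strength the standard error of the score is 0.5 / sqrt(n).
    # Solve (score - 0.5) / (0.5 / sqrt(n)) >= z for n.
    return ceil((z * 0.5 / (score - 0.5)) ** 2)


if __name__ == "__main__":
    for s in (0.55, 0.52, 0.51):
        print(s, games_needed(s))
```

With these assumptions a 55% score needs a few hundred games, while a 52% score
already needs well over a thousand.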

BTW if you look at the plain results in the SSDF lists, you can see with your
own eyes that the range of possible deviation to the left or the right
(minus or plus) is LARGER than the difference in the bare Elo points of the
progs. Conclusion: with a couple MORE games the list would look different.
The actual ranking is NOT sure or certain at all!!!!! (The SSDF is honest enough
to present these margins themselves, so they don't cheat at all! They only avoid
writing that the actual ranking in the top places could also be the reverse.
But it follows from the REAL results of their own testing.)
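
The same point can be sketched for a single match. The code below is not the
SSDF's own calculation; it is an illustration that assumes the usual logistic
Elo formula, binomial variance (no draws), and a normal-approximation 95%
interval. The names elo_diff and elo_interval are invented for this example.

```python
from math import log10, sqrt


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return 400.0 * log10(score / (1.0 - score))


def elo_interval(wins: float, games: int, z: float = 1.96):
    """Approximate 95% interval (low, mid, high) for the Elo difference.

    Assumption: binomial variance, i.e. no draws; with draws the real
    spread would be somewhat smaller.
    """
    score = wins / games
    half_width = z * sqrt(score * (1.0 - score) / games)
    # Clamp so the logarithm stays defined for extreme scores.
    low = max(score - half_width, 1e-9)
    high = min(score + half_width, 1.0 - 1e-9)
    return elo_diff(low), elo_diff(score), elo_diff(high)


if __name__ == "__main__":
    low, mid, high = elo_interval(55, 100)
    print(f"{low:.0f} .. {mid:.0f} .. {high:.0f} Elo")
```

For a 55 to 45 match the interval reaches below zero under these assumptions, so
the margin is wider than the measured difference itself.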

This is what I have been saying for years, but nobody will listen because we
don't have a good alternative, and personally I don't want to be involved in
such nonsense tests either.

>So if you are interested to know how correct the result is than you
>get 20%.

Objection. Again you misunderstand. By definition the number can't go below 50%
because at 50% you have total chance and no "correctness" at all. It could be so
or the other way round. Sandro, please, don't feel insulted again, this is
stats, and it's damned hard stuff to digest. Other people studied such things
for years and you won't get it in hours, not to speak of minutes. Sorry.

>The reason is that if you look at the games you quite probably will find
>variation which scored quite well (or quite bad), thus putting a big weight on
>the final score. This is why it is better to make the same test against other
>chess programs; at least against other 5.

You are absolutely right in your habit. As the book author you must test
this way. But you are not primarily interested in the overall strength but in
the advantages of certain variations. Advantages relative to the current version
of the engine. Of course this is decisive for the overall strength too!!

And good luck for your future engagements!


>There are only 2 ways to know if a program is better than another one:
>1. To make a huge amount of games against several opponents; at least 1000
>games. This everybody can do.
>2. To look at the games and analyze them. You need to be a strong player to do
>this and/or to know chess programs a lot as well.

But since computer chess still isn't at GM strength, players weaker than GM can
also analyse computer games. :)




Last modified: Thu, 07 Jul 11 08:48:38 -0700
