Computer Chess Club Archives


Subject: Being better...

Author: Rolf Tueschen

Date: 07:12:00 01/23/04


We just had a little dispute about an old topic. When can we say that a prog is
better than another? How can we proceed to make sound arguments?

Let me tell the story in fast mode.

There was a test, as I understand it with 300 games or so. That is an unusually
high number of games, because often we have matches with only 20 or 40 games.

I understood further that the 95% confidence interval was given as 1-58.

Now what I want to tell you, and this is indisputable statistical standard:

if you get a value that is in the interval, we cannot conclude that the
difference between the two progs is relevant or valid, or call it what you want.
It makes no sense to argue with such "low" differences. They could still be due
to chance. Now the distribution of chance is the bell curve. Nothing else.
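To make the interval idea concrete, here is a minimal sketch of one common way
to get an Elo difference with a 95% interval from a match result, using the
normal (bell curve) approximation. The function names and the win/draw/loss
counts are hypothetical and chosen only for illustration; they are not the
results of the test discussed above, and other methods give slightly different
bounds.

```python
import math


def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) into an Elo difference
    using the standard logistic Elo formula."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.

    z=1.96 corresponds to roughly 95% under the normal approximation.
    Assumes the result is not so lopsided that score +/- z*se leaves (0, 1).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))


# Hypothetical 300-game match, numbers invented for illustration.
elo, low, high = elo_interval(wins=120, draws=100, losses=80)
print(f"Elo difference {elo:+.1f}, 95% interval {low:+.1f} .. {high:+.1f}")
```

Under this approach, a difference counts as unproven when the interval
still includes 0.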

We had the debate with the SSDF list often enough.

Two progs stand at the top. One is number one in the ranking. But is it really
stronger than prog number two?

The answer is easy. If the normal variation, this famous +- value in the SSDF
list, is say +-40 points and the difference between the progs is 35 points, THEN
we are unable to conclude anything for sure. It could be that 1 is stronger than
2, but the contrary could also be true. Only from values >40 on do we have
"certainty", statistically, that a prog in that specific design is proven
stronger than another one.
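The rule of thumb just described can be written down in a few lines. This is
only a sketch of the comparison as stated in this post; the function name is
made up for illustration, and it treats the +- value as a single margin for the
rating difference, which is a simplification.

```python
def difference_is_conclusive(rating_diff, margin):
    """Return True only if the rating difference exceeds the +- margin.

    rating_diff: difference between the two progs in rating points
    margin:      the +- value quoted next to the ratings (an assumption here:
                 one margin applied to the difference as a whole)
    """
    return abs(rating_diff) > margin


# The example from the text: +-40 points margin, 35 points difference.
print(difference_is_conclusive(35, 40))  # inside the margin, nothing proven
print(difference_is_conclusive(45, 40))  # outside the margin
```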

This is all so simple and trivial that it is satisfying to be able to clarify it.

Have fun,

Rolf

P.S.

I just want to correct a serious mistake in a former posting. There it was said
that an Elo difference of say 1 point would speak for the better strength of one
prog over another, and that you needed so and so many games to prove that. This
is total nonsense. There is _no_ way to conclude anything out of an Elo
difference of 1 point, no matter if you have 300 or 100000 games. The difference
of 1 Elo point is meaningless. It's nonsense to even think about the millions of
games supposedly necessary to "prove" that. Statistics also has something to do
with normal human sense. We would always take such a difference for _equal_
strength.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
