Author: Rolf Tueschen
Date: 07:12:00 01/23/04
We just had a little dispute about an old topic. When can we say that one prog is better than another? How can we proceed to make sound arguments?

Let me tell the story in fast mode. There was a test, as I understand it with 300 games or so. That is an incredibly high number of games, because often we have matches with only 20 or 40 games. I further understood that with a confidence interval of 1-58 we have 95%.

Now what I want to tell you, and this is indisputable statistical standard: if you get a value that lies within the interval, we cannot conclude that the difference between the two progs is relevant or valid or whatever you want to call it. It makes no sense to argue with such "low" differences. They could still be due to chance. Now the distribution of chance is the bell curve. Nothing else.

We have had this debate about the SSDF list often enough. Two progs stand at the top. One is number one in the ranking. But is it really stronger than prog number two? The answer is easy. If the normal variation, this famous +- value in the SSDF list, is say +-40 points and the difference between the progs is 35 points, THEN we are unable to conclude anything for sure. It could be that 1 is stronger than 2, but the contrary could also be true. Only from values >40 on do we have "certainty", statistically, that a prog in that specific design is proven stronger than another one.

This is all so simple and trivial that it is satisfying to be able to clarify it.

Have fun,
Rolf

P.S. I just want to correct a heavy mistake in a former posting. There it was said, for Elo differences, that a difference of say 1 Elo point would speak for the better strength of one prog over another, and that you needed so and so many games to prove that. This is total nonsense. There is _no_ way to conclude anything out of an Elo difference of 1 point, no matter if you have 300 or 100000 games. The difference of 1 Elo point is meaningless. It's nonsense to even think about the millions of games necessary to "prove" that. Statistics also has something to do with normal human sense. We would always take such a difference for _equal_ strength.
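For anyone who wants to play with the numbers, here is a minimal sketch of the interval argument above. It is not taken from the test under discussion or from the SSDF's own method. It assumes the usual logistic Elo formula and a normal (bell curve) approximation for the match score, with z = 1.96 for 95%. The function names `elo_diff`, `elo_interval` and `is_conclusive` are hypothetical, and the game counts in the usage note are made up for illustration.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (logistic Elo model, an assumption)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def _clamp(x, eps=1e-6):
    """Keep a score fraction strictly inside (0, 1) so the logarithm stays defined."""
    return min(max(x, eps), 1.0 - eps)


def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, estimate, high) Elo difference for a match result.

    Uses a normal approximation for the mean score per game;
    z = 1.96 corresponds to a 95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    low = elo_diff(_clamp(score - z * se))
    high = elo_diff(_clamp(score + z * se))
    return low, elo_diff(_clamp(score)), high


def is_conclusive(wins, draws, losses, z=1.96):
    """True only if the interval excludes 0, i.e. excludes equal strength."""
    low, _, high = elo_interval(wins, draws, losses, z)
    return low > 0.0 or high < 0.0
```

For example, `is_conclusive(110, 100, 90)` looks at a made-up 300-game match in which one prog is slightly ahead. The interval still straddles 0, so in the sense of the post nothing can be concluded from it. A lopsided result such as `is_conclusive(200, 0, 100)` gives an interval that lies entirely above 0.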
Last modified: Thu, 15 Apr 21 08:11:13 -0700