Computer Chess Club Archives



Subject: Re: Dangers in CC - SSDF: Terminology, Statistics

Author: Richard Pijl

Date: 22:33:25 02/22/03



On February 22, 2003 at 14:03:38, Rolf Tueschen wrote:

>The final question. As I said I am out of the rest. Also, I don't give the
>questions because you MUST answer them. Not at all. You can do what you want.
>
>Here is the crucial question, that is not "picked" but logically crucial stuff.
>
>- the 8 pts topic
>
>You are claiming that if these 8 pts fall IN the same range of the error margin
>for both progs THEN it's still allowed to present the two concerned programs on
>separate ranks AND to call the higher prog the new number one? Are you certain?
>If yes, could you please give a short hint for the stats references? Thanks.

Yes, you can still do that. However, because the error margins are given as
well, it also gives a hint about the reliability of the statement.

One place you can see reliability in action is in a little tool by Steve
Maughan. It is called 'Who is better'. You can find it at:

http://www.stevemaughan.com/chess.htm

In this tool you can set the required reliability of the statement that one
engine is better. Given the number of points scored by one engine, it shows you
how many games would have to be played to support that statement.
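
To give a feel for that kind of calculation, here is a rough sketch in Python.
It is NOT how Steve's tool works internally (I have not checked that); it is
just a plain normal approximation. The function name games_needed and the
default per-game SD of 0.5 (the largest value possible when a game scores 0,
0.5 or 1) are my own assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def games_needed(score_fraction, confidence, sd_per_game=0.5):
    """Rough number of games before a score above 50% is significant.

    Assumption: a one-sided normal approximation, with sd_per_game as the
    standard deviation of a single game result (0.5 is the worst case;
    draws make it smaller).
    """
    if score_fraction <= 0.5:
        raise ValueError("score_fraction must be above 0.5")
    # z-value for a one-sided statement "engine A is better than engine B"
    z = NormalDist().inv_cdf(confidence)
    # Require (score - 0.5) >= z * sd / sqrt(n) and solve for n
    return ceil((z * sd_per_game / (score_fraction - 0.5)) ** 2)


# Hypothetical example: an engine scoring 55% of the points
for confidence in (0.90, 0.95, 0.99):
    print(confidence, games_needed(0.55, confidence))
```

The higher the reliability you ask for, the more games you need for the same
score percentage.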

The error margins in SSDF are tied to a reliability figure (confidence level),
which I don't know (and yes, they should specify the reliability number used,
to give the list more meaning :-) ), but it is typically 95%. Raising the
reliability to 99% would give wider error margins; lowering it to 90% would
give narrower ones.
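
A small sketch of how the margin changes with the reliability figure, assuming
a two-sided normal interval around the mean score. The per-game SD of 0.4 and
the 1000 games are made-up numbers, and score_margin is a name I picked for
this example; none of this is taken from the SSDF list itself.

```python
from math import sqrt
from statistics import NormalDist


def score_margin(sd_per_game, games, confidence):
    """Half-width of a two-sided normal interval for the mean score."""
    # Two-sided: put (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sd_per_game / sqrt(games)


# Hypothetical match: 1000 games, SD of a single game result 0.4
for confidence in (0.90, 0.95, 0.99):
    margin = score_margin(0.4, 1000, confidence)
    print(f"{confidence:.0%}: +/- {margin:.4f} of a point per game")
```

The same data gives a narrower interval at 90% and a wider one at 99%; only
the claimed reliability changes, not the measurement.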

If you use a lower reliability percentage, you can find one at which the
8-point difference is larger than the error margins.
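
As a sketch of that idea: if you know the margin at 95%, you can work backwards
to the reliability at which the margin shrinks to exactly the observed
difference. The 30-point margin below is a hypothetical number, not an SSDF
figure, and I am assuming a two-sided normal interval and that the margin
applies to the rating difference itself. The function name is made up.

```python
from statistics import NormalDist


def reliability_for_difference(difference, margin_95):
    """Confidence level at which the margin equals the given difference.

    Assumption: margin_95 is the half-width of a two-sided 95% normal
    interval, so the standard error is margin_95 / z(95%).
    """
    normal = NormalDist()
    z_95 = normal.inv_cdf(0.975)
    standard_error = margin_95 / z_95
    # z-value whose margin would be exactly `difference`
    z_needed = difference / standard_error
    return 2.0 * normal.cdf(z_needed) - 1.0


# Hypothetical: 8 rating points of difference, 95% margin of +/- 30 points
print(round(reliability_for_difference(8, 30), 3))
```

Below that reliability the 8 points fall outside the margin; at 95% they are
well inside it.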

On the following page you can find some more on this:

http://www.sportsci.org/resource/stats/contents.html

Note that SD is the standard deviation of individual game results, and playing
a match amounts to measuring the engines' relative strength over multiple
trials.

Richard.




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.