Computer Chess Club Archives



Subject: Re: Dangers in CC - SSDF: Terminology, Statistics

Author: Tony Hedlund

Date: 08:43:41 02/23/03



On February 23, 2003 at 01:33:25, Richard Pijl wrote:

>On February 22, 2003 at 14:03:38, Rolf Tueschen wrote:
>
>>The final question. As I said I am out of the rest. Also, I don't give the
>>questions because you MUST answer them. Not at all. You can do what you want.
>>
>>Here is the crucial question, that is not "picked" but logically crucial stuff.
>>
>>- the 8 pts topic
>>
>>You are claiming that if these 8 pts fall IN the same range of the error margin
>>for both progs THEN it's still allowed to present the two concerned programs on
>>separate ranks AND to call the higher prog the new number one? Are you certain?
>>If yes, could you please give a short hint for the stats references? Thanks.
>
>Yes, you can still do that. However, because the error margins are given as
>well, it also gives a hint about the reliability of the statement.
>
>One place you can see reliability in action is in a little tool by Steve
>Maughan. It is called 'Who is better'. You can find it at:
>
>http://www.stevemaughan.com/chess.htm
>
>In this tool you can set the required reliability of the statement about which
>engine is better, and it shows you, given the number of points scored by one
>engine, how many games should have been played.
>
>The error margins in SSDF are tied to a reliability figure (confidence level),
>which I don't know (and yes, they should specify the reliability number used, to
>give the list more meaning :-) ) but is typically 95%. Raising the reliability to
>99% would give wider error margins; lowering it to 90% would give narrower ones.

95% is used. Thanks for the links.

Tony

>If you use a lower reliability percentage, you can find one at which the 8
>point difference is larger than the error margins.
>
>On the following page you can find some more on this:
>
>http://www.sportsci.org/resource/stats/contents.html
>
>Note that SD there is the standard deviation of individual game results, and
>playing a match measures the engines' relative strength over multiple trials,
>so the margin on the match result shrinks as more games are played.
>
>Richard.
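
To make the relationship between reliability and error margin concrete, here is a minimal sketch in Python. It is not SSDF's actual method, which is not described in this thread. It assumes a normal approximation for the mean score per game and the usual logistic Elo formula. The function names and the example numbers (SD of 0.4 per game, 1000 games, a 50% score) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def score_margin(sd_per_game, n_games, confidence):
    """Half-width of a two-sided confidence interval for the mean score
    per game, using a normal approximation (an assumption of this sketch)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sd_per_game / math.sqrt(n_games)


def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1),
    using the standard logistic Elo formula."""
    return -400.0 * math.log10(1.0 / p - 1.0)


# Hypothetical inputs, not SSDF data.
SD_PER_GAME = 0.4
N_GAMES = 1000
MEAN_SCORE = 0.5

for confidence in (0.90, 0.95, 0.99):
    m = score_margin(SD_PER_GAME, N_GAMES, confidence)
    low = elo_from_score(MEAN_SCORE - m)
    high = elo_from_score(MEAN_SCORE + m)
    print(f"{confidence:.0%}: score +/- {m:.4f}, Elo from {low:+.1f} to {high:+.1f}")
```

The margin grows with the chosen reliability and shrinks with the square root of the number of games, so the same 8-point gap can lie inside the margins at 95% and outside them at a lower reliability.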




Last modified: Thu, 15 Apr 21 08:11:13 -0700
