Author: Rolf Tueschen
Date: 03:05:12 02/23/03
On February 23, 2003 at 01:33:25, Richard Pijl wrote:

>On February 22, 2003 at 14:03:38, Rolf Tueschen wrote:
>
>>The final question. As I said I am out of the rest. Also, I don't give the
>>questions because you MUST answer them. Not at all. You can do what you want.
>>
>>Here is the crucial question, that is not "picked" but logically crucial stuff.
>>
>>- the 8 pts topic
>>
>>You are claiming that if these 8 pts fall IN the same range of the error margin
>>for both progs THEN it's still allowed to present the two concerned programs on
>>separate ranks AND to call the higher prog the new number one? Are you certain?
>>If yes, could you please give a short hint for the stats references? Thanks.
>
>Yes, you can still do that. However, because the error margins are given as
>well, it also gives a hint about the reliability of the statement.

What you say means: "yes, you can well do nonsense but please have in mind that
it's nonsense."

Don't expect further reactions from my side.

Rolf Tueschen

>
>One place you can see reliability in action is in a little tool by Steve
>Maughan. It is called 'Who is better'. You can find it at:
>
>http://www.stevemaughan.com/chess.htm
>
>In this tool you can set the requested reliability of the statement which engine
>is better, and it shows you, given the number of points scored by one engine, how
>many games should have been played.
>
>The error margins in SSDF are related to a reliability figure, which I don't
>know (and yes, they should specify the reliability number used, to give the list
>more meaning :-) ) but is typically 95%. Increasing the reliability to 99% would
>give wider error margins, decreasing it to 90% will give you smaller error margins.
>
>If you are using a lower reliability percentage, you can find one where the 8
>point difference is larger than the error margins.
>
>On the following page you can find some more on this:
>
>http://www.sportsci.org/resource/stats/contents.html
>
>Note that SD is the standard deviation in individual game results, and playing a
>match is measuring the engine's relative strength with multiple trials.
>
>Richard.
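
As a rough illustration of the point about reliability levels quoted above, here is a minimal Python sketch. It assumes a normal approximation to the mean game score and the usual logistic Elo formula; the thread does not say how SSDF actually computes its margins, so this is not their method. The function names and the 400/300/300 match result are made up for the example.

```python
import math
from statistics import NormalDist


def elo_from_score(p):
    """Convert a score fraction (0 < p < 1) into an Elo difference
    using the standard logistic formula (an assumption of this sketch)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_interval(wins, draws, losses, reliability=0.95):
    """Return (elo, low, high) for a match result at the given reliability.

    Uses the standard deviation of individual game results (1, 0.5, 0)
    and a normal approximation for the mean score over all games.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result around the observed mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    std_err = math.sqrt(var / n)
    # Two-sided z value, about 1.96 for 95% reliability.
    z = NormalDist().inv_cdf(0.5 + reliability / 2.0)
    # Clamp so the Elo conversion stays defined for lopsided results.
    eps = 1e-9
    low = max(eps, score - z * std_err)
    high = min(1.0 - eps, score + z * std_err)
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    # Hypothetical match: 400 wins, 300 draws, 300 losses.
    for level in (0.90, 0.95, 0.99):
        elo, low, high = elo_interval(400, 300, 300, level)
        print(f"{level:.0%}: {elo:+.1f} Elo, interval [{low:+.1f}, {high:+.1f}]")
```

Running it for several reliability levels shows the interval widening as the requested reliability goes up, which is why the same 8 point gap can lie inside the margins at 95% and outside them at some lower level.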
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.