Computer Chess Club Archives


Subject: Re: Dangers in CC - SSDF: Terminology, Statistics

Author: Richard Pijl

Date: 10:29:47 02/22/03



On February 22, 2003 at 07:31:33, Rolf Tueschen wrote:

>On February 21, 2003 at 19:59:53, Richard Pijl wrote:
>
>>
>>>
>>>Let's take the debate to a higher level since you have a differentiated view.
>>>
>>>Ok, let's take them for amateurs (not scientists). But still what would you
>>>think about a "responsibility" of persons, who call themselves neutral,
>>>and who present a number one, although this month the two top programs were
>>>separated by 8 points (with SD of >30 points), and who are well aware of the
>>>meaning of SD? Would you say, ok, they are amateurs, they don't understand the
>>>danger or would you go farther and state that the result ["number one"] is ok
>>>with the 8 points advantage?
>>
>>I think the SSDF is very well aware of the commercial impact their list has.
>>That is probably the reason they print the remark on the error margin above the
>>list.
>>When talking about the error margin (which is not the same as SD): That is quite
>>arbitrary as well. The error margin itself is related to a reliability factor,
>>which is typically 95% (I don't know the actual number used by SSDF in this
>>though). If you want to have a list with higher reliability, the reported error
>>margins are higher. So what might be interesting is to calculate what the
>>reliability is of an 8 point difference in this list.
>>I don't see anything wrong with calling Shredder the number 1. The next list may
>>feature another engine at the number 1 spot, giving it a spotlight for a month
>>too. And if it still is Shredder? Perhaps it was the best engine then ... ;-)
>>
>>>
>>>A second aspect is the Elo base as such for new progs. What do you think about
>>>the Elosystem in view of progs who become 2700 players in a day or two? Should we
>>>rethink the Elosystem for comps?
>>
>>We can always rethink the ranking system. But I don't know of a better system
>>for this purpose than the Elosystem...
>>
>>Richard.
>

I don't like the attitude you show in your response below. Please keep to the
point without looking for alternate motives I may have in 'defending' SSDF. I
have none. My program is not listed there, I have no commercial interests in
chess, I'm not a member of the SSDF and I'm not a Swede.

>Perhaps our English is too bad so that you can't understand what I asked you,
>but this here is no answer. I asked a very clear question and you answered with
>stating something else. I repeat: What is the justification for the presentation
>of a number one with 8 point advantage if the "margin", I say SD,

I suggest you review the introductory course on statistics again ...
If you require a reliability of just x%, 8 points may be outside the error
margin, if you require a 10x higher reliability, the margins are roughly twice
as big. So the question about justification of presenting results where error
margins are overlapping is not really a relevant discussion.
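
To put rough numbers on this, here is a minimal Python sketch. It assumes the
rating estimates are approximately normally distributed, reads "reliability" as
a two-sided confidence level, and takes the 30-point SD mentioned earlier in the
thread as the SD of the rating difference. None of this is the SSDF's actual
procedure, and the function names are made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def margin(sd, confidence):
    """Two-sided error margin (Elo points) for a given confidence level.

    Assumes a normally distributed estimate with standard deviation `sd`.
    """
    z = STD_NORMAL.inv_cdf(0.5 + confidence / 2.0)
    return z * sd


def confidence_of_lead(diff, sd_diff):
    """One-sided probability that the true rating difference is above zero,
    given an observed difference `diff` whose SD is `sd_diff`."""
    return STD_NORMAL.cdf(diff / sd_diff)


if __name__ == "__main__":
    sd = 30.0  # assumption: SD of the rating difference, in Elo points

    # The margin widens as the required confidence level goes up.
    for conf in (0.68, 0.95, 0.995):
        print(f"confidence {conf:.1%}: margin +/- {margin(sd, conf):.1f} points")

    # How much confidence does an 8-point lead actually support?
    print(f"8-point lead: {confidence_of_lead(8.0, sd):.1%} chance the leader is ahead")
```

Under these assumptions an 8-point lead with a 30-point SD corresponds to only
about a 60% chance that the leader is really ahead, and the margin you have to
report depends entirely on the confidence level you pick.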

> is so high,
>that the advantage is inside the "margin". Just give a concentrated
>answer on this question.

I don't see anything wrong with the SSDF position on this. So calling Shredder
number 1 in SSDF is ok with me. It still doesn't mean Shredder is the best
engine, but at least it is one of the strongest today.

>If you reply another time with the trivial statement
>that the SSDF is well aware that this is a problem and that therefore they give
>the remark on the "margin", then this debate will continue without me. Because
>then I knew that you were just another with the agenda to spread confusion with
>the goal to defend SSDF against absolutely justified criticism. As I said this is
>about science. Well knowing that the SSDF is not science. But still one can
>debate the questions in a fair manner. That's all.

If you are looking for a way of proving that one engine is better than another,
you're in for a nasty surprise: That's just not possible. But methods like the
one the SSDF uses at least try to estimate the strength of engines by comparing
them to each other. As far as I'm concerned you can compare these efforts to
weather forecasts. They are just as reliable. Sometimes very accurate, but more
often they are not.

>
>I don't want to hear arguments that the next time still another prog could be
>number one. You make me laugh. Because that sounds as if you wanted to give me
>consolation because I felt sadness because "my" favorite prog did not win the
>number one. But I am not involved in all that. What is my only interest? To know
>if the number one for Shredder is correct. And IMO, for scientific reasons, it is
>NOT!

The number one position in SSDF only indicates that Shredder is a strong engine,
one of the strongest around. Other engines are about as strong as Shredder. That
is another conclusion you can draw from this list.

>I asked what you thought about the instant becoming 2700 in a couple of days. I
>don't have your answer. What I have is that you say "there is no better system
>than the Elosystem that I know of". But then we should ask further questions.

I think I did not understand the question here then. I only responded to the
second part of the question: 'Should we rethink the Elosystem for comps?'

>And that is again the strangeness in your answer. You seem to be not much
>interested /motivated in a real search for the truth. Your interest is mainly to
>persuade me to leave my questions.

You can ask what you like, I will answer what I like. I'm not looking for a
better method than Elo, so that is the reason for the brevity of my answer.
If you propose one, I'm willing to discuss it though ...

>So, that is again the reason for my
>statement that this is now the last chance to make a serious debate with you.
>There is no problem if you don't want such a debate. But then why you gave the
>impression to be interested? Oh well.
>
>It's difficult to explain why scientific questions are so normal and without
>agenda. But even here in CCC I can discover these interests, that already
>destroyed rgcc. Surely not those science interests.

I will pretend not to have read this part.
Richard.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
