Computer Chess Club Archives



Subject: Re: Comments of latest SSDF list 2

Author: Rolf Tueschen

Date: 08:15:16 05/26/02



On May 26, 2002 at 10:42:34, Torstein Hall wrote:

>On May 26, 2002 at 07:49:26, Rolf Tueschen wrote:
>
>>On May 26, 2002 at 05:09:33, Torstein Hall wrote:
>>
>>>>Another point: if you took a look at the list where Shredder was leading you
>>>>could see that the leading programs had played their games against totally
>>>>different opponents. So you can't compare the ratings at all.
>>>
>>>If you can not do that then I think you can forget about rating. I'm playing
>>>different players based on rating and of course often we have not played the
>>>same persons. That is one of the reasons we have rating!
>>
>>This is absurd. I side with Martin Schubert that _testing_ cannot allow
>>deliberately chosen opponents. We are talking about rankings in test series,
>>_not_ in real-life tournaments.
>>
>>
>>>
>>>>My suggestion: the top programs should play the same opponents to make it
>>>>possible to compare their results.
>>>>If I remember right it happens quite often that a program is very strong in the
>>>>first rating list it appears in (where it plays against weak opponents). In the
>>>>next rating list where it has to fight the tough ones it falls back in the
>>>>rating list.
>>>
>>>That is what the error margins are for. I think the rating normally stays within
>>>these limits. So for a given program that has got an SSDF rating of, say, 2600 +/-
>>>43, you can say with 95% (if I remember right) confidence that the program has a
>>>rating within the range 2557 - 2643.
>>
>>This is absolutely false. The error margins have _nothing_ in principle to do
>>with different opponents (on different hardware actually)! The margins are
>>simply a consequence of the statistical maths.
>>
>>Rolf Tueschen
>Who are you arguing with?
>
>The absurd thing is that I never said what you say is absurd!!!!

Excuse me. I wrote "Absurd" about your paragraph, and I will restate my opinion
here with arguments.

> I was just
>reading what the numbers mean! And that is that we can state a rating with 95%
>confidence inside these margins!

The problem is IMO that we cannot read this out of the numbers. That is why I
used that expression, and BTW also why I wrote my criticism. Of course I am not
holding _you_ responsible for the SSDF methodology, only for the false
conclusions. Let me explain. Since the numbers came, to a great extent, from
totally different hardware and opponents, they mean almost nothing in the end.
That is the sad truth. The outward appearance, with 5% and stuff like that, is
no proof of serious numbers. Their seriousness can't be established in the
final stats, only before the testing starts.
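
To illustrate why a published margin is a matter of statistical maths alone,
here is a minimal Python sketch. It is not the SSDF's actual calculation. It
assumes independent games, a normal approximation, no draws, and the usual
logistic Elo curve, and the function names elo_diff and margin_95 are made up
for this example. Note that the only inputs are the score and the number of
games: nothing about the opponents or the hardware enters the formula.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (logistic Elo curve)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def margin_95(score, games):
    """Approximate 95% interval, in Elo points, for the implied difference.

    Assumption: independent games, normal approximation, draws ignored
    (so the real spread is somewhat smaller). The score plus or minus
    1.96 standard errors must stay strictly between 0 and 1.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    low = elo_diff(score - 1.96 * se)
    high = elo_diff(score + 1.96 * se)
    return low, high


# The interval depends only on the score and the game count.
for games in (100, 400, 1000):
    low, high = margin_95(0.5, games)
    print(games, round(low), round(high))
```

The interval narrows as more games are played, whatever the conditions of
those games, so a small margin by itself says nothing about whether the test
design was sound.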


>
>But another thing Martin did say was that we can not use the numbers when we
>have played different players. I disagree strongly with that, as long as we are
>talking about the same pool of players. If it was not for that, rating numbers
>would be utterly useless. (And maybe they are....... :-D )
>
>Torstein

This is also true in a way. :) But I think you are confusing tournament
performance rankings with testing!

Behind that, not necessarily in your case, is a basic misunderstanding about
mathematical techniques. People say that the calculation rules have been defined
for some purpose, here chess tournaments (by Prof. Elo), and they conclude
that it's simple to transfer all that to testing. Of course you _always_
get results, no matter what you do. But to present a ranking after test series
you should be aware of the simple conditions that are required to be able to do
the stats. In a tournament we don't ask about the weight or the age of the
players. Elo is just numbers about tournament performance. But if you want to
sell machines or programs, questions about the hardware take on a special
importance. This is all very simple. Now, once you have decided to do such work,
you should be aware of several statistical basics. This is all I am saying,
because I saw that SSDF simply doesn't do what is easy to do and what is
necessary.

However, I see the debate for what it is: a debate about statistics. I am
not interested in bashing or insulting people, neither in SSDF nor here in the
debate. Of course it's acceptable to use more direct language than just
writing "Sigh", which in my eyes is rather arrogant! If I write "Absurd!", I mean
a conclusion or an opinion; I do not mean the person behind it. Please do not
take offense.

Rolf Tueschen

>
>
>
>>
>>
>>>
>>>Torstein
>>>>
>>>>Regards, Martin




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.