Computer Chess Club Archives


Subject: Re: Sorry Rolf - the winner is the winner.

Author: Rolf Tueschen

Date: 10:42:13 09/13/02


On September 13, 2002 at 13:17:57, Uri Blass wrote:

>On September 13, 2002 at 13:04:50, Rolf Tueschen wrote:
>
>>On September 13, 2002 at 12:20:36, Uri Blass wrote:
>>
>>>On September 13, 2002 at 11:25:44, David Dory wrote:
>>>
>>>>On September 13, 2002 at 09:20:26, Rolf Tueschen wrote:
>>>>
>>>><snip>
>>>>>
>>>>>Let's quickly compare human lists and computer rankings. The Elo method allows
>>>>>one to calculate individual strength (performance) over the variable of age. In
>>>>>CC, programs have no age at all, because almost every new version gets completely
>>>>>new limbs and organs, so to speak. That means that you can't compare the old and
>>>>>the new version. Or would you compare the embryo with M. vos Savant? We
>>>>>remember the old saying "You can't compare apples with beans". Nevertheless, CC
>>>>>has had ranking lists for decades now, with the astonishing result that the newest
>>>>>progs are on top and the oldest, on the weakest hardware, are at the bottom.
>>>>>Big surprise!
>>>>===================
>>>>I agree with you 100% on this issue, Rolf: testing software on vastly unequal
>>>>hardware is totally a waste of time and an insult to the reader's intelligence,
>>>>really.
>>>
>>>I disagree
>>>
>>>It is not a waste of time to test programs with unequal hardware.
>>>The better hardware does not always win, and you can learn from the results.
>>>
>>>Palm Tiger has a 50% score against Kallisto in spite of the fact that Kallisto
>>>runs on a 486 and the Palm is significantly slower hardware.
>>>
>>>I think that it may be interesting to see other programs on slow hardware as
>>>well, and not only Tiger 14.9, but the SSDF does not have unlimited time.
>>>
>>>I think that it is interesting to see how much rating programs earn from the new
>>>hardware and without testing programs on old hardware there is no way to know.
>>>
>>>You also need games against different opponents in order to generate a rating
>>>list, so games with unequal hardware are needed.
>>>
>>>Uri
>>
>>
>>This is not meant aggressively, Uri, but excuse me, I must say that your final
>>sentence disqualifies you as a tester. You cannot proceed this way. Testing and
>>statistics are not a question of input here and there to get safe results. The
>>bias alone from such intentionally implemented things invalidates your whole
>>activity as a tester. This might be difficult to understand for laymen, but it's
>>still the truth.
>
>I do not understand what the problem is here.
>
>I think that the best thing to do is to have every pair of opponents play the
>same number of games (unfortunately the SSDF cannot do it).
>
>The only problem that can make the rating misleading in that case is killer
>books and learning to repeat wins, but hardware is not relevant to this problem.
>
>Uri

I see that you have (?) little experience with statistics. The point is that you
should define the whole design _in advance_. Only then do the results have real
meaning. You simply can't take a few ancient progs if necessary and at will and
then "complete" your data. This is regarded as a gross methodological error.

The point is your argument that you need such matches to be able to calculate
your results!

Rolf Tueschen
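
For readers following the thread, the 50% result quoted above can be put into Elo terms. This is only a minimal sketch, assuming the standard logistic Elo model with the usual 400-point scale; nobody in the thread says which formula a given rating list uses, and the function names here are hypothetical.

```python
import math


def expected_score(rating_a, rating_b):
    """Expected score of A against B (assumed standard logistic Elo model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_difference(score_fraction):
    """Elo difference implied by a match score strictly between 0 and 1."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


# A 50% match score implies no rating difference between the two
# program-plus-hardware combinations that actually played.
even_match_gap = rating_difference(0.5)
```

Under this model a 50% score only says the two combinations performed equally in that match; it does not by itself separate the contribution of the software from that of the hardware, which is the point the posters are arguing about.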



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.