Author: Rolf Tueschen
Date: 10:42:13 09/13/02
On September 13, 2002 at 13:17:57, Uri Blass wrote:
>On September 13, 2002 at 13:04:50, Rolf Tueschen wrote:
>
>>On September 13, 2002 at 12:20:36, Uri Blass wrote:
>>
>>>On September 13, 2002 at 11:25:44, David Dory wrote:
>>>
>>>>On September 13, 2002 at 09:20:26, Rolf Tueschen wrote:
>>>>
>>>><snip>
>>>>>
>>>>>Let's quickly compare human lists and computer rankings. The Elo method allows
>>>>>to calculate the individual strength (performance) over the variable of age. In
>>>>>CC programs have no age at all, because almost each new version gets completely
>>>>>new limbs and organs so to speak. That means that you can't compare the old and
>>>>>the new version. Or would you compare the embryo with M. vos Savant? We
>>>>>remember the old saying "You can't compare apples with beans". Nevertheless CC
>>>>>has ranking lists for decades now with the astonishing result that the newest
>>>>>progs are on top and the oldest, on the weakest hardware, are at the bottom. >Big surprise!
>>>>===================
>>>>I agree with you 100%, Rolf on this issue: testing software on vastly unequal
>>>>hardware is totally a waste of time and an insult to the reader's intelligence,
>>>>really.
>>>
>>>I disagree
>>>
>>>It is not a waste of time to test programs with unequal hardware.
>>>Not always the better hardware wins and you can learn from the results.
>>>
>>>palm tiger has a 50% against kallisto in spite of the fact that kallisto has 486
>>>and palm has significantly slower hardware.
>>>
>>>I think that it may be interesting to see also other programs on slow hardware
>>>and not only tiger14.9 but the ssdf has not unlimited time.
>>>
>>>I think that it is interesting to see how much rating programs earn from the new
>>>hardware and without testing programs on old hardware there is no way to know.
>>>
>>>You also need games against different opponents in order to generate rating list
>>>so games with unequal hardware are needed.
>>>
>>>Uri
>>
>>
>>This is not meant as aggressive, Uri, but excuse me, I must say that your final
>>sentence disqualifies you as a tester. You cannot proceed this way. Testing and
>>statistics is not a question of input here and there to get safe results. The bias
>>alone from such intentionally implemented things invalidates your whole activity
>>as a tester. This might be difficult to understand for laymen but it's still the
>>truth.
>
>I do not understand what is the problem here.
>
>I think that the best thing to do is to give every 2 opponents to play the same
>number of games(unfortunately the ssdf cannot do it).
>
>The only problem that can make the rating misleading in that case is killer
>books and learning to repeat wins but hardware is not relevant for this problem.
>
>Uri

I see that you have (?) little experience with statistics. The point is that you
must define the whole design _in advance_. Only then do the results have any real
meaning. You simply can't pull in a few ancient progs at will, whenever you happen
to need them, and then "complete" your data. That is regarded as a gross
methodological error. What I object to is exactly your argument that you need such
matches in order to be able to calculate your results!

Rolf Tueschen
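For readers following the numbers in this exchange: under the usual logistic Elo model, a match score maps to a rating difference, which is why the quoted 50% result between palm tiger and kallisto would correspond to roughly equal ratings for those two program-plus-hardware combinations. The following is only a minimal sketch under that assumption. The function names are illustrative, the 400-point logistic scale is the common convention, and nothing here is taken from the SSDF's actual rating software.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Expected score for a player rated `rating_diff` points above the opponent.

    Assumes the common logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_diff_from_score(score: float) -> float:
    """Invert the curve: rating difference implied by a match score in (0, 1)."""
    if not 0.0 < score < 1.0:
        # A 0% or 100% score implies an unbounded difference under this model.
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # A 50% match score implies no rating difference under this model.
    print(round(rating_diff_from_score(0.5), 1))
    # A 400-point advantage corresponds to an expected score of about 91%.
    print(round(expected_score(400.0), 3))
```

Note that the sketch only converts a score into a difference between two entries. It says nothing about which pairings should be played, which is the design question being argued above.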
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.