Author: Uri Blass
Date: 10:52:52 09/13/02
On September 13, 2002 at 13:42:13, Rolf Tueschen wrote:

>On September 13, 2002 at 13:17:57, Uri Blass wrote:
>
>>On September 13, 2002 at 13:04:50, Rolf Tueschen wrote:
>>
>>>On September 13, 2002 at 12:20:36, Uri Blass wrote:
>>>
>>>>On September 13, 2002 at 11:25:44, David Dory wrote:
>>>>
>>>>>On September 13, 2002 at 09:20:26, Rolf Tueschen wrote:
>>>>>
>>>>><snip>
>>>>>>
>>>>>>Let's quickly compare human lists and computer rankings. The Elo method allows
>>>>>>to calculate the individual strength (performance) over the variable of age. In
>>>>>>CC programs have no age at all, because almost each new version gets completely
>>>>>>new limbs and organs so to speak. That means that you can't compare the old and
>>>>>>the new version. Or would you compare the embryo with M. Dos Savant? We
>>>>>>remember the old saying "You can't compare apples with beans". Nevertheless CC
>>>>>>has ranking lists for decades now with the astonishing result that the newest
>>>>>>progs are on top and the oldest, on the weakest hardware, are at the bottom.
>Big surprise!
>>>>>===================
>>>>>I agree with you 100%, Rolf on this issue: testing software on vastly unequal
>>>>>hardware is totally a waste of time and an insult to the reader's intelligence,
>>>>>really.
>>>>
>>>>I disagree
>>>>
>>>>It is not a waste of time to test programs with unequal hardware.
>>>>Not always the better hardware wins and you can learn from the results.
>>>>
>>>>palm tiger has a 50% against kallisto inspite of the fact that kallisto has 486
>>>>and palm has significantly slower hardware.
>>>>
>>>>I think that it may be interesting to see also other programs on slow hardware
>>>>and not only tiger14.9 but the ssdf has not unlimited time.
>>>>
>>>>I think that it is interesting to see how much rating programs earn from the new
>>>>hardware and without testing programs on old hardware there is no way to know.
>>>>
>>>>You also need games against different opponents in order to generate rating list
>>>>so games with unequal hardware are needed.
>>>>
>>>>Uri
>>>
>>>
>>>This is not meant as aggressive, Uri, but excuse me, I must say that your final
>>>sentence disqualifies you as a tester. You cannot proceed this way. Testing and
>>>statics is not a question of input here and there to get safe results. The bias
>>>alone from such intensiously implemented things invalidates your whole activity
>>>as a tester. This might be difficult to understand for laymen but it's still the
>>>truth.
>>
>>I do not understand what is the problem here.
>>
>>I think that the best thing to do is to give every 2 opponents to play the same
>>number of games(unfortunately the ssdf cannot do it).
>>
>>The only problem that can make the rating misleading in that case is killer
>>books and learning to repeat wins but hardware is not relevant for this problem.
>>
>>Uri
>
>I see that you have (?) little experience with statistics. The point is that you
>should define all design _in advance_. Only then the results have a real
>meaning. You simply can't take a few ancient progs if necessary and at will and
>then "complete" your data. This is regarded as a gross miscarriage.
>
>The point is your argument that you need such matches to be able to calculate
>your results!
>
>Rolf Tueschen

I agree that it is better if everything is defined before the games are played,
and it is a disadvantage of the ssdf that there are no clear rules about which
games are going to be played, but I do not see how this is relevant to the
question of whether to play matches with unequal hardware.

The players in the ssdf list are software+hardware, not software alone.

Uri
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.