Author: Rolf Tueschen
Date: 11:21:10 09/13/02
On September 13, 2002 at 13:52:52, Uri Blass wrote:
>On September 13, 2002 at 13:42:13, Rolf Tueschen wrote:
>
>>On September 13, 2002 at 13:17:57, Uri Blass wrote:
>>
>>>On September 13, 2002 at 13:04:50, Rolf Tueschen wrote:
>>>
>>>>On September 13, 2002 at 12:20:36, Uri Blass wrote:
>>>>
>>>>>On September 13, 2002 at 11:25:44, David Dory wrote:
>>>>>
>>>>>>On September 13, 2002 at 09:20:26, Rolf Tueschen wrote:
>>>>>>
>>>>>><snip>
>>>>>>>
>>>>>>>Let's quickly compare human lists and computer rankings. The Elo method allows
>>>>>>>to calculate the individual strength (performance) over the variable of age. In
>>>>>>>CC programs have no age at all, because almost each new version gets completely
>>>>>>>new limbs and organs so to speak. That means that you can't compare the old and
>>>>>>>the new version. Or would you compare the embryo with M. Dos Savant? We
>>>>>>>remember the old saying "You can't compare apples with beans". Nevertheless CC
>>>>>>>has ranking lists for decades now with the astonishing result that the newest
>>>>>>>progs are on top and the oldest, on the weakest hardware, are at the bottom.
>>>>>>>Big surprise!
>>>>>>===================
>>>>>>I agree with you 100%, Rolf on this issue: testing software on vastly unequal
>>>>>>hardware is totally a waste of time and an insult to the reader's intelligence,
>>>>>>really.
>>>>>
>>>>>I disagree
>>>>>
>>>>>It is not a waste of time to test programs with unequal hardware.
>>>>>Not always the better hardware wins and you can learn from the results.
>>>>>
>>>>>palm tiger has a 50% against kallisto inspite of the fact that kallisto has 486
>>>>>and palm has significantly slower hardware.
>>>>>
>>>>>I think that it may be interesting to see also other programs on slow hardware
>>>>>and not only tiger14.9 but the ssdf has not unlimited time.
>>>>>
>>>>>I think that it is interesting to see how much rating programs earn from the new
>>>>>hardware and without testing programs on old hardware there is no way to know.
>>>>>
>>>>>You also need games against different opponents in order to generate rating list
>>>>>so games with unequal hardware are needed.
>>>>>
>>>>>Uri
>>>>
>>>>
>>>>This is not meant as aggressive, Uri, but excuse me, I must say that your final
>>>>sentence disqualifies you as a tester. You cannot proceed this way. Testing and
>>>>statics is not a question of input here and there to get safe results. The bias
>>>>alone from such intensiously implemented things invalidates your whole activity
>>>>as a tester. This might be difficult to understand for laymen but it's still the
>>>>truth.
>>>
>>>I do not understand what is the problem here.
>>>
>>>I think that the best thing to do is to give every 2 opponents to play the same
>>>number of games(unfortunately the ssdf cannot do it).
>>>
>>>The only problem that can make the rating misleading in that case is killer
>>>books and learning to repeat wins but hardware is not relevant for this problem.
>>>
>>>Uri
>>
>>I see that you have (?) little experience with statistics. The point is that you
>>should define all design _in advance_. Only then the results have a real
>>meaning. You simply can't take a few ancient progs if necessary and at will and
>>then "complete" your data. This is regarded as a gross miscarriage.
>>
>>The point is your argument that you need such matches to be able to calculate
>>your results!
>>
>>Rolf Tueschen
>
>
>I agree that it is better if everything is defined before doing the games and it
>is a disadvantage of the ssdf that there is no clear eules which games are going
>to be played but I do not see how it is relevant for the question if to play
>matches with unequal hardware.
>
>The player in the ssdf are software+hardware and not only software.
>
>Uri

Uri, the answer is very easy. In principle: no intentions other than your question, and then the application of statistics. Now you might discover that to reach a certain percentage you are short of matches.
And then you say: how about some old progs on old hardware against the new combination, for a change? This is nonsense and not allowed. Not allowed because intentions to reach a specific goal are forbidden, or are you not interested in real questions? But questions that are already answered must not be researched with testing. And why is it nonsense? Simply because there is no open question. That new progs on new hardware beat old stuff is already known as a fact. I would call such a routine cheating if it became known.

Since 1996, when I started to debate such questions, I have been given the answer "anything at all" to the question of where the games should come from. Main point: wherever they might come from! This is absolutely wrong and completely forbidden in statistics, because with such an attitude you can prove whatever you want. But it's no longer correct stats.

I got answers like this: OK, if it is true that 6 games in a match is a bit lowfat, then how about adding this one here where they played a few games more! In a completely different match, of course!

Or how about this answer: when I asked about validity with 20 games or some such, I was told that SSDF had played 20000 or 60000 games over the last decades. Unfortunately not all saved, or destroyed in the big conflagrations ... :)

Or one of the funniest conclusions. How are the actual tests validated? Well, the moment the new progs meet the older, they are validated. Why? Because the older were validated when meeting the old-older progs, which were validated by meeting the old-old-older progs, which had been validated by meeting the [...], and these were once validated in the big fights between Super Conny and Swedish players long ago.

Uri, even homoeopathy has more validity than that, because there is always at least a molecule left in each diluent. And after the final molecule it's a question of the _memory_ of the former molecules. Now, the molecules at least have an undeniable power.
But the power of Super Conny's abilities compared with FRITZ 7 can only cause black holes in the universe, if not more.

Please enjoy the coming weekend without too much bloodshed over there.

Rolf Tueschen
Last modified: Thu, 15 Apr 21 08:11:13 -0700