Author: Tina Long
Date: 05:47:38 05/26/02
On May 26, 2002 at 08:13:08, Rolf Tueschen wrote:

>On May 26, 2002 at 05:08:43, Tina Long wrote:
>
>>As long as the number of opponents and number of games is large enough, then the
>>ratings are as valid as if the programs had played the same opponents. The
>>"other" opponents have valid ratings, so the results against "leading" opponents
>>are equally valid. Not forgetting of course the degree of accuracy - the +-.
>
>I would not support this. Many aspects are flawed. What is large enough?

At least 12 opponents at 40 games per match, giving a deviation of roughly +-40, is large enough to provide the information I derive from the SSDF list.

>You won't think that 40 is large enough?!

40,000 would be better, but 40 per match will do, as it is 1000 times quicker.

>Then your wording "equally valid" is unacceptable.

How about "will give a similar +- deviation at a 95% confidence level"?

>I know what you mean, but if you make a testing design you must look after
>equality, not during argumentation afterwards.

Argumentation afterwards doesn't change anything. The stats are derived during testing. Your sentence is wrong.

>It's simply not sound the other way round. So I agree with Martin Schubert.
>
>Schubert:
>>>My suggestion: the top programs should play the same opponents to make it
>>>possible to compare their results.
>
>>This would give more interesting results tables, but theoretically the ratings
>>would be no more accurate than the current ratings.
>>This would also have the benefit of excluding results where top programs beat
>>poorer programs by, say, 35-5. But again, it would theoretically not give more
>>accurate ratings.
>
>I don't see your point. What is "accurate"? What do you expect after 40 games
>max.?

No matter which opponents, as long as there are sufficiently many, the 95% deviation will be similar for a similar number of games.

>>Remember too that SSDF has a limited number of testers, a limited number of
>>computers, and a limited number of copies of programs.
>>I assume they test in the way they feel is best for their limited resources and
>>time. They have been doing these tests for around 20 years, and are pretty
>>competent at what they're doing.
>
>This is not the point. Like you

No, I didn't think that at all. Do not say what you guess I think. Your "all over Sweden" is different to my "limited number of testers". Limited means small.

>I thought that SSDF had a bunch of amateur testers all over Sweden. But this is
>false. The SSDF has very, very few testers left. This would be one of my
>proposals for a reformation of SSDF:
>
>°° the open declaration of the testers; I was informed by a real insider that
>some testers don't even collect their game scores (!)

So what? Are you saying these testers cheat or lie because they don't keep game scores? I don't believe that one bit.

>>Every list they publish causes all sorts of speculation regarding the accuracy
>>of their results and the correctness of their methodology. It is impossible for
>>them to test exactly correctly, and it is even more impossible for them to
>>please all the people all the time.
>>
>>I like to take their lists as given, and I always take a good look at the +-.
>>
>>Regards,
>>Tina
>
>This is unacceptable. You are confusing the main aspects.

I am not confusing anything; you are saying I am ignorant just to support your own theories - although we have yet to see you do anything but disagree.

>It is _not_ the point that they "could not" test correctly. Of course they
>could. The accuracy of the results has nothing to do with correctness. In modern
>times it's no longer accepted that institutions can do whatever they want just
>because they "exist". I hope that the SSDF is not of your opinion. They could
>change some practices and, bingo, they would have correct testing.

We are all still waiting for you to tell us what they should change their practices to. It is not enough to say "SSDF are bad testers"; you MUST say how they can be good testers.
And then estimate how much more statistically accurate your "correct testing" methods would be.

>The accuracy is a statistical problem of course.

Only to the perceiver. I have no worries with the accuracy of the SSDF as long as they continue to provide confidence intervals. I can interpret what they provide and I accept what they provide. I am grateful they provide it.

Tina Long

>
>Rolf Tueschen
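P.S. To put a number on the "+-40ish" claim: here is a rough back-of-envelope sketch (my own arithmetic, not the SSDF's actual method), assuming every game is an independent win/loss between evenly matched programs and ignoring draws, which in practice would narrow the interval somewhat:

```python
import math

def elo_margin_95(n_games, score=0.5):
    """Approximate 95% confidence margin, in Elo points, for a rating
    estimated from n_games independent win/loss games with expected
    score `score` (draws ignored for simplicity)."""
    p = score
    se = math.sqrt(p * (1 - p) / n_games)  # standard error of the score
    # Slope of the Elo curve elo(p) = -400*log10(1/p - 1) at p:
    # d(elo)/dp = 400 / (ln(10) * p * (1 - p))
    slope = 400 / (math.log(10) * p * (1 - p))
    return 1.96 * se * slope

print(round(elo_margin_95(40)))   # a single 40-game match: about +-108
print(round(elo_margin_95(480)))  # 12 opponents x 40 games: about +-31
```

So a single 40-game match tells you little on its own, but 12 such matches pooled together give roughly the +-30 to +-40 band I rely on when reading the list.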
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.