Author: Rolf Tueschen
Date: 06:49:54 05/26/02
On May 26, 2002 at 08:47:38, Tina Long wrote:

>On May 26, 2002 at 08:13:08, Rolf Tueschen wrote:
>
>>On May 26, 2002 at 05:08:43, Tina Long wrote:
>>
>>>As long as the number of opponents and number of games is large enough, then the
>>>ratings are as valid as if the programs had played the same opponents. The
>>>"other" opponents have valid ratings, so the results against "leading" opponents
>>>are equally valid. Not forgetting of course the degree of accuracy - the +-.
>>>
>>
>>I would not support this. Many aspects are flawed. What is large enough?
>
>At least 12 opponents at 40 games/match to give a +-40ish deviation is large
>enough to provide the information I derive from the SSDF list.

Yes, that would be OK. But, Tina, this is not the actual practice of the SSDF. They do not have 12 opponents on the same hardware, etc. (That is the first necessary change; please read the other postings from me and Martin Schubert.)

>>You won't think that 40 is large enough?!
>
>40,000 is better, but 40 per match will do, as that is 1000 times quicker.

I would be much happier with 100. :)

>>Then your wording "equally valid" is unacceptable.
>
>How about "will give a similar +-deviation at 95% confidence level"?

But this is not the way to establish validity, I'm sorry. We have not even touched that particular problem yet.

>>I know what you mean, but if you make a testing design you must
>>look after equality, not during argumentation afterwards.
>
>Argumentation afterwards doesn't change anything. The stats are derived During
>testing. Your sentence is wrong.

Now it is my pleasure to correct my sentence. Let's see if I can convince you. I wanted to express that before testing begins, when you make your testing design, you must define your variables so that control, constancy, and equality are guaranteed - not afterwards, during the commentary. It was a rather trivial phrase, but very basic for statistics. All of this is required if the testing is to be taken seriously.
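The "+-40ish deviation" figure being argued over can be sanity-checked with elementary binomial arithmetic. The sketch below is my own illustration, not the SSDF's actual calculation; it assumes roughly equal opponents (expected score 50%) and independent games, and converts the standard error of the score fraction into Elo points via the slope of the Elo curve at 50%.

```python
import math

def elo_ci95(n_games: int, draw_rate: float = 0.0) -> float:
    """Approximate 95% confidence half-width, in Elo points, for a rating
    measured over n_games between roughly equal opponents.

    Assumptions (an illustrative model, not SSDF methodology):
      - each game is an independent trial with expected score 0.5;
      - wins/losses contribute (0.5)^2 to per-game variance, draws 0,
        so var = 0.25 * (1 - draw_rate);
      - near 50% the Elo curve D(p) = -400*log10(1/p - 1) has slope
        400 / (ln 10 * 0.25), about 695 Elo per unit of score fraction.
    """
    var_per_game = 0.25 * (1.0 - draw_rate)
    se_score = math.sqrt(var_per_game / n_games)  # std. error of score fraction
    slope = 400.0 / (math.log(10) * 0.25)         # ~694.9 Elo per unit score
    return 1.96 * slope * se_score

# 12 opponents x 40 games = 480 games, no draws: about +/-31 Elo
print(round(elo_ci95(480)))   # -> 31
# A single 40-game match in isolation: about +/-108 Elo
print(round(elo_ci95(40)))    # -> 108
```

With a realistic computer-chess draw rate the interval narrows further, so Tina's "+-40ish" for 480 games is, if anything, conservative under these assumptions; a single 40-game match, by contrast, says very little on its own.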
Stats derived during testing? Here I don't understand you.

>>I don't see your point. What is "accurate"? What do you expect after 40 games
>>max.?
>
>No matter which opponents, as long as there are sufficient, the 95% deviation
>will be similar for a similar number of games.

This is circular - and it is the embarrassing current practice at the SSDF.

>>>Remember too that SSDF has a limited number of testers, a limited number of
>>>computers, and a limited number of copies of programs. I assume they test in
>>>the way they feel is best for their limited resources & time. They have been
>>>doing these tests for around 20 years, and are pretty competent at what they're
>>>doing.
>>
>>This is not the point. Like you
>
>No, I didn't think that at all. Do not say what you guess I think. Your "all
>over Sweden" is different to my "limited number of testers". Limited means
>small. :)

Then let's say it openly: how many testers do they have?

>>I thought that SSDF had a bunch of amateur
>>testers all over Sweden. But this is false. The SSDF has very very few testers
>>left. This would be one of my proposals for a reform of the SSDF:
>>
>>°° the open declaration of the testers; I was informed by a real insider that
>>some testers don't even collect their game scores (!)
>
>So what? Are you saying these testers cheat or lie because they don't keep game
>scores? I don't believe that a bit.

Did I say that? The question alone implies it, and you should not do that. It's like the loaded question about when I stopped slapping my grandma.

>>>Every list they publish causes all sorts of speculation regarding the accuracy
>>>of their results and the correctness of their methodology. It is impossible for
>>>them to test Exactly correctly, and it is more impossible for them to please all
>>>the people all the time.
>>>
>>>I like to take their lists as given, and I always take a good look at the +-.
>>>
>>>Regards,
>>>Tina
>>
>>This is unacceptable. You are confusing the main aspects.
>
>I am not confusing anything, you are saying I am ignorant just to support your
>own theories - although we are yet to see you do anything but disagree.

Wait, I explained.

>>It is _not_ the point
>>that they "could not" test correctly. Of course they could. The accuracy of the
>>results has nothing to do with correctness. In modern times it's no longer
>>accepted that institutions can do what they want just because they
>>"exist". I hope that the SSDF does not share your opinion. They could change
>>some practices and, bingo, they would have a correct testing procedure.
>
>We are all still waiting for you to tell us what they should change their
>practices to. It is not enough to say "SSDF are bad testers". You MUST say how
>they can be Good testers. And then estimate how much more statistically
>accurate Your "correct testing" methods would be.

I tried my very best.

>>The accuracy is a statistical
>>problem, of course.
>
>Only to the perceiver. I have no worries with the accuracy of the SSDF as long
>as they continue to provide confidence intervals. I can interpret what they
>provide & I accept what they provide.
>
>I am grateful they provide it.

You are interpreting what they are doing? Or do you interpret their publications? That speaks against the SSDF and their work.

Rolf Tueschen

>Tina Long
>>
>>Rolf Tueschen