Author: Tina Long
Date: 02:08:43 05/26/02
On May 26, 2002 at 04:27:46, Martin Schubert wrote:

>On May 25, 2002 at 23:17:05, Dann Corbit wrote:
>
>>On May 25, 2002 at 22:01:24, Rolf Tueschen wrote:
>>[snip]
>>
>>>I do not say this. What I mean is that they could invest the same time in
>>>better testing, with no big changes.
>>[snip]
>>
>>>Why not change SSDF itself a little bit?
>>
>>What (exactly) are the changes you would have them make so that the list would
>>be better?
>
>I don't understand why matches sometimes last 40 games and sometimes 43. Why not
>say: a match lasts exactly 40 games. A small change without any effort.

I asked SSDF this question a few years ago. Their answer was approximately: "These matches are played automatically, sometimes overnight. If the match has exceeded 40 games by the time the tester returns, the extra games are included in the results, since more results are better than fewer."

(I thought at the time that matches should be culled back to an even score, in this case 42, just to give each program an equal number of whites and blacks. But I didn't think it mattered much as long as there was no bias - e.g. always letting Fritz start a match as white, so that it could never have fewer whites than blacks.)

>Another point: if you took a look at the list where Shredder was leading you
>could see that the leading programs had played their games against totally
>different opponents. So you can't compare the ratings at all.

As long as the number of opponents and the number of games are large enough, the ratings are as valid as if the programs had played the same opponents. The "other" opponents have valid ratings, so results against them are just as valid as results against the "leading" opponents. Not forgetting, of course, the degree of accuracy - the +-.

>My suggestion: the top programs should play the same opponents to make it
>possible to compare their results.

This would give more interesting results tables, but theoretically the ratings would be no more accurate than the current ratings.
This would also have the benefit of excluding results where top programs beat poorer programs by, say, 35-5. But again, it would theoretically not give more accurate ratings.

>If I remember right it happens quite often that a program is very strong in the
>first rating list it appears in (where it plays against weak opponents). In the
>next rating list where it has to fight the tough ones it falls back in the
>rating list.

Often (roughly 50% of the time) a program loses a bunch of points on its second showing, but from the bit of searching I did (the last 5 lists) I don't think this can be "blamed" on an increase in the level of opponents. What you describe has happened, but it has not always happened. There are also cases where the opposite is true - a program has a higher rating on its second showing even though it has played tougher opponents.

>
>Regards, Martin

Remember too that SSDF has a limited number of testers, a limited number of computers, and a limited number of copies of programs. I assume they test in the way they feel is best given their limited resources and time. They have been doing these tests for around 20 years and are pretty competent at what they do.

Every list they publish causes all sorts of speculation about the accuracy of their results and the correctness of their methodology. It is impossible for them to test exactly correctly, and even less possible for them to please all the people all the time. I like to take their lists as given, and I always take a good look at the +-.

Regards, Tina
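To make the point about the +- concrete, here is a minimal sketch of why the error margin shrinks with the number of games. This is not SSDF's actual computation (their ratings come from a whole pool of results); it simply treats one match as independent trials, counts draws as half a win, and converts the score interval into an Elo-difference interval with the standard expected-score formula.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def rating_interval(points: float, games: int, z: float = 1.96) -> tuple:
    """Approximate 95% confidence interval (in Elo) for one match result.

    Simplification: each game is an independent Bernoulli trial, with a
    draw counted as half a win. A real rating-pool calculation (as SSDF
    performs) is more involved.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)   # standard error of the score
    lo = max(p - z * se, 1e-9)              # clamp away from 0 and 1
    hi = min(p + z * se, 1.0 - 1e-9)
    return elo_diff(lo), elo_diff(hi)

# Same scoring rate, different sample sizes:
print(rating_interval(26.5, 40))    # 40-game match: a very wide interval
print(rating_interval(265.0, 400))  # 400 games: a much narrower interval
```

The 40-game interval spans well over 100 Elo points, which is why a couple of extra games per match (40 vs. 43) matters far less than the total number of games behind each rating.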