Author: Martin Schubert
Date: 03:30:06 05/26/02
On May 26, 2002 at 05:08:43, Tina Long wrote:

>On May 26, 2002 at 04:27:46, Martin Schubert wrote:
>
>>On May 25, 2002 at 23:17:05, Dann Corbit wrote:
>>
>>>On May 25, 2002 at 22:01:24, Rolf Tueschen wrote:
>>>[snip]
>>>
>>>>I do not say this. What I mean is that they could invest the same time in
>>>>better testing, with no big changes.
>>>[snip]
>>>
>>>>Why not change a little bit of SSDF itself?
>>>
>>>What (exactly) are the changes you would have them make so that the list
>>>would be better?
>>
>>I don't understand why matches sometimes last 40 games and sometimes 43. Why
>>not say: a match lasts exactly 40 games. A small change without any effort.
>
>I asked SSDF this question a few years ago. Their answer was approximately:
>"These matches are played automatically, sometimes overnight. If the match has
>exceeded 40 games by the time the tester returns, the extra games are included
>in the results, as more results are better than fewer."
>
>(I thought at the time that matches should be culled back to an even number,
>in this case 42, just to get an equal split of whites and blacks. But I didn't
>think it mattered much as long as there was no bias - e.g. always letting
>Fritz start a match as white, so it could never have fewer whites than
>blacks.)
>
>>Another point: if you look at the list where Shredder was leading, you can
>>see that the leading programs played their games against totally different
>>opponents. So you can't compare the ratings at all.
>
>As long as the number of opponents and the number of games are large enough,
>the ratings are as valid as if the programs had played the same opponents. The
>"other" opponents have valid ratings, so results against them are just as
>valid as results against "leading" opponents. Not forgetting, of course, the
>degree of accuracy - the +-.

I don't agree with that, because I'm sure some programs play better against
weak opponents and others play better against strong opponents. There is no
single "valid rating"; a rating always depends on the opponents. In human chess
nobody would have the idea of calculating confidence intervals and making
statistical observations to decide whether one player is better than another.
But if you do it, you have to think about the problem that there is no
opponent-independent rating.
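To illustrate how wide that +- actually is for a single 40-game match, here is
a rough sketch in Python (my own back-of-the-envelope normal approximation;
the SSDF's real computation pools all of a program's games and will differ):

import math

def elo_diff(s):
    """Elo difference implied by a score fraction s (0 < s < 1),
    using the standard logistic Elo model."""
    return -400.0 * math.log10(1.0 / s - 1.0)

def match_performance(wins, draws, losses, opponent_rating):
    """Performance rating for one match plus a rough 95% interval.
    Normal approximation on the score fraction, draws as half points;
    counting draws this way overstates the variance a little, so the
    interval comes out slightly conservative."""
    def clamp(x):  # keep elo_diff defined at extreme scores
        return min(max(x, 0.001), 0.999)
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    se = math.sqrt(s * (1.0 - s) / n)
    point = opponent_rating + elo_diff(clamp(s))
    low = opponent_rating + elo_diff(clamp(s - 1.96 * se))
    high = opponent_rating + elo_diff(clamp(s + 1.96 * se))
    return point, low, high

# Example: +23 =7 -10 (26.5/40) against a 2550-rated opponent gives
# roughly (2667, 2561, 2801) - an error bar of over 100 points in
# each direction from just 40 games.
print(match_performance(23, 7, 10, 2550))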
>>My suggestion: the top programs should play the same opponents, to make it
>>possible to compare their results.
>
>This would give more interesting results tables, but theoretically the
>ratings would be no more accurate than the current ratings.

That depends on your theory. If your theory is "every program has one rating
which doesn't depend on the opponent", then you're right. But if you see that
a program always has "Angstgegner" (in English roughly "bogey opponents":
opponents against whom your results are always very bad), you have to think
about what kind of rating you want to look at. (See the sketch at the end of
this post.)

>This would also have the benefit of excluding results where top programs beat
>poorer programs by, say, 35-5. But again, it would theoretically not give
>more accurate ratings.
>
>>If I remember right, it happens quite often that a program is very strong in
>>the first rating list it appears in (where it plays against weak opponents).
>>In the next rating list, where it has to fight the tough ones, it falls back
>>in the rating list.
>
>Often (roughly 50% of the time) a program loses a bunch of points on its
>second showing, but from the bit of searching I did (the last 5 lists) I
>don't think this can be "blamed" on an increase in the level of opponents.
>
>What you say has happened, but it has not always happened. There are also
>cases where the opposite is true - a program has a higher rating on its
>second showing even though it has played tougher opponents.

This case happens too often.

>>Regards, Martin
>
>Remember too that SSDF has a limited number of testers, a limited number of
>computers, and a limited number of copies of programs. I assume they test in
>the way they feel is best given their limited resources and time. They have
>been doing these tests for around 20 years and are pretty competent at what
>they're doing.
>
>Every list they publish causes all sorts of speculation about the accuracy of
>their results and the correctness of their methodology. It is impossible for
>them to test exactly correctly, and it is even more impossible for them to
>please all the people all the time.

I know that. They can't provide a perfect list. But I'm sure they could
provide a better list without much more effort.

>I like to take their lists as given, and I always take a good look at the +-.
>
>Regards,
>Tina

I do like their list too. I know which conclusions I can draw from it and
which I can't. But many people who look at the list don't have the statistical
background and draw the wrong conclusions.

Regards, Martin
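P.S. To make the "Angstgegner" point concrete, here is a toy calculation with
made-up numbers (not real SSDF data), again in Python. It uses the same
logistic Elo formula and the common average-of-opponents approximation for
performance ratings: two programs with identical overall scores get the same
pooled rating even though their profiles against weak and strong opposition
differ a lot.

import math

def elo_diff(s):
    """Elo difference implied by a score fraction s (0 < s < 1)."""
    return -400.0 * math.log10(1.0 / s - 1.0)

# Two hypothetical programs, each playing 20 games against a weak
# group rated 2300 and 20 games against a strong group rated 2600.
field = {"weak": 2300.0, "strong": 2600.0}
results = {
    "Program A": {"weak": 17.0, "strong": 7.0},   # crushes the weak group
    "Program B": {"weak": 13.0, "strong": 11.0},  # holds the strong group
}

for name, scores in results.items():
    total = sum(scores.values())                # both score 24/40 overall
    avg_opp = sum(field.values()) / len(field)  # 2450 average opposition
    # rough pooled performance: average opponent + implied Elo difference
    pooled = avg_opp + elo_diff(total / 40.0)
    per_group = {g: field[g] + elo_diff(scores[g] / 20.0) for g in field}
    print(name, round(pooled), {g: round(r) for g, r in per_group.items()})

# Both pool to about 2520, but against the strong group A performs
# about 2490 while B performs about 2635. The single pooled number
# hides exactly the difference I'm talking about.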