Author: Matthew Hull
Date: 12:49:25 07/12/03
On July 12, 2003 at 14:08:19, Uri Blass wrote:

>On July 12, 2003 at 12:20:40, Robert Hyatt wrote:
>
>>On July 11, 2003 at 17:52:23, Sune Fischer wrote:
>>
>>>On July 11, 2003 at 13:32:59, Robert Hyatt wrote:
>>>
>>>>On July 10, 2003 at 11:51:55, Keith Ian Price wrote:
>>>>
>>>>>On July 10, 2003 at 02:19:15, Tony Werten wrote:
>>>>>
>>>>>>On July 09, 2003 at 19:12:13, Keith Ian Price wrote:
>>>>>>
>>>>>>>On July 09, 2003 at 18:25:30, Jeroen van Dorp wrote:
>>>>>>>
>>>>>>>>On July 09, 2003 at 16:43:27, Keith Ian Price wrote:
>>>>>>>>
>>>>>>>>>That is not what he said. He said the 40-point difference was meaningful, but
>>>>>>>>>the 2800+ rating was not, since it is not pegged to any absolute rating.
>>>>>>>>
>>>>>>>>As rating only tells you something about strength differences, and nothing about
>>>>>>>>"absolute" strength - whatever that may be, how can't a rating be meaningful,
>>>>>>>>yet a rating difference can?
>>>>>>>>
>>>>>>>>J.
>>>>>>>
>>>>>>>The rating only tells about strength difference when compared to another in the
>>>>>>>same pool. So it is the rating difference that's important. The lack of
>>>>>>>importance as to the rating is whether it is 2800+ or 2700+, where the
>>>>>>>percentage difference between 40 point differences would be small. If someone
>>>>>>
>>>>>>If the rating is inflated by 10% then the difference between 2 ratings is also
>>>>>>inflated by 10%
>>>>>>
>>>>>>This shouldn't be too difficult to check. A rating difference of 40 points should
>>>>>>give a certain win percentage. Did Shredder get this win percentage? Or did it
>>>>>>only get the win percentage against 200 points lower rated opponents?
>>>>>>
>>>>>>Tony
>>>>>>
>>>>>>>were to say it should be 1000 instead of 2800, then it would be arguable that it
>>>>>>>is not meaningless, but no one I've heard from is suggesting that.
>>>>>>>
>>>>>>>kp
>>>>>
>>>>>What I meant is that if 100 points were subtracted from all programs, the
>>>>>relative difference between them would not be greatly affected. Future games
>>>>>played at those levels would show a slightly smaller point spread, of course. If
>>>>>they were to hack 1800 points off, without refiguring the percentage difference
>>>>>in point spreads, then a false comparison would seem to be shown and it would
>>>>>be obvious. In this regard, the actual rating would make a difference and not be
>>>>>meaningless. This was in answer to Jeroen's question how could it be
>>>>>meaningless. I was saying this is the only way it wouldn't be meaningless. I
>>>>>don't remember if they recalculated the point spreads between all programs years
>>>>>ago when they lopped off 100 points from all SSDF scores.
>>>>>
>>>>>kp
>>>>
>>>>The other problem is that a _new_ engine starts at the top of the SSDF opponent
>>>>list. I.e., it starts right off playing the very best. If it is a good program,
>>>>its rating is going to start off very high. If it is slightly better than
>>>>the best, it is going to end up with a higher rating than the previous best.
>>>>
>>>>Were it to start at the bottom and work its way up, this might be reduced a bit,
>>>>maybe. But nobody wants to test like that. Go out and tackle #1 first. :)
>>>>
>>>>I think the idea is that if you have a shark at the _bottom_ of the pool, then
>>>>the bottom of the pool is going to drop a bit. And if players at the top play
>>>>the ones at the bottom, the top ratings will drop a bit. And the shark will
>>>>move up and maybe pass the #1 player, but he has lowered #1's rating because
>>>>he lowered the ratings throughout the pool as he played them. If you start
>>>>the shark at the top, and he is really a shark, his rating is going to push to
>>>>a new "high water mark".
>>>
>>>That's not how it is done AFAIK, it wouldn't be correct.
>>
>>That is what happens in the SSDF however.
>>A new version starts out by
>>playing the top guys. It should _obviously_ be stronger, so it will be
>>higher-rated. Should the new program start out at the bottom, it would
>>still win, but the bottom ratings would also drop, which means that as
>>the top programs play the bottom programs, the top ratings would drop as
>>well, and by the time the new program gets to the top, it might not end up
>>much higher than the old "top".
>>
>>But at the present, the bottom of the SSDF rating pool is very inactive.
>>Since they can't drop because of the stronger players at the top, the only
>>thing that can happen is that the top gets higher and higher and higher,
>>-inflation-.
>>
>>>
>>>It doesn't make too much sense to adjust the Elo numbers based on games where
>>>one engine doesn't have an established rating.
>>>
>>>I know that on many servers you play "one-side rated" for the first 20-50 games.
>>>
>>>> And things get inflated, as he is using the rating
>>>>inertia established by the deep "pool" to jack his rating higher.
>>>
>>>It's only natural that the new better engine sets a new rating record, that
>>>hasn't got anything to do with inflation.
>>>It would actually be deflation if the top had to remain under a certain limit,
>>>say like 2800.
>>
>>Sure, but it just means that the 2800 doesn't relate to anything but the
>>SSDF pool. Statistically, that is fine. Practically, everyone wants it to
>>be FIDE-comparable. It isn't.
>>
>>>
>>>Anyway, I don't believe the scale tends to inflate, I think it actually deflates
>>>a bit.
>>>I remember a post here some time ago, to a link where some dude had analysed
>>>lots of FIDE games, and found that top players actually had to overperform to
>>>keep their ratings when playing against low rated players.
>>
>>Yes, but you miss the point. In FIDE _everybody_ plays everybody over
>>time. Because of seedings. But in the SSDF rating pool, this is not
>>the case.
>
>I do not think that everybody plays everybody.
>Kasparov does not play or almost does not play with players with rating of 2400
>
>He did it when he was young but at that time Kasparov was a weaker player.

But Kasparov is analogous to the established strong programs. A new strong
player moves up from the bottom and does not start by playing Kasparov. I think
that was the point.

Matt

>
>The difference in rating is less than 200 Elo in most of the games.
>
>If shredder7.04(A1200) starts by playing 20 games against palm tiger14.9 and
>20 games against Fritz3(p90) then I doubt if it is going to make its rating
>smaller.
>
>It has good chance to get 100% or almost 100% score in these games.
>
>>
>>>
>>>I certainly believe that there are indications that the win/lose probability
>>>predicted by the Elo scale is skewed and does not correlate with practice very
>>>well when differences are large.
>>>
>>>-S.
>
>Do you think that the weaker program is going to earn rating or lose rating when
>the difference is more than 300 Elo?
>
>Uri
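
For reference, the "certain win percentage" Tony mentions and Uri's 300-point question can be worked out from the standard logistic Elo curve. The sketch below is only an illustration under assumptions: the function names are made up for this example, the K-factor of 16 is a hypothetical value, and the per-game update is the textbook Elo rule, not necessarily how the SSDF computes its list.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score per game (win = 1, draw = 0.5, loss = 0) for a player
    rated `rating_diff` points above the opponent, on the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_change(rating_diff: float, actual_score: float, k: float = 16.0) -> float:
    """Textbook per-game Elo update for that same player.

    `k` is a hypothetical K-factor chosen for illustration only.
    """
    return k * (actual_score - expected_score(rating_diff))


if __name__ == "__main__":
    # Expected score of the higher-rated side for the gaps discussed in the thread.
    for diff in (40, 200, 300):
        print(f"+{diff} Elo: expected score {expected_score(diff):.3f}")

    # The weaker side, 300 points down: does it gain or lose for each result?
    for result in (0.0, 0.5, 1.0):
        print(f"-300 Elo, result {result}: change {rating_change(-300, result):+.2f}")
```

On this curve a 40-point edge corresponds to an expected score of about 56%, a 200-point edge to about 76%, and a 300-point edge to about 85%. Under the textbook update, the side 300 points down gains rating whenever it scores better than roughly 15% and loses rating otherwise, so a 100% sweep by the favorite still costs the weaker program a little on every game.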
Last modified: Thu, 15 Apr 21 08:11:13 -0700