Computer Chess Club Archives



Subject: Re: Inflationary Effects?

Author: Sune Fischer

Date: 14:52:23 07/11/03



On July 11, 2003 at 13:32:59, Robert Hyatt wrote:

>On July 10, 2003 at 11:51:55, Keith Ian Price wrote:
>
>>On July 10, 2003 at 02:19:15, Tony Werten wrote:
>>
>>>On July 09, 2003 at 19:12:13, Keith Ian Price wrote:
>>>
>>>>On July 09, 2003 at 18:25:30, Jeroen van Dorp wrote:
>>>>
>>>>>On July 09, 2003 at 16:43:27, Keith Ian Price wrote:
>>>>>
>>>>>
>>>>>>That is not what he said. He said the 40-point difference was meaningful, but
>>>>>>the 2800+ rating was not, since it is not pegged to any absolute rating.
>>>>>
>>>>>As a rating only tells you something about strength differences, and nothing
>>>>>about "absolute" strength - whatever that may be - how can a rating not be
>>>>>meaningful, yet a rating difference can be?
>>>>>
>>>>>
>>>>>J.
>>>>
>>>>A rating only tells you about strength differences when compared to others in
>>>>the same pool, so it is the rating difference that's important. What doesn't
>>>>matter is whether the rating itself is 2800+ or 2700+, since at either level
>>>>the relative difference represented by a 40-point gap would be small. If someone
>>>
>>>If the ratings are inflated by 10%, then the difference between two ratings is
>>>also inflated by 10%.
>>>
>>>This shouldn't be too difficult to check. A rating difference of 40 points
>>>should give a certain win percentage. Did Shredder get this win percentage? Or
>>>did it only get the expected win percentage against opponents rated 200 points
>>>lower?
>>>
>>>Tony
>>>
>>>>were to say it should be 1000 instead of 2800, then it would be arguable that it
>>>>is not meaningless, but no one I've heard from is suggesting that.
>>>>
>>>>kp
>>
>>What I meant is that if 100 points were subtracted from all programs' ratings,
>>the relative difference between them would not be greatly affected. Future games
>>played at those levels would show a slightly smaller point spread, of course. If
>>they were to hack 1800 points off, without refiguring the percentage difference
>>in point spreads, then a false comparison would seem to be shown, and it would
>>be obvious. In this regard, the actual rating would make a difference and not be
>>meaningless. This was in answer to Jeroen's question of how it could be
>>meaningless; I was saying this is the only way it wouldn't be meaningless. I
>>don't remember if they recalculated the point spreads between all programs years
>>ago when they lopped off 100 points from all SSDF scores.
>>
>>kp
>
>
>The other problem is that a _new_ engine starts at the top of the SSDF opponent
>list.  I.e., it starts right off playing the very best.  If it is a good program,
>its rating is going to start off very high.  If it is slightly better than
>the best, it is going to end up with a higher rating than the previous best.
>
>Were it to start at the bottom and work its way up, this might be reduced a bit,
>maybe.  But nobody wants to test like that.  Go out and tackle #1 first.  :)
>
>I think the idea is that if you have a shark at the _bottom_ of the pool, then
>the bottom of the pool is going to drop a bit.  And if players at the top play
>the ones at the bottom, the top ratings will drop a bit.  And the shark will
>move up and maybe pass the #1 player, but he has lowered #1's rating because
>he lowered the ratings throughout the pool as he played them.  If you start
>the shark at the top, and he is really a shark, his rating is going to push to
>a new "high water mark".

That's not how it is done, AFAIK; it wouldn't be correct.

It doesn't make much sense to adjust the Elo numbers of established engines
based on games where one engine doesn't yet have an established rating.

I know that on many servers you play "one-side rated" for the first 20-50 games.
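
For what it's worth, here is a minimal sketch of what "one-side rated" could
look like: only the newcomer's rating moves during its provisional games, so
the established pool isn't dragged around by an engine whose strength is still
unknown. The function names, the K-factors and the 20-game threshold are my own
illustrative assumptions, not any particular server's actual parameters.

# Minimal sketch of "one-side rated" play: during a newcomer's provisional
# games, only the newcomer's rating is updated; the established opponent's
# rating (and hence the rest of the pool) is left untouched.
# The K-factors and the 20-game provisional threshold are illustrative
# assumptions, not any real server's parameters.

def expected_score(r_a, r_b):
    """Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def one_side_rated_update(new_rating, opp_rating, score,
                          games_played, provisional_games=20,
                          k_provisional=32, k_established=16):
    """Update only the newcomer's rating; the opponent's rating never changes."""
    k = k_provisional if games_played < provisional_games else k_established
    return new_rating + k * (score - expected_score(new_rating, opp_rating))

# Example: a provisional engine rated 2600 beats a 2750 opponent.
print(one_side_rated_update(2600, 2750, score=1.0, games_played=5))
# -> about 2622.5; the 2750 opponent stays at 2750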

> And things get inflated, as he is using the rating
>inertia established by the deep "pool" to jack his rating higher.

It's only natural that a new, better engine sets a new rating record; that
hasn't got anything to do with inflation.
It would actually be deflation if the top had to remain under a certain limit,
say 2800.

Anyway, I don't believe the scale tends to inflate; I think it actually deflates
a bit.
I remember a post here some time ago with a link where some dude had analysed
lots of FIDE games and found that top players actually had to overperform to
keep their ratings when playing against lower-rated players.

I certainly believe there are indications that the win/loss probability
predicted by the Elo scale is skewed and does not correlate with practice very
well when rating differences are large.
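
To make that concrete (and to tie it back to Tony's suggested check further up
the thread), below is a small sketch of the expected scores the standard Elo
logistic formula predicts for a few rating gaps. It only shows what the scale
predicts; whether Shredder or anyone else actually achieved these percentages
against weaker opposition is exactly the empirical question, and nothing here
answers it.

# Standard Elo expected-score (logistic) formula; the numbers below show only
# what the scale predicts, not what any engine has scored in practice.

def expected_score(rating_diff):
    """Expected score for the higher-rated side, given the rating gap in Elo."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

for diff in (40, 100, 200, 400):
    print(f"+{diff} Elo -> expected score {expected_score(diff):.3f}")

# +40 Elo -> expected score 0.557
# +100 Elo -> expected score 0.640
# +200 Elo -> expected score 0.760
# +400 Elo -> expected score 0.909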

-S.


