Computer Chess Club Archives


Subject: Re: Inflationary Effects?

Author: Matthew Hull

Date: 12:49:25 07/12/03



On July 12, 2003 at 14:08:19, Uri Blass wrote:

>On July 12, 2003 at 12:20:40, Robert Hyatt wrote:
>
>>On July 11, 2003 at 17:52:23, Sune Fischer wrote:
>>
>>>On July 11, 2003 at 13:32:59, Robert Hyatt wrote:
>>>
>>>>On July 10, 2003 at 11:51:55, Keith Ian Price wrote:
>>>>
>>>>>On July 10, 2003 at 02:19:15, Tony Werten wrote:
>>>>>
>>>>>>On July 09, 2003 at 19:12:13, Keith Ian Price wrote:
>>>>>>
>>>>>>>On July 09, 2003 at 18:25:30, Jeroen van Dorp wrote:
>>>>>>>
>>>>>>>>On July 09, 2003 at 16:43:27, Keith Ian Price wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>That is not what he said. He said the 40-point difference was meaningful, but
>>>>>>>>>the 2800+ rating was not, since it is not pegged to any absolute rating.
>>>>>>>>
>>>>>>>>As a rating only tells you something about strength differences, and nothing about
>>>>>>>>"absolute" strength (whatever that may be), how can a rating be meaningless,
>>>>>>>>yet a rating difference be meaningful?
>>>>>>>>
>>>>>>>>
>>>>>>>>J.
>>>>>>>
>>>>>>>The rating only tells you about strength differences relative to others in the
>>>>>>>same pool, so it is the rating difference that's important. Whether the rating
>>>>>>>itself is 2800+ or 2700+ matters little, since the percentage difference between
>>>>>>>40-point differences would be small. If someone
>>>>>>
>>>>>>If the rating is inflated by 10 % then the difference between 2 ratings is also
>>>>>>inflated by 10%
>>>>>>
>>>>>>This shouldn't be too difficult to check. A rating difference of 40 points should
>>>>>>give a certain win percentage. Did Shredder get this win percentage? Or did it
>>>>>>only get that win percentage against opponents rated 200 points lower?
>>>>>>
>>>>>>Tony
>>>>>>
>>>>>>>were to say it should be 1000 instead of 2800, then it would be arguable that it
>>>>>>>is not meaningless, but no one I've heard from is suggesting that.
>>>>>>>
>>>>>>>kp
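Tony's check can be made concrete. Under the usual logistic form of the Elo model, the expected score for a rating edge D is E = 1 / (1 + 10^(-D/400)). A minimal sketch, assuming that curve and the conventional 400-point scale (the thread doesn't say exactly which model SSDF uses); `expected_score` is a hypothetical helper name:

```python
def expected_score(diff: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for the side with the rating edge.

    Assumes the logistic Elo curve with the conventional 400-point scale.
    `diff` is own rating minus the opponent's rating.
    """
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


if __name__ == "__main__":
    for d in (40, 200):
        print(f"{d:>3}-point edge -> expected score {expected_score(d):.3f}")
```

Under these assumptions a 40-point edge predicts about a 55.7% score and a 200-point edge about 76%, so Shredder's actual percentage against each group of opponents can be compared with those figures directly.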
>>>>>
>>>>>What I meant is that if 100 points were subtracted from all programs, the
>>>>>relative difference between them would not be greatly affected. Future games
>>>>>played at those levels would show a slightly smaller point spread, of course. If
>>>>>they were to hack 1800 points off, without refiguring the percentage difference
>>>>>in point spreads, then a false comparison would seem to be shown, and it would
>>>>>be obvious. In this regard, the actual rating would make a difference and not be
>>>>>meaningless. This was in answer to Jeroen's question how could it be
>>>>>meaningless. I was saying this is the only way it wouldn't be meaningless. I
>>>>>don't remember if they recalculated the point spreads between all programs years
>>>>>ago when they lopped off 100 points from all SSDF scores.
>>>>>
>>>>>kp
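Keith's point (subtracting 100 points from every program leaves the comparisons intact) and Tony's point (a 10% inflation stretches every difference by 10%) can be checked side by side. A minimal sketch, assuming the logistic Elo curve with a 400-point scale; the pool ratings and the helper names `pairwise_expectations`, `shift` and `scale` are hypothetical, not SSDF data:

```python
from itertools import combinations


def expected_score(diff: float) -> float:
    # Logistic Elo expectation with the conventional 400-point scale (assumed).
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def pairwise_expectations(ratings: dict[str, float]) -> dict[tuple[str, str], float]:
    """Expected score of the first engine in every pair of the pool."""
    return {
        (a, b): expected_score(ratings[a] - ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


def shift(ratings: dict[str, float], delta: float) -> dict[str, float]:
    # Add the same constant to every rating (e.g. delta = -100).
    return {name: r + delta for name, r in ratings.items()}


def scale(ratings: dict[str, float], factor: float) -> dict[str, float]:
    # Multiply every rating by the same factor (e.g. 1.10 for "10% inflation").
    return {name: r * factor for name, r in ratings.items()}


if __name__ == "__main__":
    # Hypothetical pool; the numbers are illustrative only.
    pool = {"A": 2800.0, "B": 2760.0, "C": 2650.0}
    base = pairwise_expectations(pool)
    shifted = pairwise_expectations(shift(pool, -100.0))
    scaled = pairwise_expectations(scale(pool, 1.10))
    for pair in base:
        print(pair, f"{base[pair]:.4f}", f"{shifted[pair]:.4f}", f"{scaled[pair]:.4f}")
```

A uniform shift leaves every pairwise expectation unchanged, because only differences enter the formula; scaling by 1.10 stretches every gap and so changes every predicted score.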
>>>>
>>>>
>>>>The other problem is that a _new_ engine starts at the top of the SSDF opponent
>>>>list.  I.e., it starts right off playing the very best.  If it is a good program,
>>>>its rating is going to start off very high.  If it is slightly better than
>>>>the best, it is going to end up with a higher rating than the previous best.
>>>>
>>>>Were it to start at the bottom and work its way up, this might be reduced a bit,
>>>>maybe.  But nobody wants to test like that.  Go out and tackle #1 first.  :)
>>>>
>>>>I think the idea is that if you have a shark at the _bottom_ of the pool, then
>>>>the bottom of the pool is going to drop a bit.  And if players at the top play
>>>>the ones at the bottom, the top ratings will drop a bit.  And the shark will
>>>>move up and maybe pass the #1 player, but he has lowered #1's rating because
>>>>he lowered the ratings throughout the pool as he played them.  If you start
>>>>the shark at the top, and he is really a shark, his rating is going to push to
>>>>a new "high water mark".
>>>
>>>That's not how it is done AFAIK, it wouldn't be correct.
>>
>>That is what happens in the SSDF however.  A new version starts out by
>>playing the top guys.  It should _obviously_ be stronger, so it will be
>>higher-rated.  Should the new program start out at the bottom, it would
>>still win, but the bottom ratings would also drop, which means that as
>>the top programs play the bottom programs, the top ratings would drop as
>>well, and by the time the new program gets to the top, it might not end up
>>much higher than the old "top".
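The chain Bob describes (a shark entering at the bottom lowers the bottom, and the top then loses points to the bottom) follows from the incremental Elo update R' = R + K(S - E), where rating points move from one player to the other. A minimal sketch, assuming the logistic curve, a single hypothetical K = 16 for everyone, and a hypothetical three-engine pool; SSDF's actual calculation is not described in the thread and may be a batch method rather than game-by-game updates:

```python
def expected_score(diff: float) -> float:
    # Logistic Elo expectation with the conventional 400-point scale (assumed).
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def play(ratings: dict[str, float], a: str, b: str, score_a: float, k: float = 16.0) -> None:
    """One incremental Elo update for a game (or average result) between a and b.

    score_a is a's result: 1 win, 0.5 draw, 0 loss (a fraction = average of several games).
    K = 16 is an illustrative choice, not a known SSDF parameter.
    """
    delta = k * (score_a - expected_score(ratings[a] - ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta  # same K on both sides, so the exchange is zero-sum


if __name__ == "__main__":
    # Hypothetical pool: the newcomer ("Shark") enters at the bottom.
    pool = {"Top": 2800.0, "Bottom": 2400.0, "Shark": 2400.0}
    # What Top is expected to score against Bottom before the shark arrives.
    old_expectation = expected_score(pool["Top"] - pool["Bottom"])

    for _ in range(10):
        play(pool, "Shark", "Bottom", 1.0)  # the shark wins every game

    # Top scores exactly what the old gap predicted, but the gap has grown,
    # so Top's expected score is now higher and it loses points.
    play(pool, "Top", "Bottom", old_expectation)
    print({name: round(r, 1) for name, r in pool.items()})
```

Because every exchange is zero-sum, the pool total stays fixed: the shark's gain is paid for by the bottom engine first and, through it, by the top.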
>>
>>But at present, the bottom of the SSDF rating pool is very inactive.
>>Since they can't drop because of the stronger players at the top, the only
>>thing that can happen is that the top gets higher and higher and higher,
>>-inflation-.
>>
>>>
>>>It doesn't make too much sense to adjust the Elo numbers based on games where
>>>one engine doesn't have an established rating.
>>>
>>>I know that on many servers you play "one-side rated" for the first 20-50 games.
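"One-side rated" play can be sketched as estimating the newcomer's rating from its results while only reading, never updating, the established opponents' ratings. A minimal sketch using the linear "algorithm of 400" performance rating (average opponent rating plus 400 × (wins − losses) / games), which is one common provisional formula; the servers mentioned may use a different one, and `Game` and `provisional_rating` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Game:
    opponent_rating: float  # established opponent, treated as fixed
    score: float            # newcomer's result: 1 win, 0.5 draw, 0 loss


def provisional_rating(games: list[Game]) -> float:
    """Linear 'algorithm of 400' performance rating for a newcomer.

    Opponents' ratings are only read, never updated ("one-side rated").
    A win counts +400 and a loss -400 relative to the opponent's rating; a draw counts 0.
    """
    if not games:
        raise ValueError("need at least one game")
    total = sum(g.opponent_rating + 400.0 * (2.0 * g.score - 1.0) for g in games)
    return total / len(games)


if __name__ == "__main__":
    # Hypothetical results against established engines.
    results = [Game(2750.0, 1.0), Game(2700.0, 0.5), Game(2800.0, 0.0), Game(2650.0, 1.0)]
    print(provisional_rating(results))
```

Here the newcomer's estimate comes out at 2825, while none of the four opponents' ratings move until the newcomer is established.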
>>>
>>>> And things get inflated, as he is using the rating
>>>>inertia established by the deep "pool" to jack his rating higher.
>>>
>>>It's only natural that a new, better engine sets a new rating record; that
>>>has nothing to do with inflation.
>>>It would actually be deflation if the top had to remain under a certain limit,
>>>say 2800.
>>
>>Sure, but it just means that the 2800 doesn't relate to anything but the
>>SSDF pool.  Statistically, that is fine.  Practically, everyone wants it to
>>be FIDE-comparable.  It isn't.
>>
>>>
>>>Anyway, I don't believe the scale tends to inflate; I think it actually deflates
>>>a bit.
>>>I remember a post here some time ago, with a link to where some dude had analysed
>>>lots of FIDE games and found that top players actually had to overperform to
>>>keep their ratings when playing against low-rated players.
>>
>>Yes, but you miss the point.  In FIDE, _everybody_ plays everybody over
>>time, because of seedings.  But in the SSDF rating pool, this is not
>>the case.
>
>I do not think that everybody plays everybody.
>Kasparov does not play, or almost never plays, players rated 2400.
>
>He did when he was young, but at that time Kasparov was a weaker player.


But Kasparov is analogous to the established strong programs.  A new strong
player moves up from the bottom and does not start by playing Kasparov.  I think
that was the point.

Matt


>
>The rating difference is less than 200 Elo in most of the games.
>
>If Shredder 7.04 (A1200) starts by playing 20 games against Palm Tiger 14.9 and
>20 games against Fritz 3 (P90), then I doubt that it is going to make its rating
>smaller.
>
>It has a good chance of scoring 100%, or almost 100%, in these games.
>
>>
>>>
>>>I certainly believe that there are indications that the win/loss probability
>>>predicted by the Elo scale is skewed and does not correlate with practice very
>>>well when differences are large.
>>>
>>>-S.
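How skewed the prediction looks at large differences depends partly on which curve is assumed. Elo's original model used a normal distribution, while many implementations use the logistic curve; the two agree closely for small gaps and drift apart in the tails. A minimal sketch comparing them, assuming a standard deviation of 200√2 for the performance difference in the normal model (Elo's per-player 200) and the 400-point scale for the logistic one; which curve SSDF uses is not stated in the thread:

```python
import math


def logistic_expectation(diff: float) -> float:
    # Logistic curve with the conventional 400-point scale.
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def normal_expectation(diff: float, sigma: float = 200.0 * math.sqrt(2.0)) -> float:
    # Normal-CDF curve: P(performance difference > 0), std dev `sigma` (assumed 200*sqrt(2)).
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    for d in (50, 100, 200, 400, 600, 800):
        lo, no = logistic_expectation(d), normal_expectation(d)
        print(f"{d:>4}  logistic {lo:.3f}  normal {no:.3f}  "
              f"weaker side: {1 - lo:.4f} vs {1 - no:.4f}")
```

At a 50-point gap the two curves differ by about 0.001, but at 800 points the logistic curve gives the weaker side roughly four times the expected score the normal curve does, which is exactly the region where the fit to practice is being questioned.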
>
>Do you think that the weaker program is going to gain rating or lose rating when
>the difference is more than 300 Elo?
>
>Uri
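Under the Elo formula the weaker side gains rating exactly when it scores more than its expected score, and at a 300-point gap that expectation is only about 15%. The same rule covers the Shredder example: a program scoring 100% against much lower-rated opponents always gains, just a little per game. A minimal sketch, assuming the logistic curve, a hypothetical K = 16, and hypothetical ratings of 2700 and 2400:

```python
def expected_score(diff: float) -> float:
    # Logistic Elo expectation with the conventional 400-point scale (assumed).
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def rating_change(own: float, opponent: float, score: float,
                  games: int = 1, k: float = 16.0) -> float:
    """Rating change for `own` after `games` games averaging `score` per game.

    Treats all games as one batch against a fixed opponent rating; K = 16 is illustrative.
    """
    return k * games * (score - expected_score(own - opponent))


if __name__ == "__main__":
    gap = 300.0
    print(f"weaker side's expected score at {gap:.0f} points: {expected_score(-gap):.3f}")
    # Weaker side scoring 20% (above expectation) vs 10% (below) over 20 games.
    for s in (0.20, 0.10):
        print(f"score {s:.0%}: change {rating_change(2400.0, 2700.0, s, games=20):+.1f}")
    # Stronger side scoring 100% over 20 games against a 300-point-weaker opponent.
    print(f"stronger side at 100%: change {rating_change(2700.0, 2400.0, 1.0, games=20):+.1f}")
```

So whether the weaker program gains or loses at a gap above 300 points depends only on whether its actual score beats the roughly 15% the formula expects.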




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.