Computer Chess Club Archives



Subject: Re: Inflationary Effects?

Author: Uri Blass

Date: 19:56:44 07/12/03


On July 12, 2003 at 22:38:58, Robert Hyatt wrote:

>On July 12, 2003 at 14:08:19, Uri Blass wrote:
>
>>On July 12, 2003 at 12:20:40, Robert Hyatt wrote:
>>
>>>On July 11, 2003 at 17:52:23, Sune Fischer wrote:
>>>
>>>>On July 11, 2003 at 13:32:59, Robert Hyatt wrote:
>>>>
>>>>>On July 10, 2003 at 11:51:55, Keith Ian Price wrote:
>>>>>
>>>>>>On July 10, 2003 at 02:19:15, Tony Werten wrote:
>>>>>>
>>>>>>>On July 09, 2003 at 19:12:13, Keith Ian Price wrote:
>>>>>>>
>>>>>>>>On July 09, 2003 at 18:25:30, Jeroen van Dorp wrote:
>>>>>>>>
>>>>>>>>>On July 09, 2003 at 16:43:27, Keith Ian Price wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>That is not what he said. He said the 40-point difference was meaningful, but
>>>>>>>>>>the 2800+ rating was not, since it is not pegged to any absolute rating.
>>>>>>>>>
>>>>>>>>>As a rating only tells you something about strength differences, and nothing
>>>>>>>>>about "absolute" strength - whatever that may be - how can a rating not be
>>>>>>>>>meaningful, yet a rating difference be?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>J.
>>>>>>>>
>>>>>>>>The rating only tells about strength difference when compared to another in the
>>>>>>>>same pool. So it is the rating difference that's important. The lack of
>>>>>>>>importance as to the rating is whether it is 2800+ or 2700+, where the
>>>>>>>>percentage difference between 40 point differences would be small. If someone
>>>>>>>
>>>>>>>If the rating is inflated by 10 % then the difference between 2 ratings is also
>>>>>>>inflated by 10%
>>>>>>>
>>>>>>>This shouldn't be too difficult to check. A rating difference of 40 points
>>>>>>>should give a certain win percentage. Did Shredder get this win percentage? Or
>>>>>>>did it only get it against opponents rated 200 points lower?
>>>>>>>
>>>>>>>Tony
>>>>>>>
>>>>>>>>were to say it should be 1000 instead of 2800, then it would be arguable that it
>>>>>>>>is not meaningless, but no one I've heard from is suggesting that.
>>>>>>>>
>>>>>>>>kp
>>>>>>
>>>>>>What I meant is that if 100 points were subtracted from all programs, the
>>>>>>relative difference between them would not be greatly affected. Future games
>>>>>>played at those levels would show a slightly smaller point spread, of course. If
>>>>>>they were to hack 1800 points off, without refiguring the percentage difference
>>>>>>in point spreads, then a false comparison would seem to be shown and it would
>>>>>>be obvious. In this regard, the actual rating would make a difference and not be
>>>>>>meaningless. This was in answer to Jeroen's question how could it be
>>>>>>meaningless. I was saying this is the only way it wouldn't be meaningless. I
>>>>>>don't remember if they recalculated the point spreads between all programs years
>>>>>>ago when they lopped off 100 points from all SSDF scores.
>>>>>>
>>>>>>kp
>>>>>
>>>>>
>>>>>The other problem is that a _new_ engine starts at the top of the SSDF opponent
>>>>>list.  IE it starts right off playing the very best.  If it is a good program,
>>>>>its rating is going to start off very high.  If it is slightly better than
>>>>>the best, it is going to end up with a higher rating than the previous best.
>>>>>
>>>>>Were it to start at the bottom and work its way up, this might be reduced a bit,
>>>>>maybe.  But nobody wants to test like that.  Go out and tackle #1 first.  :)
>>>>>
>>>>>I think the idea is that if you have a shark at the _bottom_ of the pool, then
>>>>>the bottom of the pool is going to drop a bit.  And if players at the top play
>>>>>the ones at the bottom, the top ratings will drop a bit.  And the shark will
>>>>>move up and maybe pass the #1 player, but he has lowered #1's rating because
>>>>>he lowered the ratings throughout the pool as he played them.  If you start
>>>>>the shark at the top, and he is really a shark, his rating is going to push to
>>>>>a new "high water mark".
>>>>
>>>>That's not how it is done AFAIK, it wouldn't be correct.
>>>
>>>That is what happens in the SSDF however.  A new version starts out by
>>>playing the top guys.  It should _obviously_ be stronger, so it will be
>>>higher-rated.  Should the new program start out at the bottom, it would
>>>still win, but the bottom ratings would also drop, which means that as
>>>the top programs play the bottom programs, the top ratings would drop as
>>>well, and by the time the new program gets to the top, it might not end up
>>>much higher than the old "top".
>>>
>>>But at the present, the bottom of the SSDF rating pool is very inactive.
>>>Since they can't drop because of the stronger players at the top, the only
>>>thing that can happen is that the top gets higher and higher and higher,
>>>-inflation-.
>>>
>>>>
>>>>It doesn't make too much sense to adjust the Elo numbers based on games where
>>>>one engine doesn't have an established rating.
>>>>
>>>>I know that on many servers you play "one-side rated" for the first 20-50 games.
>>>>
>>>>> And things get inflated, as he is using the rating
>>>>>inertia established by the deep "pool" to jack his rating higher.
>>>>
>>>>It's only natural that the new better engine sets a new rating record, that
>>>>hasn't got anything to do with inflation.
>>>>It would actually be deflation if the top had to remain under a certain limit,
>>>>say like 2800.
>>>
>>>Sure, but it just means that the 2800 doesn't relate to anything but the
>>>SSDF pool.  Statistically, that is fine.  Practically, everyone wants it to
>>>be FIDE-comparable.  It isn't.
>>>
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>Anyway, I don't believe the scale tends to inflate, I think it actually deflates
>>>>a bit.
>>>>I remember a post here some time ago, to a link where some dude had analysed
>>>>lots of FIDE games, and found that top players actually had to overperform to
>>>>keep their ratings when playing against low rated players.
>>>
>>>Yes, but you miss the point.  In FIDE _everybody_ plays everybody over
>>>time.  Because of seedings.  But in the SSDF rating pool, this is not
>>>the case.
>>
>>I do not think that everybody plays everybody.
>>Kasparov never, or almost never, plays against players rated 2400.
>>
>>He did when he was young, but at that time Kasparov was a weaker player.
>>
>>The difference in rating is less than 200 elo in most of the games.
>
>I can only tell you what happens in events I have attended.  IE the "open
>section" of major events like the US Open.  There you get players from 2000+ to
>whatever GMs show up, certainly beyond 2600.  It's how the Swiss pairing works.
>That is the only thing that makes the rating pool concept work.

Yes, but only in the first rounds do we get a lot of games where the rating
difference is more than 200 elo, and in most of the games the difference is
still smaller than 200 elo.

There are also closed competitions where everyone plays against everyone,
and in these competitions GMs play only against strong opponents.

>
>If only high-rated players play high-rated players, you have _two_ pools and
>the rating difference between the two pools is meaningless.

Not exactly.
Even without Swiss tournaments, a 2800 player can play against a 2650 player,
the 2650 player can play in another tournament against a 2500 player, and the
2500 player can play in another tournament against a 2350 player...
That chain of games still links the top and the bottom into one pool.
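
A rough sketch of this chain argument, in Python: treat every player as a node
and every game as a link, and check whether everyone can be reached from
everyone else. The player names and pairings below are invented for
illustration, and is_single_pool is a hypothetical helper, not anything FIDE or
the SSDF actually uses.

```python
from collections import deque

def is_single_pool(players, games):
    """Return True if every player is linked to every other one
    through some chain of games (hypothetical helper)."""
    neighbours = {p: set() for p in players}
    for a, b in games:
        neighbours[a].add(b)
        neighbours[b].add(a)
    if not players:
        return True
    # Breadth-first search from the first player.
    start = players[0]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in neighbours[current]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(players)

# Made-up players, named after their approximate ratings.
players = ["p2800", "p2650", "p2500", "p2350"]
# The chain described above: each player meets the next one down.
chain_games = [("p2800", "p2650"), ("p2650", "p2500"), ("p2500", "p2350")]
# Two separate pools: the 2650 and 2500 players never meet.
split_games = [("p2800", "p2650"), ("p2500", "p2350")]

print(is_single_pool(players, chain_games))  # True
print(is_single_pool(players, split_games))  # False
```

The check only says that the pool is connected. It says nothing about how many
games carry each link, and a thin link gives a noisy comparison.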


>
>>
>>If shredder7.04(A1200) starts by playing 20 games against palm tiger14.9 and
>>20 games against Fritz3(p90) then I doubt that it is going to lower its
>>rating.
>>
>>It has a good chance to score 100% or almost 100% in these games.
>
>But one draw will cost it rating points that it won't make up with 19 wins.
>The other point is that as it beats palm tiger, palm tiger is going down in
>rating also, which will mean that others that beat it will see a smaller
>rating improvement.

If palm tiger's rating is going down then Shredder7.04's rating is going up.
I agree that in this case others that beat palm tiger may gain fewer rating
points, so the influence works in both directions, but I do not see a reason to
assume that the upward influence is the stronger one.
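
The "one draw against 19 wins" arithmetic can be checked with a small sketch.
It assumes the standard logistic Elo expectation formula; the SSDF's own
calculation may differ, and the rating gaps below are illustrative values, not
the real SSDF ratings of these programs. Both function names are hypothetical.

```python
def expected_score(diff):
    """Standard Elo expected score per game for a player rated
    `diff` points above the opponent (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def performance_surplus(diff, wins, draws, losses):
    """Actual points minus expected points over a match.
    Positive means the higher-rated side gains rating."""
    games = wins + draws + losses
    actual = wins + 0.5 * draws
    return actual - games * expected_score(diff)

# Illustrative rating gaps; 19 wins and 1 draw in a 20-game match.
for diff in (40, 200, 600, 800):
    surplus = performance_surplus(diff, wins=19, draws=1, losses=0)
    print(diff, round(expected_score(diff), 3), round(surplus, 2))
```

Under this formula, 19 wins and one draw still beat the expectation when the
gap is 600 points, but fall short of it when the gap is 800 points; the
break-even gap is somewhere around 640 points. So whether one draw costs rating
depends on how far below Shredder the opponent really is.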

Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700
