Computer Chess Club Archives


Subject: Re: Rating in ICC is meaningless and here is an example

Author: Robert Hyatt

Date: 17:43:46 01/14/03



On January 14, 2003 at 19:12:21, Miguel A. Ballicora wrote:

>On January 14, 2003 at 18:25:39, Robert Hyatt wrote:
>
>>On January 14, 2003 at 18:09:35, Miguel A. Ballicora wrote:
>>
>>>On January 14, 2003 at 16:28:03, Robert Hyatt wrote:
>>>
>>>>On January 14, 2003 at 15:56:04, Miguel A. Ballicora wrote:
>>>>
>>>>>On January 14, 2003 at 14:53:39, Robert Hyatt wrote:
>>>>>
>>>>>>On January 14, 2003 at 12:35:02, Miguel A. Ballicora wrote:
>>>>>>
>>>>>>>On January 14, 2003 at 10:55:38, Andrew Williams wrote:
>>>>>>>
>>>>>>>>On January 14, 2003 at 10:43:20, Uri Blass wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>{Game 494 (MoveiXX vs. ACCIDENTE) ACCIDENTE resigns} 1-0
>>>>>>>>>Blitz rating adjustment: 2635 --> 2602
>>>>>>>>>
>>>>>>>>>Movei won a game and lost rating.
>>>>>>>>>
>>>>>>>>>Uri
>>>>>>>>
>>>>>>>>It seems a bit strange when moveixx has played a total of *thirteen* games to
>>>>>>>>declare that the rating system is "meaningless". What you have observed only
>>>>>>>>occurs in the first few games. I've forgotten now how many games it requires
>>>>>>>>before it settles down.
>>>>>>>
>>>>>>>Uri is pointing out a flaw.
>>>>>>>The fact that it happens while one is provisional does not make it less serious.
>>>>>>>After 20 games you could end up with a very wrong rating: suppose that you
>>>>>>>played only 1000-1500 Elo players and beat all of them. Later, you will take lots
>>>>>>>of points from the rating pool, causing deflation. Overall, I think that introduces
>>>>>>>a lot of noise. However, considering all the mess regarding these ratings, this
>>>>>>>point is not one of the worst.
>>>>>>>
>>>>>>>Miguel
>>>>>>
>>>>>>This is _not_ a "flaw".
>>>>>
>>>>>It is not a flaw; it is a major screw-up, considering how uneven the
>>>>>population of players in ICC is.
>>>>
>>>>It isn't a flaw, nor a major screw-up.  How about giving some good algorithm
>>>>to develop an approximate rating for a new player?
>>>
>>>There are many options to do it. For instance, you do not need to approximate.
>>>It is quite silly in the era of computers to use paper-and-pencil
>>>approximations that Dr. Elo _had_ to use decades ago.
>>
>>I'm waiting on a real suggestion.  You play one game and beat a 1200 player.
>
>Uri gave you one, I gave you one. I elaborate more below.
>
>>What is your rating?  You play another game and lose to a 1200 player.  What
>>is your rating?
>>
>>You _must_ start somewhere...  And the only place you can start is by using
>>the ratings of the two players you have played, along with the results, to start
>>a first approximation to your rating.
>
>
>You do not understand. I am not talking about an approximation to the rating of
>the player; I am saying that the approximation of the formula used by Elo (using
>averages of ratings) should not be used, and the real formulas should be used instead.

Then I have _absolutely_ no idea what you are talking about.  ICC uses the
normal Elo rating formula with (I believe) K=32.  The _same_ formula as is
used by FIDE, USCF, and most everyone else on the planet.
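For reference, here is a minimal sketch of the standard Elo update described above, assuming the usual logistic expected-score curve on a 400-point scale and K=32. The function names are illustrative only, not ICC's actual code.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectation for `rating` against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """New rating after one game (score: 1 win, 0.5 draw, 0 loss)."""
    return rating + k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    # A 300-point favourite is expected to score about 0.85 per game.
    print(round(expected_score(2300, 2000), 3))
    # Two equal players: the winner gains K/2 points.
    print(elo_update(1500, 1500, 1.0))
```

Once a player is established, each game moves the rating by at most K points, which is why the formula needs a reasonable starting value.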

Of course you can't use that for the first N games as you have no starting
point, so everyone has a "provisional rating period" where the ratings are
calculated in a different way (TPR-type calculation) to come up with a
reasonable first approximation to the actual rating, and once the 20 game
threshold has been passed, the normal Elo calculations are done...
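A "TPR-type" provisional calculation can be sketched, assuming the simple linear form: average opponent rating plus 400 * (wins - losses) / games, folded in one game at a time. ICC's and USCF's real provisional formulas may add further rules, so treat this as illustrative. It shows the effect Uri observed: while provisional, beating an opponent rated more than 400 points below your current figure pulls the average down by more than the win adds, so the rating drops.

```python
def provisional_update(rating: float, n_games: int,
                       opponent: float, score: float) -> float:
    """Fold one more game into a linear provisional rating.

    Keeps rating == average opponent + 400 * (wins - losses) / games,
    updated incrementally (score: 1 win, 0.5 draw, 0 loss).
    """
    return (rating * n_games + opponent + 400.0 * (2.0 * score - 1.0)) / (n_games + 1)


if __name__ == "__main__":
    # Hypothetical player: provisional 2700 after 12 games.
    print(provisional_update(2700.0, 12, 1500.0, 1.0))  # win vs 1500: rating drops
    print(provisional_update(2700.0, 12, 2400.0, 1.0))  # win vs 2400: rating rises
```

The numbers here are made up for illustration; they are not the actual Movei games.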

What exactly did you think ICC/FICS/etc actually did?

FICS actually does use a variation called the "Glickman" or "Glicko" system
which reduces the variability of a rating based on how many games are played
within a certain time frame.  The more games you play, the less your rating
fluctuates...
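The Glicko idea can be made concrete with a sketch of Glicko-1 as published by Mark Glickman. Each player carries a rating deviation (RD) that shrinks as games are played in a rating period and grows again during inactivity. FICS's actual parameters (initial RD, the inactivity constant `c`, the length of a rating period) are not given here, so the values used below are assumptions.

```python
import math
from typing import List, Tuple

# Glicko-1 scale constant q = ln(10) / 400
Q = math.log(10) / 400.0


def g(rd: float) -> float:
    """Down-weights results against opponents whose own rating is uncertain."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)


def expected(r: float, r_j: float, rd_j: float) -> float:
    """Expected score against opponent j, attenuated by g(RD_j)."""
    return 1.0 / (1.0 + 10 ** (-g(rd_j) * (r - r_j) / 400.0))


def inflate_rd(rd: float, c: float, periods: int, cap: float = 350.0) -> float:
    """Let RD grow again after `periods` idle rating periods.

    c is a tuning constant; its value on any real server is an assumption.
    """
    return min(math.sqrt(rd * rd + c * c * periods), cap)


def glicko_update(r: float, rd: float,
                  games: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Apply one rating period; games are (opp_rating, opp_rd, score) triples."""
    if not games:
        return r, rd  # no games: unchanged (RD growth is handled by inflate_rd)
    inv_d2 = 0.0  # accumulates 1 / d^2, the information gained this period
    delta = 0.0   # accumulates g(RD_j) * (s_j - E_j)
    for r_j, rd_j, s in games:
        gj = g(rd_j)
        e = expected(r, r_j, rd_j)
        inv_d2 += gj * gj * e * (1.0 - e)
        delta += gj * (s - e)
    inv_d2 *= Q * Q
    precision = 1.0 / (rd * rd) + inv_d2
    new_r = r + (Q / precision) * delta
    new_rd = math.sqrt(1.0 / precision)  # smaller than rd: more games, less fluctuation
    return new_r, new_rd


if __name__ == "__main__":
    # Worked example from Glickman's description of the system.
    print(glicko_update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
```

For Glickman's worked example (a 1500 player with RD 200 who beats a 1400/30 opponent and loses to 1550/100 and 1700/300) the result should come out near 1464 with an RD of about 151.4.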

>
>
>>>>BTW you do know that just because a new player's rating fluctuates wildly,
>>>>his opponents do _not_ get all those points added or subtracted from _their_
>>>>ratings?
>>>
>>>>>It is based on an approximation. Every approximation works between certain
>>>>>boundaries.
>>>>>
>>>>>>For the first 20 games, you use a "provisional rating formula" and you can lose
>>>>>>points by winning if you play a much lower-rated player.  USCF does this.
>>>>>>_everybody_ does it as you have to get an initial rating from somewhere.
>>>>>
>>>>>USCF does that, and that is one of the reasons why initial ratings in many cases
>>>>>are horrible and there were many cases of cheating because of this. For instance,
>>>>>kids who play only against 2000-rated people and whose initial rating is 1600.
>>>>
>>>>What else would you propose?  There is no solution.  Criticizing the _only_
>>>>solution makes little sense IMHO.
>>>
>>>What makes you think that this is the only solution?
>>>There are many rating systems around!
>>
>>I'm waiting for a suggestion for the _initial rating_.  All rating systems I
>>know of use a TPR-type approximation to seed initial rating values.
>
>>>Even the simple solution proposed by Uri deserves consideration: do not take
>>>into account games where the Elo of A is more than 400 points higher than B's.
>>>
>>>The one I could propose: take the pool of players that you played and
>>>calculate the Elo that would give you the same number of points that you
>>>obtained, doing the calculation "game by game", not by a crude average. For
>>>that, you need to iterate, and that is most probably the reason why it was
>>>never used at the beginning.
>>>
>>
>>Er... that is what the TPR approximates, in fact.  Which is _the_ point here.
>
>No, it is not. Classically the TPR is calculated from the average of your
>opposition, which is an approximation. (Still that is better than the crude
>average of your opposition, though). What I am saying is that you calculate it
>game by game. The problem is that you have to do it iteratively. Today, that is
>not a problem.

TPR is not done "iteratively".  Each game counts equally in your final
first approximation.



>
>What is your rating if you play
>1) 2600 draw
>2) 2600 draw
>3) 2600 draw
>4) 2000 win
>5) 2000 win
>6) 2000 win
>
>6 games, 4.5 points. Common sense indicates that your rating should be slightly
>above 2600. If you calculate it by the USCF method it is 2500.
>Horrible.

What else can you do?  What if you draw the first three, lose the second
three?

Statistics (Elo statistics) give a pretty accurate assessment of what will
happen based on those six games.  In normal tournaments, the above can't
happen.  If you draw three 2600's in the first three rounds, you won't be
paired against a 2000 in round 4 most likely.  But if you are, your TPR
suffers.  That's simply the way it is...
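To see where the 2500 in Miguel's example comes from, here is a sketch of two average-based performance formulas applied to his six games (three draws against 2600, three wins against 2000). The linear form is an assumption about what "the USCF method" means here, chosen because it reproduces the quoted 2500; the logistic form inverts the Elo expectation curve at the average opponent rating. Both look only at the average of the opposition, which is the point under dispute.

```python
import math
from typing import List, Tuple


def tpr_linear(games: List[Tuple[float, float]]) -> float:
    """Average opponent rating + 400 * (wins - losses) / games."""
    n = len(games)
    avg_opp = sum(opp for opp, _ in games) / n
    net = sum(2.0 * score - 1.0 for _, score in games)  # win +1, draw 0, loss -1
    return avg_opp + 400.0 * net / n


def tpr_logistic(games: List[Tuple[float, float]]) -> float:
    """Average opponent rating + the Elo gap implied by the score percentage."""
    n = len(games)
    avg_opp = sum(opp for opp, _ in games) / n
    p = sum(score for _, score in games) / n
    if p <= 0.0 or p >= 1.0:
        raise ValueError("unbounded performance at a 0% or 100% score")
    return avg_opp + 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    six = [(2600.0, 0.5)] * 3 + [(2000.0, 1.0)] * 3
    print(tpr_linear(six))              # average 2300, net +3 in 6 games
    print(round(tpr_logistic(six), 1))  # 75% score at an average of 2300
```

The linear version gives exactly 2500; the logistic version gives a little under 2491.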



>
>Now do this:
>Ask, what is the expectancy for a 2300 player?
>2300-2000 => +300 --> 0.85 (IIRC)
>2300-2600 => -300 --> 0.15
>
>1) 0.15
>2) 0.15
>3) 0.15
>4) 0.85
>5) 0.85
>6) 0.85
>   3.00 = total.
>
>So, the expectancy is 3.0 out of 6.0 points, which means that your rating is
>higher than 2300, since you got 4.5.
>Ask the same question for a 2400 player. Nope, it should be higher. What about a
>2700? Nope, too high. Iterate until you find the answer. It will be slightly
>higher than 2600, as it should be.

But you are _making up_ numbers.  How likely is it that you will draw three
2600 GM players and then play three 2000 players and win?  Unfortunately, you
are in a "rating pool" with those 6 players and there is _no_ other way to
predict your outcome until you have played enough games to switch to the
normal Elo statistics.

>
>That means doing it game by game, not from the average.

Which means either the first game counts more or less than the last
game, which is simply flawed when the rating is unknown.
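Miguel's game-by-game alternative can be sketched as a root-finding problem: find the rating R whose summed Elo expectations against each individual opponent equal the points actually scored. The bisection search, its 0 to 4000 bounds and its 0.01-point tolerance are choices made for this sketch, not part of his proposal.

```python
from typing import List, Tuple


def expected_score(rating: float, opponent: float) -> float:
    """Standard logistic Elo expectation against a single opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def game_by_game_performance(games: List[Tuple[float, float]],
                             lo: float = 0.0, hi: float = 4000.0,
                             tol: float = 0.01) -> float:
    """Rating R at which sum_j E(R, opp_j) equals the points actually scored.

    games are (opponent_rating, score) pairs. The summed expectation rises
    monotonically with R, so bisection between lo and hi converges; if the
    true answer lies outside [lo, hi] the result sticks to that bound.
    """
    actual = sum(score for _, score in games)
    if not games or actual <= 0 or actual >= len(games):
        raise ValueError("no finite performance for a 0% or 100% score")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        predicted = sum(expected_score(mid, opp) for opp, _ in games)
        if predicted < actual:
            lo = mid  # a player rated `mid` would score too little: go higher
        else:
            hi = mid  # a player rated `mid` would score enough: go lower
    return (lo + hi) / 2.0


if __name__ == "__main__":
    six = [(2600.0, 0.5)] * 3 + [(2000.0, 1.0)] * 3
    print(round(game_by_game_performance(six)))
```

For the six games in Miguel's example this lands at roughly 2619, the "slightly higher than 2600" he predicts, compared with the 2500 he quotes for the USCF method.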



>
>>To do it any other way distorts the statistical significance.
>>
>>>Lots of things can be done.
>>>
>>>>>That is one of the reasons why, when I started to play in the US, my initial
>>>>>rating was way below the one that I should have had (personally I do not give a
>>>>>damn): I played tournaments in the area against nobodies. That is also the reason
>>>>>why Anatoly Karpov was rated (maybe still is) 2500 in the USA. Ridiculous.
>>>>
>>>>You do realize that your rating reflects your results in a rating pool?  Once
>>>>again you are using a local rating to compare with ratings from other pools.  It
>>>>is statistically invalid to do this.
>>>
>>>You are assuming that I compared my Elo from somewhere else with the Elo that I
>>>got in USCF and was not happy. No, I compared the Elo that I got with the Elo of
>>>other people here in the US who played worse than me. It took me a _long_ time
>>>until that was reversed, and still my Elo did not reach a balance, partially
>>>because it is difficult to increase your Elo fast when you play opposition that
>>>is weaker than you.
>>
>>That is what the statistics involved produce.  And it is a _desired_ effect, in
>>fact.  Otherwise you could beat nobodies and produce a huge rating.
>>
>>
>>
>>>Besides, even if I had made that comparison, USCF ratings are slightly overrated
>>>compared to FIDE, so I would not have been wrong. I was really tired of hearing
>>>my opponents say: "Are you really 2050?"
>>>
>>>Karpov 2596? Come on!!! He played the US Amateur and beat a couple of players
>>>with a very low rating and that was the result. Yes, 6 games, but he won all of
>>>them.
>>>http://www.64.com/uscf/ratings/12730227
>>
>>So?  You can't re-write the statistics to produce a result you want for a
>>special case...  I believe that USCF uses a FIDE rating as the initial rating if the
>
>No, it is not a special case. The case I am pointing out shows the flaw.
>Karpov was not a 2596 player in '98. What did he do wrong? He agreed to play
>against a couple of low-rated people who dragged the average down.

That is why most rating systems don't include "matches".  Tournaments are what
the Elo system is all about.  Tournaments.  A pool of players.  A match is a pool
of two players, and trying to use that to seed ratings for one of the two players
is simply statistically wrong...



>Bruce Moreland pointed out the flaw in another message. You can really inflate
>your rating if you play against strong people at the beginning, or deflate it if
>you play only very weak ones. It is enough to include enough weak opponents to
>drag your average to the bottom.
>
>Miguel


On ICC that is a problem, but it is unsolvable.  In a tournament you can't
do it, because as the TD I am going to pair you correctly and each time you
lose, you are going to play a _lower_ rated opponent the next round.

For ICC, there's no answer, except that unrated players can't play against
computers to inflate their initial ratings.  And high-rated players won't
play unrated players for that same reason.





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.