Computer Chess Club Archives



Subject: Re: how not to calculate performance

Author: Stephen A. Boak

Date: 10:04:07 10/24/04



On October 24, 2004 at 02:12:51, Uri Blass wrote:

>On October 24, 2004 at 01:47:54, Stephen A. Boak wrote:
>
>>On October 23, 2004 at 18:47:26, Vincent Lejeune wrote:
>>
>>>On October 23, 2004 at 16:37:37, Stephen A. Boak wrote:
>>>
>>>>On October 22, 2004 at 18:52:13, Uri Blass wrote:
>>>>
>>>>>On October 22, 2004 at 18:30:34, James T. Walker wrote:
>>>>>
>>>>>>On October 22, 2004 at 13:32:57, Uri Blass wrote:
>>>>>>
>>>>>>>go to the following link
>>>>>>>
>>>>>>>http://georgejohn.bcentralhost.com/TCA/perfrate.html
>>>>>>>
>>>>>>>enter 1400 for 12 opponents
>>>>>>>enter 0 for your total score
>>>>>>>
>>>>>>>Your performance is 1000 but if you enter 1 to your total score your performance
>>>>>>>is only 983.
>>>>>>>
>>>>>>>It seems that the program in that link assumes that when the result is 100% or
>>>>>>>0% your performance is 400 Elo less than your weakest opponent, but when your
>>>>>>>score is not 100% it has no such limit, so it gives illogical results.
>>>>>>>
>>>>>>>Uri
>>>>>>
>>>>>>My take on this is they are using a bad formula or have screwed up the program
>>>>>>to calculate the Rp.
>>>>>>The USCF uses Rp=Rc + 400(W-L)/N
>>>>>
>>>>>It seems that the USCF does not do it in that way
>>>>>
>>>>>They admit that the formula is not correct for players who won all their games
>>>>>
>>>>>Note:  In the case of a perfect or zero score the performance rating is
>>>>>estimated as either 400 points higher or lower, respectively, than the rating of
>>>>>highest or lowest rated opponent.
>>>>>
>>>>>It is probably better to estimate the performance based on comparison to the
>>>>>case where the player achieved an almost perfect score.
>>>>>
>>>>>Uri
>>>>
>>>>Dear Uri,
>>>>What is the *correct* formula for a player who has won (or lost) all his games?
>>>>:)
>>>>Regards,
>>>>--Steve
>>>
>>>
>>>For such a player, the error margin = infinity
>>>
>>>the perf = average opp +400 to +infinity
>>
>>Thanks, Vincent.  I know the formula well.  :)
>>
>>I was poking fun at Uri (just teasing) for complaining about 'logic' when in
>>fact the formula for all wins or all losses is purely arbitrary.
>>
>>[I've read that Uri is a mathematician, so I like to occasionally jump in and
>>comment when he seems to overlook something basic.  All in good fun--I
>>appreciate his postings and chess programming contributions.]
>>
>>I asked Uri what formula would he suggest as 'correct'.
>
>I think that it is possible to calculate the performance of a player who gets
>1/2 point instead of 0 points, and to use the result as an upper bound for the
>performance of the player who got 0 points.
>
>It is not done.
>
>Another idea is to assume win/draw/loss probabilities for every difference in
>rating, and to calculate the maximal rating at which the probability of getting
>0 points is 50% or more.
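[Editor's sketch: Uri's second idea above -- pick win/loss probabilities as a function of rating difference, then find the maximal rating at which scoring 0 points still has probability 50% or more -- can be written out numerically. The code below is a hypothetical construction for illustration, not anything posted in the thread; it ignores draws and treats the n games as independent.]

```python
def expected_score(r, opp):
    # Standard Elo expectancy for a player rated r against an opponent rated opp.
    return 1 / (1 + 10 ** ((opp - r) / 400))

def zero_score_performance(avg_opp, n):
    # Binary-search for the highest rating at which the probability of
    # losing all n games (draws ignored, games independent) is still >= 50%.
    lo, hi = avg_opp - 2000.0, float(avg_opp)
    for _ in range(100):
        mid = (lo + hi) / 2
        p_all_losses = (1 - expected_score(mid, avg_opp)) ** n
        if p_all_losses >= 0.5:
            lo = mid   # still plausible the player is rated this high
        else:
            hi = mid
    return lo

print(round(zero_score_performance(1400, 10)))  # about 942
```

[For 0 out of 10 games against 1400-rated opposition this lands near 942 -- in the same neighborhood as the arbitrary 1400 - 400 = 1000 rule; and note that the 50% threshold is itself just as arbitrary a choice.]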

Ok, Uri, I accept the challenge.

Assume a player scores 0 points out of 10 total games.

What Win, Draw, Loss probabilities should be used (arbitrary once again, for an
unknown player's strength, who has scored m=0 out of n games--right?), and for
what difference in rating?

What is the 'logic' for your choice of 'difference in rating'?  That choice has
to be illogical (totally arbitrary), when the new player doesn't yet have a
rating, right?

ELO does exactly that already--it assumes the player is exactly 400 points below
the average rating of the opponents against whom he has scored m=0 points (i.e.
it assumes a particular rating difference).
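[Editor's sketch: the USCF-style linear rule quoted earlier in the thread, Rp = Rc + 400(W-L)/N, can be written out directly. This is a minimal illustration, not any federation's official code:]

```python
def performance_linear(avg_opp, wins, losses, n):
    # USCF-style linear performance rating: Rp = Rc + 400*(W - L)/N,
    # where Rc is the average rating of the opponents.
    return avg_opp + 400 * (wins - losses) / n

# Uri's example: 12 games against opponents averaging 1400.
print(performance_linear(1400, 0, 12, 12))  # 1000.0 (exactly avg - 400)
print(performance_linear(1400, 1, 11, 12))  # ~1066.7 -- higher, as it should be
```

[This makes Uri's original complaint concrete: the website returned 983 for a score of 1/12, *below* the 1000 it returned for 0/12, which no monotone formula should do.]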

What is a more logical rating estimate for the player than 400 points less than
the average rating of his opponents?  What is the improved 'logic' for your
suggested rating approach?

My thesis is that any suggested formula involves as much guessing, as much
arbitrary choice, as much 'illogical' thinking (because it is totally arbitrary)
... as the original +/- 400 points rule used in the ELO system.

At least the +/- 400 points rule (per my recollection of Arpad Elo's book, 'The
Rating of Chess Players, Past and Present') is based on a fixed multiple of an
assumed (desired) standard deviation of some +/- 100 or +/- 200 points in the
measure of a player's strength.  If a player scores 0 points against opponents
with an average rating of XXXX, then the player is taken to be Y standard
deviations lower in rating than the average of his opponents.  Totally
arbitrary, yet with an internal consistency or 'logic' equal to (and no worse
than) that of any other formula.

Elo could have used +/- 100 points, or +/- 800 points.  Both would have worked
to some degree.  Both would have been inaccurate to some degree.  Which would
have been better than +/- 400 points?  Neither, in my estimation (opinion only).

Arpad Elo did not choose to set the rating [of a player who has scored m=0 out
of n] at 1000 points below his opponents' average.  Nor did he choose to set it
at 50 points below his opponents' average rating.  He selected Y=2 (my
recollection) Standard Deviations, for his rating system.

In his rating system (the ELO system), Arpad Elo defined the rating numbers
mathematically such that any pair of ratings with a given difference, no matter
where on the rating scale the pair lies, theoretically implies the same
relative probabilities of scoring points.  Example: two players rated 2000 and
2100 have the same relative scoring probabilities as two players rated 2300
and 2400.
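[Editor's sketch of that translation invariance, using the standard logistic Elo expectancy; my illustration, not from the thread:]

```python
def expected_score(r_a, r_b):
    # Standard Elo expectancy; it depends only on the difference r_a - r_b,
    # not on where the pair sits on the rating scale.
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

print(expected_score(2100, 2000))  # ~0.64
print(expected_score(2400, 2300))  # identical: only the 100-point gap matters
```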

This mathematical underpinning works very well as designed, in general, in the
ELO system.  However, it is known by mathematicians (Jeff Sonas, etc) that it
doesn't work quite as well for players with widely disparate ratings (e.g. 1000
and 2000, versus 2000 and 3000).

This is not a flaw exclusive to the ELO system, but a general inability of the
science of statistics to measure the relative strengths of a pair of players of
vastly differing strengths.

The power (accuracy) of statistics is enhanced when the measuring is done near
the mean, near the center, of the population.  It is weakened when applied to
extremes at either end (players who win *all* games, players who lose *all*
games).

Standard deviation, of course, is a notion that reflects the impossibility of
measuring something exactly, i.e. natural deviation (randomness) in the
subject's performance combined with randomness in the measuring technique.

If we do *not* know the rating of a player who scores 0 points out of X games,
all such a priori estimators (formulas) are equally illogical.  Do you agree?

If the player cannot gain even 1/2 point from his initial 10 opponents, we do
*not* know just how bad (good) he really is.  Correct?  No amount of formula
construction or twiddling will cause that fact to change.

No adoption of a formula can be other than purely arbitrary in such cases.

>
>This is going to be the performance of player who got 0 points.
>
>More generally, I can define the performance of a player who got m points out
>of n as the rating at which the probability of getting more than m points (or
>exactly m points, if m=n) is equal to the probability of getting less than m
>points (or exactly m points, if m=0).

The problem is that the general definition does not logically apply to the
situation where the player has scored m=0 points out of n games.

It is said in statistics that assigning a measure (or forecasting the future)
works best when the thing being measured falls within a few standard deviations
of the mean of the population [or the mean of the samples being taken].

When the thing being measured is near the limits of the pool or near the
bounds of the samples, the assigned measure is less accurate and less
useful--it has a very small confidence level.

This is a natural limitation in statistics.  No amount of arbitrary formula
selection can guarantee better results.

Indeed, to assess if any particular formula is 'better' in any manner, you have
to 'a priori' assume the definition of 'better'.

That 'a priori' assumption (whatever it is) will of necessity contain as much
illogical thinking (arbitrary assumption) as the original +/- 400 points rule.
Why?  Because assigning a measure to something that has not been truly measured
is always arbitrary.

>
>
>Uri
_______________________________________
Hi Uri,

Yes, you can define it in such ways.  But those ways are totally arbitrary (see
my comments above).  That is my point.  They are no more logical than the one
chosen by Arpad Elo.

The player did *not* in fact earn that 1/2 point.  So your methods are also
illogical.

We can also take the lowest ELO earned by any player in the same pool who
scored only 1/2 point in his lifetime, even if it was only 1/2 point out of 100
games or whatever.  That ELO can be arbitrarily assigned as the rating of a
player who scored 0 points out of X games.

But assuming that such player is equally bad, or that all such players are
equally bad, is totally arbitrary (illogical).

The point is not that formulas cannot be created.  USCF has a formula for
players with 0 points, or all wins.  FIDE (I guess) has a similar formula for
players with 0 points, or all wins.

Regardless of the formula, it is totally arbitrary.

The reasoning behind each such choice is just as logical (or in my view,
illogical) as another such choice.

That is my point.
___________________________
Other possible formulas (equally arbitrary) for a player who has scored m=0 out
of n games:

1. Use the average rating of all existing players who have scored exactly 1/2
point total in tournament play.

2. Use the average initial rating of all players who initially scored exactly
1/2 point total.

3. Split the difference between 1) or 2), above, and zero.

I like mathematics.  I like statistics.  I'm not an expert in either.  But I do
like the power of mathematics and the power of statistics to measure, explain
and forecast in better ways than methods that do not rely on mathematics or
statistics.

I also like to understand and apply the limits of such tools--to avoid
overreliance on them when such confidence is unwarranted.

BOTTOM LINE--Your claim of 'illogical' (regarding the +/- 400 rule, as applied
to players who score 0 or 100%) goes too far.

Better to realize that rating estimator tools are limited when a player scores
0 percent or 100 percent of the points out of n games.

Do not assume that one arbitrary rating assumption is better or worse than
another arbitrary assumption.

Best regards,
--Steve





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.