Computer Chess Club Archives


Subject: Re: ELO isn't a normal bell curve, without some transformation

Author: Dann Corbit

Date: 21:05:34 06/04/02



On June 04, 2002 at 23:30:56, Stephen A. Boak wrote:

>Hi Dann,
>
>1. Since Elo's system defines (by design choice!) each specific rating
>difference in terms of a specific expected scoring percentage, regardless of
>where the two ratings fall on the scale, I suspect (but am not sure, not having
>worked out the math yet on paper) that the simple plotting of ratings in a
>histogram would not be a normal bell-shaped curve.
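
For reference, the rating-difference property described in point 1 can be sketched in a few lines of Python. This is only an illustration: it assumes the common logistic approximation to the expected-score curve (Elo's original tables were based on a normal distribution), and the function name is made up for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo approximation."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# The same 200-point gap gives the same expectation anywhere on the scale.
print(expected_score(2600, 2400))
print(expected_score(1600, 1400))
```

Only the difference between the two ratings enters the formula, which is why the position on the scale does not matter.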

The original question was whether or not computer strength is normally
distributed.  What you describe is a different random variable, and it is also
normally distributed.  We can just as well plot win percentage multiplied by the
opponent's strength:

SELECT int((win_percentage * opponent_strength)/5000),
count((win_percentage * opponent_strength)/5000)
FROM SSDF
GROUP BY int((win_percentage * opponent_strength)/5000);

Expr1000	Expr1001
8	3
9	3
10	3
11	6
12	3
13	12
14	6
15	9
16	10
17	15
18	7
19	12
20	6
21	18
22	18
23	20
24	14
25	11
26	13
27	13
28	6
29	10
30	2
31	4
32	1
33	7
34	1

Or (squished a bit more):

SELECT int(([win_percentage]*[opponent_strength])/15000),
count(([win_percentage]*[opponent_strength])/15000)
FROM SSDF
GROUP BY int(([win_percentage]*[opponent_strength])/15000);

Expr1000	Expr1001
2	3
3	12
4	21
5	34
6	25
7	56
8	38
9	29
10	7
11	8

>Wouldn't some transformation be required to convert such ratings into
>'normalized' figures which *theoretically* might look more like a bell shaped
>curve?

There is a surprising range of curve shapes that still fit the Gaussian model
pretty well.
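
One rough way to look at the fit is to compare the observed counts with the counts a normal curve of the same mean and standard deviation would put in each bucket. The Python sketch below is only an illustration: it uses the counts from the divisor-15000 table, treats bucket b as the interval [b, b + 1), and the helper names are invented for the example.

```python
import math

# Counts copied from the divisor-15000 table: bucket -> programs.
observed = {2: 3, 3: 12, 4: 21, 5: 34, 6: 25, 7: 56, 8: 38, 9: 29, 10: 7, 11: 8}

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gaussian_expected(hist):
    """Expected count per bucket for a normal fitted by mean and std dev."""
    n = sum(hist.values())
    # Treat each bucket b as the interval [b, b + 1) with midpoint b + 0.5.
    mu = sum((b + 0.5) * c for b, c in hist.items()) / n
    var = sum(c * (b + 0.5 - mu) ** 2 for b, c in hist.items()) / n
    sigma = math.sqrt(var)
    return {b: n * (normal_cdf(b + 1, mu, sigma) - normal_cdf(b, mu, sigma))
            for b in hist}

expected = gaussian_expected(observed)
for b in observed:
    print(f"{b:2d} observed {observed[b]:3d} expected {expected[b]:6.1f}")
```

Large gaps between the two columns would argue against the Gaussian model; a formal test such as chi-square would be the next step.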

>2. Over time, as new & improved program versions & ratings rise, due to advances
>in chess programming algorithms & techniques (and hardware improvements,
>perhaps), wouldn't the overall plotting of ratings on a histogram (roughly from
>older, weaker programs to newer, stronger programs) more closely follow the
>growth curve for average rating of each new crop of released program/hardware,
>rather than the normal bell curve?

Since they are different hardware setups or different program versions, they are
treated as different organisms.  The method you suggest should only be used to
model a single program, and then only changing one variable at a time (unless
you intend to generate a surface).

>3. Perhaps any program crop released within a relatively short span of time (say
>a year or so) would have ratings plottable (with transformation, as noted above)
>that closely approximate the normal (bell) curve.

I think the leptokurtic shape is probably a function of reality.  In other
words, if a program is dominatingly better, nobody would buy the others.  If a
program is dominatingly weak, then nobody will buy it.  So they are forced to be
fairly close in ability.  There is a broad mass with nearly equal ability and a
few outliers with exceptional strength or weakness.

In other words, if someone wrote a 3000 ELO program, it would be the only one
that people bought and that got tested, and we would see a spike.

If someone writes a 1000 ELO program, nobody is going to buy it.  So programs
successful enough to be sold and tested will pretty much lie in a band.
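
Whether the shape really is peaked with heavy tails can be checked with the excess kurtosis of the binned data: a positive value points that way, a negative value points to a flatter curve than the normal. A minimal Python sketch, assuming the counts from the divisor-5000 table and using the bucket numbers as the values (so it measures the binned product, not the ratings themselves):

```python
def excess_kurtosis(hist):
    """Population excess kurtosis of a histogram given as {value: count}."""
    n = sum(hist.values())
    mean = sum(v * c for v, c in hist.items()) / n
    m2 = sum(c * (v - mean) ** 2 for v, c in hist.items()) / n  # variance
    m4 = sum(c * (v - mean) ** 4 for v, c in hist.items()) / n  # 4th moment
    return m4 / m2 ** 2 - 3.0  # a normal distribution gives 0

# Counts copied from the divisor-5000 table: bucket -> programs.
fine = {8: 3, 9: 3, 10: 3, 11: 6, 12: 3, 13: 12, 14: 6, 15: 9, 16: 10,
        17: 15, 18: 7, 19: 12, 20: 6, 21: 18, 22: 18, 23: 20, 24: 14,
        25: 11, 26: 13, 27: 13, 28: 6, 29: 10, 30: 2, 31: 4, 32: 1,
        33: 7, 34: 1}

print(excess_kurtosis(fine))
```

With only a couple of hundred entries the estimate is noisy, so it is a hint rather than a verdict.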




Last modified: Thu, 15 Apr 21 08:11:13 -0700
