Computer Chess Club Archives


Subject: Re: Solution to experiment!

Author: José de Jesús García Ruvalcaba

Date: 09:48:59 08/24/01


On August 24, 2001 at 12:36:18, Jeff Lischer wrote:

>>
>>Hi Uri,
>>please try the following experiment with elostat.
>>1. Players A, B, and C play each other, with the following individual results:
>>A beats B 99.5 to 0.5
>>B beats C 99.5 to 0.5
>>A beats C 100 to 0
>>Which ratings do you get for A, B and C using Elostat?
>>
>>2. The same players, but with the following results:
>>A beats B 99.5 to 0.5
>>B beats C 99.5 to 0.5
>>Same question as for part 1.
>>
>>If the program behaves correctly, the rating of A for part 1 should not be lower
>>than the rating of A for part 2.
>>José.
>
>Excellent question! Although one can't perform your experiment with ELOStat
>directly (because it only reads in PGN files), I can run it with code I have
>written simulating ELOStat. If I assume an average rating of 2000:
>
>ELOStat Results:
>  Case 1. A = 2694, B = 2000, C = 1306
>  Case 2. A = 2920, B = 2000, C = 1080
>
>This is a problem I've known about with ELOStat. The problem comes from ELOStat
>using the "average opponent" approach, which isn't strictly accurate because of
>the non-linearity of the Elo formula. (Example: If I am rated 2000 and I play
>someone rated 2400, I should score about 9%. If I play two people, one rated
>2000 and the other 2800, I should score about 25%.)
>
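
The figures in the example above can be checked with the usual logistic Elo
expectation, which I am assuming is the formula meant here. The function name
expected_score is only an illustrative choice, not something taken from
ELOStat. A minimal sketch:

```python
def expected_score(own, opp):
    """Logistic Elo expectation for a player rated `own` against `opp` (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** ((opp - own) / 400.0))

# One opponent sitting exactly at the average rating of 2400.
single = expected_score(2000, 2400)

# Two opponents, 2000 and 2800, whose average rating is also 2400.
split = (expected_score(2000, 2000) + expected_score(2000, 2800)) / 2

print(f"vs 2400:          {single:.1%}")
print(f"vs 2000 and 2800: {split:.1%}")
```

Both opponent sets average 2400, yet the expected scores come out near 9% and
25% respectively, which is the non-linearity at work.
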
>I have written a modified code that uses a "sum over opponents" approach (the
>idea was suggested to me by Walter Koroljow) to take care of this problem.
>Rather than using an average opponent rating, this method sums over all a
>player's opponents and calculates the expected rating of the player. With that
>modified approach I get the following:
>
>Modified Method Results:
>  Case 1. A = 2920, B = 2000, C = 1080
>  Case 2. A = 2920, B = 2000, C = 1080
>

Thanks! I consider this correct. I assume that Elostat is a fine tool which
works well most of the time, but which fails in some odd cases.
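
For anyone who wants to reproduce the quoted figures, here is a rough sketch of
how a "sum over opponents" calculation might look. It is not Jeff's code, which
was not posted. I am assuming the logistic Elo formula, a fixed average rating,
and a simple scheme in which each player's rating is adjusted by bisection
until the expected points summed over all of that player's games equal the
points actually scored. All names (expected_score, fit_ratings, case1, case2)
are made up for illustration.

```python
def expected_score(own, opp):
    """Logistic Elo expectation (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** ((opp - own) / 400.0))

def fit_ratings(results, mean=2000.0, sweeps=100):
    """results: list of (player, opponent, player_points, opponent_points).

    Assumes no player has a 100% or 0% total score, since such a player
    has no finite rating under this model.
    """
    players = sorted({name for r in results for name in r[:2]})
    pairings = {p: [] for p in players}   # p -> [(opponent, games played)]
    score = {p: 0.0 for p in players}     # p -> total points scored
    for a, b, pts_a, pts_b in results:
        games = pts_a + pts_b
        pairings[a].append((b, games))
        pairings[b].append((a, games))
        score[a] += pts_a
        score[b] += pts_b

    rating = {p: mean for p in players}
    for _ in range(sweeps):
        for p in players:
            # Bisect on p's rating, holding the other ratings fixed, until the
            # expected points summed over every opponent match the actual score.
            lo, hi = mean - 8000.0, mean + 8000.0
            for _step in range(60):
                mid = (lo + hi) / 2
                expected = sum(n * expected_score(mid, rating[o])
                               for o, n in pairings[p])
                if expected < score[p]:
                    lo = mid
                else:
                    hi = mid
            rating[p] = (lo + hi) / 2
        # Only rating differences matter, so pin the average to `mean`.
        shift = mean - sum(rating.values()) / len(players)
        for p in players:
            rating[p] += shift
    return rating

case1 = [("A", "B", 99.5, 0.5), ("B", "C", 99.5, 0.5), ("A", "C", 100.0, 0.0)]
case2 = [("A", "B", 99.5, 0.5), ("B", "C", 99.5, 0.5)]

for label, data in (("Case 1", case1), ("Case 2", case2)):
    fitted = fit_ratings(data)
    print(label, {p: round(r) for p, r in fitted.items()})
```

Under these assumptions both data sets round to A = 2920, B = 2000, C = 1080,
in line with the modified-method results quoted above. Before rounding, A comes
out slightly higher in case 1 than in case 2, as the extra 100-0 result should
imply.
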

>
>Incidentally, here are the WMCCC performance results I found with the modified
>method, assuming an average Elo of 2300:
>
>1.  Junior     2829
>2.  Fritz      2618
>3.  Tiger      2551
>4.  Shredder   2545
>5.  Crafty     2514
>6.  Rebel      2512
>7.  Goliath    2461
>8.  Ferret     2429
>9.  Gromit     2424
>10. Gandalf    2323
>11. ParSOS     2260
>12. Diep       2251
>13. IsiChess   2104
>14. Tao        2082
>15. Ruy Lopez  1994
>16. Pharaon    1980
>17. SpiderGirl 1915
>18. XiniX      1612
>
>Shredder is still behind Tiger (barely), but this time ahead of Crafty and
>Rebel.

Well, Uri has a small point then. I do not think six rating points mean a lot,
but they are there. Still, this is very different to the huge rating advantage
Elostat gave to Tiger over Shredder.
Thanks again,
José.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.