Author: José de Jesús García Ruvalcaba
Date: 09:48:59 08/24/01
On August 24, 2001 at 12:36:18, Jeff Lischer wrote:

>>
>>Hi Uri,
>>please try the following experiment with Elostat.
>>1. Players A, B, and C play each other, with the following individual results:
>>A beats B 99.5 to 0.5
>>B beats C 99.5 to 0.5
>>A beats C 100 to 0
>>Which ratings do you get for A, B and C using Elostat?
>>
>>2. The same players, but with the following results:
>>A beats B 99.5 to 0.5
>>B beats C 99.5 to 0.5
>>Same question as for part 1.
>>
>>If the program behaves correctly, the rating of A for part 1 should not be lower
>>than the rating of A for part 2.
>>José.
>
>Excellent question! Although one can't perform your experiment with ELOStat
>directly (because it only reads in PGN files), I can run it with code I have
>written simulating ELOStat. If I assume an average rating of 2000:
>
>ELOStat Results:
>  Case 1. A = 2920, B = 2000, C = 1080
>  Case 2. A = 2694, B = 2000, C = 1306
>
>This is a problem I've known about with ELOStat. The problem comes from ELOStat
>using the "average opponent" approach, which isn't strictly accurate because of
>the non-linearity of the Elo formula. (Example: If I am rated 2000 and I play
>someone rated 2400, I should score about 9%. If I play two people, one rated 2000
>and the other 2800, I should score about 25%, even though their average is also 2400.)
>
>I have written a modified code that uses a "sum over opponents" approach (the
>idea was suggested to me by Walter Koroljow) to take care of this problem.
>Rather than using an average opponent rating, this method sums over all of a
>player's opponents and calculates the expected rating of the player. With that
>modified approach I get the following:
>
>Modified Method Results:
>  Case 1. A = 2920, B = 2000, C = 1080
>  Case 2. A = 2920, B = 2000, C = 1080
>

Thanks! I consider this correct.
I assume that Elostat is a fine tool which works well most of the time, but which fails in some odd cases.

>
>Incidentally, here are the WMCCC performance results I found using the modified
>method with an average Elo of 2300:
>
>1. Junior 2829
>2. Fritz 2618
>3. Tiger 2551
>4. Shredder 2545
>5. Crafty 2514
>6. Rebel 2512
>7. Goliath 2461
>8. Ferret 2429
>9. Gromit 2424
>10. Gandalf 2323
>11. ParSOS 2260
>12. Diep 2251
>13. IsiChess 2104
>14. Tao 2082
>15. Ruy Lopez 1994
>16. Pharaon 1980
>17. SpiderGirl 1915
>18. XiniX 1612
>
>Shredder is still behind Tiger (barely), but this time ahead of Crafty and
>Rebel.

Well, Uri has a small point then. I do not think six rating points mean a lot, but they are there. Still, this is very different from the huge rating advantage Elostat gave to Tiger over Shredder.
Thanks again,
José.
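
For anyone who wants to check the arithmetic in the quoted example, here is a minimal Python sketch. It assumes the usual logistic Elo expectancy (base 10, 400-point scale); whether ELOStat or Jeff's code uses exactly this curve is not stated in the thread. The function names are made up for illustration and are not taken from either program.

```python
def expected_score(own, opp):
    """Logistic Elo expectancy for a player rated `own` against `opp` (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** ((opp - own) / 400.0))


def mean_expected_score(own, opponents):
    """Average expectancy, evaluated opponent by opponent ("sum over opponents")."""
    return sum(expected_score(own, r) for r in opponents) / len(opponents)


def performance_rating(opponents, score_fraction, lo=0.0, hi=4000.0, tol=1e-6):
    """Bisect for the rating whose summed expectancy equals the achieved score.

    Only meaningful for 0 < score_fraction < 1; the search bounds are arbitrary.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_expected_score(mid, opponents) < score_fraction:
            lo = mid  # expectancy too low, so the rating must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Non-linearity example: same average opponent (2400), very different expectancy.
one_opponent = expected_score(2000, 2400)
two_opponents = mean_expected_score(2000, [2000, 2800])
print(f"{one_opponent:.3f} {two_opponents:.3f}")

# Case 2, player A alone: 99.5% against a single opponent fixed at 2000.
print(round(performance_rating([2000], 0.995)))
```

The first line printed should correspond to the roughly 9% and 25% figures in the example, and the second to a rating about 920 points above B. A full rating calculation would still have to iterate over all players at once and re-centre on the chosen average; that part is left out here.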
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.