Computer Chess Club Archives


Subject: Re: Solution to experiment!

Author: Miguel A. Ballicora

Date: 11:08:39 08/24/01

On August 24, 2001 at 12:36:18, Jeff Lischer wrote:

>On August 24, 2001 at 10:34:01, José de Jesús García Ruvalcaba wrote:
>
>>On August 24, 2001 at 10:16:48, Uri Blass wrote:
>>
>>>On August 24, 2001 at 10:06:32, Miguel A. Ballicora wrote:
>>>
>>>>On August 24, 2001 at 07:51:16, Uri Blass wrote:
>>>>
>>>>>On August 24, 2001 at 07:29:21, Günther Simon wrote:
>>>>>
>>>>>>On August 24, 2001 at 07:15:30, Uri Blass wrote:
>>>>>>
>>>>>>>On August 24, 2001 at 07:06:51, Uri Blass wrote:
>>>>>>>
>>>>>>>>Here are the results from the ELOStat program.
>>>>>>>>
>>>>>>>>You can see that Shredder is only in 3rd place among the micros based on
>>>>>>>>performance. Shredder is the World Micro Champion by definition, but Tiger and
>>>>>>>>Rebel had a better performance.
>>>>>>>>
>>>>>>>>  Program                          Elo     +   - Games    Score    Av.Op. Draws
>>>>>>>>1 Deep Junior 7                  : 2745  228 281     9    88.9 %   2384   22.2 %
>>>>>>>>2 Quest (DeepFritz)              : 2550  266 169     9    66.7 %   2430   44.4 %
>>>>>>>>3 Chess Tiger 14.6 Gambit Tiger  : 2499  291 229     9    55.6 %   2461   22.2 %
>>>>>>>>4 Crafty 18.10X                  : 2467  291 165     9    55.6 %   2428   44.4 %
>>>>>>>>5 Rebel                          : 2466  291 229     9    55.6 %   2428   22.2 %
>>>>>>>>6 Shredder                       : 2466  266 249     9    66.7 %   2346   22.2 %
>>>>>>>>7 Goliath                        : 2421  291 165     9    55.6 %   2382   44.4 %
>>>>>>>>8 Gromit 3.9.5                   : 2364  278 201     9    61.1 %   2285   33.3 %
>>>>>>>>9 Ferret                         : 2359  291 229     9    55.6 %   2320   22.2 %
>>>>>>>>10 Gandalf 5.0                   : 2310  291 229     9    55.6 %   2271   22.2 %
>>>>>>>>11 ParSOS                        : 2256  291 229     9    55.6 %   2217   22.2 %
>>>>>>>>12 Diep                          : 2227  165 291     9    44.4 %   2265   44.4 %
>>>>>>>>13 IsiChess X                    : 2166  201 278     9    38.9 %   2245   33.3 %
>>>>>>>>14 Tao                           : 2165  229 291     9    44.4 %   2203   22.2 %
>>>>>>>>15 Ruy Lopez                     : 2118  366 266     9    33.3 %   2238    0.0 %
>>>>>>>>16 Pharaon                       : 2082  169 266     9    33.3 %   2202   44.4 %
>>>>>>>>17 SpiderGirl                    : 2014  213 255     9    27.8 %   2180   33.3 %
>>>>>>>>18 XiNiX                         : 1724  400 108     9     5.6 %   2216   11.1 %
>>>>>>>>
>>>>>>>>Congratulations also to the Deep Junior team for winning the event convincingly:
>>>>>>>>the difference from second place is almost 200 Elo, and the hardware explains
>>>>>>>>less than 70 Elo of that difference.
>>>>>>>>
>>>>>>>>Uri
>>>>>>>
>>>>>>>I would add that I think it may be a better idea to use ELOStat to decide
>>>>>>>the world champion in the future.
>>>>>>>
>>>>>>>I know that a lot of people are going to disagree, but it is my opinion.
>>>>>>>I prefer a complicated method that is fairer to a simple one.
>>>>>>>
>>>>>>>Uri
>>>>>>
>>>>>>
>>>>>>Sorry Uri, but this is really nonsense.
>>>>>>You can't use ELOStat on a 9-round Swiss tournament, as its author explains.
>>>>>>ELOStat is designed to calculate ratings for a pool of programs with unknown
>>>>>>ratings from a very large number of games.
>>>>>>Therefore, if you take a closer look at your table, you will see that the error
>>>>>>margin is at least 435(!) points (Pharaon) and at most 632(!!) (Ruy Lopez).
>>>>>>And would you really believe Parallel SOS to be at 2256? :))
>>>>>
>>>>>The question is not which program is better.
>>>>>Competitions of 9 rounds are not supposed to answer that question.
>>>>>
>>>>>The question is which program achieved the better result.
>>>>>ELOStat answers this question better than the ranking does.
>>>>
>>>>You forget tournament strategy. Many times you can adjust the contempt
>>>>because you know that a draw is extremely convenient or will give you the
>>>>title right away. Not to mention the choice of more or less aggressive opening
>>>>books for a particular round. Sometimes a draw is as bad as a loss, and you take risks.
>>>>That throws away any significance of a performance Elo in a 9-round tournament.
>>>>This also applies to any human tournament.
>>>>
>>>>You can also have the weird situation where you score 8.5/9 and the one with 8/9
>>>>has a better Elo performance. They drew each other, but a couple of the opponents
>>>>of the winner started to crash in many games afterwards because of last-minute
>>>>changes in the code, etc. That was totally out of the winner's control.
>>>
>>>
>>>I think that this is not logical.
>>>If you get 8.5/9, your result is not worse than that of a player who got 8/9 and
>>>drew against you.
>>>
>>>We are looking for a stable rating.
>>>Suppose that you got 8.5/9.
>>>Suppose that the rating of the player you drew with is higher than your rating.
>>>I can prove that your rating is then not stable and will go up after the
>>>tournament.
>>>
>>>You do not lose rating by winning 8 games, whatever the ratings of those
>>>opponents are.
>>>You gain rating by drawing one game against a player with a higher Elo rating, so
>>>the net result is that you gain rating.
>>>
>>>
>>>If ELOStat allows a situation where 8/9 rates higher than 8.5/9, including a draw
>>>between the 2 best players, then something is wrong with the ELOStat program.
>>>
>>>Uri
>>
>>Hi Uri,
>>please try the following experiment with ELOStat.
>>1. Players A, B, and C play each other, with the following individual results:
>>A beats B 99.5 to 0.5
>>B beats C 99.5 to 0.5
>>A beats C 100 to 0
>>Which ratings do you get for A, B and C using ELOStat?
>>
>>2. The same players, but with the following results:
>>A beats B 99.5 to 0.5
>>B beats C 99.5 to 0.5
>>Same question as for part 1.
>>
>>If the program behaves correctly, the rating of A in part 1 should not be lower
>>than the rating of A in part 2.
>>José.
>
>Excellent question! Although one can't perform your experiment with ELOStat
>directly (because it only reads in PGN files), I can run it with code I have
>written simulating ELOStat. If I assume an average rating of 2000:
>
>ELOStat Results:
>  Case 1. A = 2694, B = 2000, C = 1306
>  Case 2. A = 2920, B = 2000, C = 1080
>
>This is a problem I've known about with ELOStat. The problem comes from ELOStat
>using the "average opponent" approach, which isn't strictly accurate because of
>the non-linearity of the Elo formula. (Example: if I am rated 2000 and I play
>someone rated 2400, I should score about 9%. If I play 2 people, one rated 2000
>and the other 2800, I should score about 25%.)
>
>I have written a modified code that uses a "sum over opponents" approach (the
>idea was suggested to me by Walter Koroljow) to take care of this problem.
>Rather than using an average opponent rating, this method sums over all of a
>player's opponents and calculates the expected rating of the player. With that
>modified approach I get the following:
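A quick check of the arithmetic in Jeff's non-linearity example, assuming the common logistic Elo expectation E = 1 / (1 + 10^((R_opp - R) / 400)); the post does not say which curve it uses, and `expected_score` is a hypothetical helper name.

```python
def expected_score(rating, opponent):
    """Logistic Elo expectation (400-point scale) for `rating` vs `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

# One opponent rated 2400, i.e. the "average opponent" of the pair below.
single = expected_score(2000, 2400)
# Two opponents, 2000 and 2800: average the per-game expectations instead.
pair = (expected_score(2000, 2000) + expected_score(2000, 2800)) / 2

print(f"vs 2400: {single:.1%}, vs 2000 and 2800: {pair:.1%}")
# prints: vs 2400: 9.1%, vs 2000 and 2800: 25.5%
```

Both pairings have the same average opponent (2400), yet the expected scores differ by a factor of almost three, because the curve is flat near the extremes and steep near 50%.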

Interesting. Several years ago I wrote a program to maintain volleyball rankings
(NCAA, the US university championship) using what I think is the approach you
describe here. This year I came across ELOStat and assumed it was already doing
something similar.
My program does not take PGN as input, though; that would be difficult for volleyball :-)
I think I will run this case anyway, but I will have to type in the results :-(
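For a single player's tournament performance, the two approaches Jeff contrasts can be sketched side by side. This is a minimal illustration assuming the logistic Elo curve; `perf_average` and `perf_sum` are hypothetical names, not ELOStat's code, Jeff's code, or my volleyball program.

```python
import math

def expected_score(rating, opponent):
    """Logistic Elo expectation on the usual 400-point scale (assumed curve)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def perf_average(opponents, score):
    """Average-opponent performance: mean opponent rating plus the Elo gap
    implied by the overall score fraction. Requires 0 < score < len(opponents)."""
    n = len(opponents)
    f = score / n
    return sum(opponents) / n + 400.0 * math.log10(f / (1.0 - f))

def perf_sum(opponents, score, lo=-5000.0, hi=10000.0):
    """Sum-over-opponents performance: the rating R at which the summed
    per-game expectations equal the actual score. The sum grows with R,
    so plain bisection converges to the unique root."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, o) for o in opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Jeff's example pairing: 0.5/2 against opponents rated 2000 and 2800.
opps = [2000, 2800]
print(round(perf_average(opps, 0.5)), round(perf_sum(opps, 0.5)))  # prints: 2209 1993
```

The average-opponent formula credits the player with about 2209, while summing per-game expectations puts the performance just below 2000: a draw with the 2000 opponent plus an expected near-certain loss to the 2800 opponent is roughly what a 2000 player scores.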

Regards,
Miguel



>
>Modified Method Results:
>  Case 1. A = 2920, B = 2000, C = 1080
>  Case 2. A = 2920, B = 2000, C = 1080
>
>
>Incidentally, here are the WMCCC performance results I found with the modified
>method, using an average Elo of 2300:
>
>1.  Junior     2829
>2.  Fritz      2618
>3.  Tiger      2551
>4.  Shredder   2545
>5.  Crafty     2514
>6.  Rebel      2512
>7.  Goliath    2461
>8.  Ferret     2429
>9.  Gromit     2424
>10. Gandalf    2323
>11. ParSOS     2260
>12. Diep       2251
>13. IsiChess   2104
>14. Tao        2082
>15. Ruy Lopez  1994
>16. Pharaon    1980
>17. SpiderGirl 1915
>18. XiniX      1612
>
>Shredder is still behind Tiger (barely), but this time ahead of Crafty and
>Rebel.
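José's experiment can be replayed with a small pool rater. This is a sketch under stated assumptions: the logistic Elo curve, each match read as 100 games, ratings anchored to a mean of 2000 as in Jeff's runs, and a naive "average" mode that only stands in for the average-opponent idea; it is not a reimplementation of ELOStat or of Jeff's code, and `rate_pool` is a hypothetical name.

```python
import math

def expected(r, opp):
    """Logistic Elo expectation (400-point scale) of r against opp."""
    return 1.0 / (1.0 + 10.0 ** ((opp - r) / 400.0))

def rate_pool(results, method, mean=2000.0, sweeps=2000, tol=1e-7):
    """Hypothetical pool rater. `results` holds (a, b, points_for_a, games).

    method="sum":     R_i solves sum_j n_ij * E(R_i, R_j) = S_i.
    method="average": R_i = game-weighted mean opponent rating plus the Elo
                      gap implied by the overall score fraction.
    Players are updated one at a time with the latest ratings (Gauss-Seidel),
    then all ratings are shifted so their mean equals `mean`.
    Assumes nobody scores 0% or 100% overall.
    """
    players = sorted({p for a, b, _, _ in results for p in (a, b)})
    table = {p: [] for p in players}  # player -> [(opponent, games, points)]
    for a, b, pts, n in results:
        table[a].append((b, n, pts))
        table[b].append((a, n, n - pts))
    r = {p: mean for p in players}
    for _sweep in range(sweeps):
        old = dict(r)
        for p in players:
            games = sum(n for _, n, _ in table[p])
            score = sum(s for _, _, s in table[p])
            if method == "average":
                # In general this naive update need not have an exact fixed
                # point; it does for the symmetric examples below.
                avg_opp = sum(n * r[o] for o, n, _ in table[p]) / games
                f = score / games
                r[p] = avg_opp + 400.0 * math.log10(f / (1.0 - f))
            else:
                # Summed expectation increases with R, so bisection finds the root.
                lo, hi = -10000.0, 10000.0
                for _ in range(100):
                    mid = (lo + hi) / 2.0
                    if sum(n * expected(mid, r[o]) for o, n, _ in table[p]) < score:
                        lo = mid
                    else:
                        hi = mid
                r[p] = (lo + hi) / 2.0
        shift = mean - sum(r.values()) / len(r)
        r = {p: v + shift for p, v in r.items()}
        if max(abs(r[p] - old[p]) for p in players) < tol:
            break
    return r

# José's two cases, each match assumed to be 100 games (points are for the first name).
case1 = [("A", "B", 99.5, 100), ("B", "C", 99.5, 100), ("A", "C", 100.0, 100)]
case2 = [("A", "B", 99.5, 100), ("B", "C", 99.5, 100)]
for name, games in (("Case 1", case1), ("Case 2", case2)):
    for method in ("average", "sum"):
        ratings = rate_pool(games, method)
        print(name, method, {p: round(v) for p, v in ratings.items()})
```

In this sketch the sum-over-opponents ratings come out at roughly A = 2920, B = 2000, C = 1080 in both cases, while the naive average-opponent mode gives A roughly 2694 in case 1 and 2920 in case 2, so adding A's 100-0 win over C lowers A's rating. Updating one player at a time is a deliberate choice: in the "sum" mode each per-player solve is an exact coordinate step on a concave logistic log-likelihood, so the sweeps settle reliably.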



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.