Computer Chess Club Archives


Subject: Re: Congratulation for chesstiger(better performance than shredder in wmccc)

Author: Uri Blass

Date: 07:50:36 08/24/01


On August 24, 2001 at 10:34:01, José de Jesús García Ruvalcaba wrote:

>On August 24, 2001 at 10:16:48, Uri Blass wrote:
>
>>On August 24, 2001 at 10:06:32, Miguel A. Ballicora wrote:
>>
>>>On August 24, 2001 at 07:51:16, Uri Blass wrote:
>>>
>>>>On August 24, 2001 at 07:29:21, Günther Simon wrote:
>>>>
>>>>>On August 24, 2001 at 07:15:30, Uri Blass wrote:
>>>>>
>>>>>>On August 24, 2001 at 07:06:51, Uri Blass wrote:
>>>>>>
>>>>>>>Here are the results by
>>>>>>>elostat program
>>>>>>>
>>>>>>>You can see that Shredder is only in 3rd place among the micros based on
>>>>>>>performance. Shredder is the World Micro Champion by definition, but Tiger and
>>>>>>>Rebel had a better performance.
>>>>>>>
>>>>>>>
>>>>>>>1 Deep Junior 7                  : 2745  228 281     9    88.9 %   2384   22.2 %
>>>>>>>2 Quest (DeepFritz)              : 2550  266 169     9    66.7 %   2430   44.4 %
>>>>>>>3 Chess Tiger 14.6 Gambit Tiger  : 2499  291 229     9    55.6 %   2461   22.2 %
>>>>>>>4 Crafty 18.10X                  : 2467  291 165     9    55.6 %   2428   44.4 %
>>>>>>>5 Rebel                          : 2466  291 229     9    55.6 %   2428   22.2 %
>>>>>>>6 Shredder                       : 2466  266 249     9    66.7 %   2346   22.2 %
>>>>>>>7 Goliath                        : 2421  291 165     9    55.6 %   2382   44.4 %
>>>>>>>8 Gromit 3.9.5                   : 2364  278 201     9    61.1 %   2285   33.3 %
>>>>>>>9 Ferret                         : 2359  291 229     9    55.6 %   2320   22.2 %
>>>>>>>10 Gandalf 5.0                   : 2310  291 229     9    55.6 %   2271   22.2 %
>>>>>>>11 ParSOS                        : 2256  291 229     9    55.6 %   2217   22.2 %
>>>>>>>12 Diep                          : 2227  165 291     9    44.4 %   2265   44.4 %
>>>>>>>13 IsiChess X                    : 2166  201 278     9    38.9 %   2245   33.3 %
>>>>>>>14 Tao                           : 2165  229 291     9    44.4 %   2203   22.2 %
>>>>>>>15 Ruy Lopez                     : 2118  366 266     9    33.3 %   2238    0.0 %
>>>>>>>16 Pharaon                       : 2082  169 266     9    33.3 %   2202   44.4 %
>>>>>>>17 SpiderGirl                    : 2014  213 255     9    27.8 %   2180   33.3 %
>>>>>>>18 XiNiX                         : 1724  400 108     9     5.6 %   2216   11.1 %
>>>>>>>
>>>>>>>Congratulations also to the Deep Junior team for winning the event convincingly:
>>>>>>>the difference from second place is almost 200 Elo, and the hardware
>>>>>>>explains less than 70 Elo of that difference.
>>>>>>>
>>>>>>>Uri
>>>>>>
>>>>>>I can add that I think it may be a better idea to use elostat to decide
>>>>>>the world champion in the future.
>>>>>>
>>>>>>I know that a lot of people are going to disagree, but it is my opinion.
>>>>>>I prefer a complicated method that does more justice over a simple method.
>>>>>>
>>>>>>Uri
>>>>>
>>>>>
>>>>>Sorry Uri - but this is really nonsense.
>>>>>You can't use ELO-Stat on a Swiss tournament with 9 rounds, as
>>>>>it is described by the author. ELO-Stat is designed to calculate
>>>>>ratings from a pool of programs with unknown ratings and a very large
>>>>>number of games.
>>>>>Therefore, if you take a closer look at your table, you will see that
>>>>>the error margin is at least 435! points (Pharaon) and at most 632!! (Ruy Lopez).
>>>>>And would you really believe Parallel SOS to be at 2256? :))
>>>>
>>>>The question is not which program is better.
>>>>Competitions of 9 rounds are not supposed to answer this question.
>>>>
>>>>The question is which program achieved the better result.
>>>>Elostat answers this question better than the ranking does.
>>>
>>>You forget the tournament strategy. Many times, you can adjust the contempt
>>>because you know that a draw is extremely convenient or will give you the
>>>title right away. Not to mention the selection of more or less aggressive opening
>>>books for a special round. Sometimes a draw is the same as a loss, so you take risks.
>>>That throws away any significance of a performance Elo in a 9 round tournament.
>>>This also applies to any human tournament.
>>>
>>>You can also have the weird situation where you got 8.5/9 and the one with 8/9
>>>has a better Elo performance. They drew each other, but a couple of opponents
>>>that played the winner started to crash in many games afterwards because of
>>>last-minute changes in the code etc. That was totally out of the winner's control.
>>
>>
>>I think that it is not logical.
>>If you get 8.5/9, your results are not worse than those of a player who got 8/9
>>and drew against you.
>>
>>We look for a stable rating.
>>Suppose that you got 8.5/9.
>>Suppose that the rating of the player you drew with is better than your rating.
>>I can prove that your rating is not stable and is going to get bigger after the
>>tournament.
>>
>>You do not lose rating from winning 8 games, and the rating of the opponents is
>>not important.
>>You win rating from drawing one game against a player with a better Elo rating, so
>>the total result is that you earn rating.
>>
>>
>>If elostat allows a situation where 8/9 is better than 8.5/9 including a draw
>>between the 2 best players, then something is wrong with the elostat program.
>>
>>Uri
>
>Hi Uri,
>please try the following experiment with elostat.
>1. Players A, B, and C play each other, with the following individual results:
>A beats B 99.5 to 0.5
>B beats C 99.5 to 0.5
>A beats C 100 to 0
>Which ratings do you get for A, B and C using Elostat?
>
>2. The same players, but with the following results:
>A beats B 99.5 to 0.5
>B beats C 99.5 to 0.5
>Same question as for part 1.
>
>If the program behaves correctly, the rating of A for part 1 should not be lower
>than the rating of A for part 2.
>José.

Unfortunately, the program needs PGN input, and it calculates the results itself
unless the competition is between only 2 players.

Here is some information from the readme file of this program

Following this theory, the Elo rating corresponding to a relative performance of
100 % or 0 % is indefinite. Due to mathematical reasons (e.g. to guarantee the
feasibility of the iteration procedure) ELOStat assigns to those programs a
finite Elo value which is exactly 600 points smaller (0 % perf.) or greater (100
% perf.) than the Av.Op. Elo. Or in other words: ELOStat does not support Elo
differences greater than 600 points (therefore the 95% error margins
can be at most 1200 points). For nearly all practical purposes, this
restriction does not play an important role.
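
A minimal sketch of the rule in the readme paragraph above, written in Python. It assumes the usual logistic Elo curve (a difference of 400 * log10(p / (1 - p)) for a score fraction p), which the readme excerpt does not state explicitly. The function name performance_rating is hypothetical and not part of ELOStat. Under that assumption, 8/9 against an average opponent of 2384 gives about 2745, which agrees with the Deep Junior row in the table quoted above.

```python
import math

MAX_DIFF = 600.0  # cap on the Elo difference, as quoted from the readme


def performance_rating(score_fraction, avg_opponent_elo, max_diff=MAX_DIFF):
    """Performance rating from a score fraction and the average opponent Elo.

    Assumption: the logistic Elo curve. 0 % and 100 % results have no finite
    value on that curve, so they are pinned to avg_opponent_elo -/+ max_diff.
    """
    if score_fraction <= 0.0:
        return avg_opponent_elo - max_diff
    if score_fraction >= 1.0:
        return avg_opponent_elo + max_diff
    diff = 400.0 * math.log10(score_fraction / (1.0 - score_fraction))
    # Differences beyond the cap are not supported, so clamp them too.
    return avg_opponent_elo + max(-max_diff, min(max_diff, diff))


if __name__ == "__main__":
    print(round(performance_rating(8 / 9, 2384)))  # Deep Junior row
    print(round(performance_rating(1.0, 2000)))    # 100 % result, capped
```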

In very rare cases ELOStat produces an error message stating that the iteration
procedure failed and that no convergence of the Elo mean value could have been
reached within the maximum number of iterations specified by the program. This
problem only appears when many programs in the database are characterized by 0 %
or 100 % results. In these cases the iteration procedure is slowed down
significantly so that the Elo calculation takes a much longer time as usual.
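
To make the idea of an iteration procedure concrete, here is a toy sketch in Python. It is not ELOStat's actual algorithm, which the readme excerpt does not spell out. It assumes a simple scheme: each player's rating is repeatedly set to the games-weighted average rating of the opponents plus the capped logistic difference for the player's overall score, and all ratings are then shifted so their mean stays fixed. The names elo_diff and iterate_ratings, the mean of 0 and the tolerance are all hypothetical choices. The input is aggregate pairwise scores, in the form José gave for his experiment, instead of PGN.

```python
import math

MAX_DIFF = 600.0  # cap on the Elo difference, as quoted from the readme


def elo_diff(score_fraction):
    """Capped logistic Elo difference for a score fraction (assumed curve)."""
    if score_fraction <= 0.0:
        return -MAX_DIFF
    if score_fraction >= 1.0:
        return MAX_DIFF
    diff = 400.0 * math.log10(score_fraction / (1.0 - score_fraction))
    return max(-MAX_DIFF, min(MAX_DIFF, diff))


def iterate_ratings(results, mean_elo=0.0, tol=1e-6, max_iter=1000):
    """results: list of (player, opponent, player_points, opponent_points)."""
    points, games, opponents = {}, {}, {}
    for a, b, pa, pb in results:
        n = pa + pb  # number of games in this pairing
        for me, other, pts in ((a, b, pa), (b, a, pb)):
            points[me] = points.get(me, 0.0) + pts
            games[me] = games.get(me, 0.0) + n
            opponents.setdefault(me, []).append((other, n))

    ratings = {p: mean_elo for p in games}
    for _ in range(max_iter):
        new = {}
        for p in ratings:
            avg_opp = sum(ratings[o] * n for o, n in opponents[p]) / games[p]
            new[p] = avg_opp + elo_diff(points[p] / games[p])
        # Keep the pool mean fixed so the ratings cannot drift as a whole.
        shift = mean_elo - sum(new.values()) / len(new)
        new = {p: r + shift for p, r in new.items()}
        change = max(abs(new[p] - ratings[p]) for p in ratings)
        ratings = new
        if change < tol:
            return ratings
    raise RuntimeError("no convergence within max_iter iterations")


if __name__ == "__main__":
    part1 = [("A", "B", 99.5, 0.5), ("B", "C", 99.5, 0.5), ("A", "C", 100, 0)]
    part2 = [("A", "B", 99.5, 0.5), ("B", "C", 99.5, 0.5)]
    print(iterate_ratings(part1))
    print(iterate_ratings(part2))
```

In this toy scheme the 600 point cap dominates both parts of the experiment, and A ends up lower in part 1 than in part 2. That only shows how a capped averaging scheme can behave. It says nothing certain about what ELOStat itself would output.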

Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700
