Author: Uri Blass
Date: 00:52:19 08/08/03
On August 08, 2003 at 03:16:47, Günther Simon wrote:

>On August 08, 2003 at 02:04:48, Axel Schumacher wrote:
>
>>Example:
>>EloStat, some of the lower ranked engines:
>>728 Yawce 0.16        : 1962  58  33  263  31.9 %  2094   6.8 %
>>733 Raffaela          : 1951  78  46  130  30.0 %  2098  12.3 %
>>736 Nero 5.3          : 1934 114  60   81  30.2 %  2079   1.2 %
>>755 Pierre 1.7        : 1861  60  30  290  30.2 %  2007   3.8 %
>>773 ROBOKewlper 0.047 : 1778 143  55   71  15.5 %  2073  14.1 %
>>775 Bigbook 3.1       : 1765  48  24  443  28.0 %  1929   9.5 %
>>781 König Schwarz     : 1717  53  42  182  36.0 %  1817  20.3 %
>>787 Kace 0.8          : 1643 123  75   47  22.3 %  1860  23.4 %
>>
>>and the same with Fritz (even much higher values):
>>
>> Yawce 0.16          2080  262
>> Nero 5.3            2073   79
>> Raffaela            2064  130
>> Pierre 1.7          1983  288
>> Bigbook 3.1         1902  441
>> ROBOKewlper 0.047   1898   69
>> König Schwarz       1880  182
>> Kace 0.8            1805   47
>>
>>Axel
>
>I don't know how your tournaments are structured, but you should
>take care to have pools of players which are not in too
>distant a range of Elo.
>You should consider making leagues or running some Swiss tourneys.
>
>What would happen if you didn't count any games between
>players which differ by more than 400 Elo?
>
>From the above ratings I can give you an example of why it
>does not work as it should, even with EloStat.
>I can see that Raffaela has a 30% score and a rating of 1951
>(in reality it is hardly over 1500).
>Imagine Raffaela had played 70x versus Fritz 8 and 30x
>against Kace (assume it wins all games versus Kace, which I doubt,
>and loses all games versus Fritz 8): it would get a highly
>inflated rating, which would in turn also influence all the other
>(in reality) weak opponents of Raffaela, etc...
>
>Regards,
>Günther

I think that EloStat and Fritz should not be used for rating; it is better to have no rating at all than to use these programs.

The rating of a program should not be based on its result and the average rating of its opponents. There should be an expected result for every rating difference, and the way to calculate ratings is to start with equal ratings for everybody and, in every step, reduce the rating of programs that score less than expected and increase the rating of programs that score more than expected.

The expected result should be calculated for every game; in the first iteration, when all ratings are still equal, it is a draw.

I think that a simple algorithm that, in iteration n, reduces the rating of every program that scores less than expected by 1/(n^0.5) Elo and increases the rating of every program that scores more than expected by 1/(n^0.5) Elo is good enough if we do 100,000,000 iterations. There may be faster ways, but that is not very important when computers are fast and can do millions of iterations in a short time. (I first thought about steps of 1/n Elo, but 1/n is not good enough: because the sum 1 + 1/2 + ... + 1/N grows only like ln(N), a program would need nearly e^100 iterations to move 100 Elo, so I decided on the square root of n instead.)

Uri
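As a rough illustration of the iterative scheme Uri describes, here is a minimal sketch in Python. It assumes the standard Elo expectancy 1/(1 + 10^((Rb-Ra)/400)) for the per-game expected result; the function name iterative_ratings, the games list format, and the starting rating of 2000 are choices made for this example only, not part of the post, and the default iteration count is kept far below 100,000,000 so that plain Python finishes quickly.

    import math
    from collections import defaultdict

    def iterative_ratings(games, iterations=10_000, start=2000.0):
        """Adjust each program's rating by +/- 1/sqrt(n) Elo in iteration n.

        games: list of (player_a, player_b, score_a) tuples, where score_a
               is 1.0 for a win by player_a, 0.5 for a draw, 0.0 for a loss.
        """
        ratings = defaultdict(lambda: start)      # everybody starts equal

        def expected(ra, rb):
            # standard Elo expectancy; in the first iteration all ratings
            # are equal, so the expected result of every game is a draw (0.5)
            return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))

        for n in range(1, iterations + 1):
            step = 1.0 / math.sqrt(n)             # 1/(n^0.5) Elo in iteration n
            actual = defaultdict(float)
            expect = defaultdict(float)
            for a, b, score_a in games:
                actual[a] += score_a
                actual[b] += 1.0 - score_a
                e = expected(ratings[a], ratings[b])
                expect[a] += e
                expect[b] += 1.0 - e
            for p in actual:
                if actual[p] > expect[p]:
                    ratings[p] += step            # scored more than expected
                elif actual[p] < expect[p]:
                    ratings[p] -= step            # scored less than expected
        return dict(ratings)

    # tiny usage example with made-up results:
    # games = [("A", "B", 1.0), ("B", "C", 0.5), ("A", "C", 0.0)]
    # print(iterative_ratings(games))

With this step schedule the maximum possible movement after N iterations grows like 2*sqrt(N), so a separation of 100 Elo needs only a few thousand iterations, in contrast to the roughly e^100 iterations a 1/n schedule would require, which matches the reasoning in the post.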