Computer Chess Club Archives



Subject: Re: Surak's Tourney: Differences EloStat vs. Fritz calculation

Author: Axel Schumacher

Date: 08:09:43 08/08/03



On August 08, 2003 at 03:16:47, Günther Simon wrote:

>On August 08, 2003 at 02:04:48, Axel Schumacher wrote:
>
>>Example:
>>EloStat, some of the lower ranked engines:
>>728 Yawce 0.16                     : 1962   58  33   263    31.9 %   2094    6.8 %
>>733 Raffaela                       : 1951   78  46   130    30.0 %   2098   12.3 %
>>736 Nero 5.3                       : 1934  114  60    81    30.2 %   2079    1.2 %
>>755 Pierre 1.7                     : 1861   60  30   290    30.2 %   2007    3.8 %
>>773 ROBOKewlper 0.047              : 1778  143  55    71    15.5 %   2073   14.1 %
>>775 Bigbook 3.1                    : 1765   48  24   443    28.0 %   1929    9.5 %
>>781 König Schwarz                  : 1717   53  42   182    36.0 %   1817   20.3 %
>>787 Kace 0.8                       : 1643  123  75    47    22.3 %   1860   23.4 %
>>
>>and the same with Fritz (even much higher values):
>>
>>	Yawce 0.16	2080	262
>>	Nero 5.3	2073	79
>>	Raffaela	2064	130
>>	Pierre 1.7	1983	288
>>	Bigbook 3.1	1902	441
>>	ROBOKewlper 0.047	1898	69
>>	König Schwarz	1880	182
>>	Kace 0.8	1805	47
>>
>>
>>Axel
>
>I don't know how your tournaments are structured, but you should
>take care to have pools of players whose Elo ratings are not too
>far apart.
>You should consider making leagues or running some Swiss tourneys.
>
>What would happen if you didn't count any games between
>players who differ by more than 400 Elo?
>
>From the above ratings I can give you an example of why it
>does not work as it should, even with EloStat.
>I can see that Raffaela has a 30% score and a rating of 1951
>(in reality it is hardly over 1500).
>Imagine Raffaela had played 70x versus Fritz 8 and 30x
>against Kace (assume it wins all games versus Kace, which I doubt,
>and loses all games versus Fritz 8): it would get a highly
>inflated rating, which would also influence all the other
>(in reality) weak opponents of Raffaela, etc.

You're right. However, Raffaela certainly didn't play against Fritz more than
once. Usually, after a gauntlet against some engines, each new engine in my
tourney plays most of its games against other engines whose Elo is in the same
range (+/- 100; mostly small sub-Swiss tourneys). I agree with Uri that we may
need a new way to calculate these data. Maybe we also should not try to compare
the absolute Elo values with the Elo rankings we know from human games, unless a
substantial part of the computer games were played against humans.
It seems I have to play more games against these engines :-)
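
To make Günther's hypothetical concrete, here is a rough sketch of a performance rating computed as "average opponent rating plus the Elo difference implied by the percentage score", using the usual logistic Elo curve. This is only an illustration and not necessarily how EloStat or Fritz calculate internally. The function names are made up for this example, the Fritz 8 rating of 2700 is purely an assumed number, and I read the second-to-last column of the EloStat list as the average opponent rating.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by a fractional score (logistic Elo curve)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def performance_rating(opponents: list[tuple[float, int]], score: float) -> float:
    """Average opponent rating (weighted by games played) plus the implied difference.

    opponents: list of (opponent_rating, games_played) pairs.
    score: overall fractional score against all of them, e.g. 0.30 for 30%.
    """
    total_games = sum(games for _, games in opponents)
    average_opponent = sum(rating * games for rating, games in opponents) / total_games
    return average_opponent + elo_diff_from_score(score)


# Raffaela as listed: 30.0% over 130 games, average opponent 2098.
raffaela_listed = performance_rating([(2098, 130)], 0.30)

# Günther's hypothetical: 70 losses to Fritz 8, 30 wins against Kace, i.e. 30% overall.
FRITZ8_ASSUMED = 2700  # assumption for illustration only, not from the rating list
KACE_LISTED = 1643     # Kace 0.8 from the EloStat list above
raffaela_hypothetical = performance_rating(
    [(FRITZ8_ASSUMED, 70), (KACE_LISTED, 30)], 0.30
)

print(round(raffaela_listed), round(raffaela_hypothetical))
```

With the listed numbers this reproduces Raffaela's 1951, while the hypothetical schedule gives about 2236 for the same 30% score. The score alone says nothing about who the points came from, so a few very strong opponents in the average are enough to inflate the result.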

Regards
Axel

>
>Regards,
>Günther




Last modified: Thu, 15 Apr 21 08:11:13 -0700
