Computer Chess Club Archives


Subject: Re: Congratulation for chesstiger(better performance than shredder in wmccc)

Author: Jeff Lischer

Date: 09:09:53 08/24/01


On August 24, 2001 at 07:29:21, Günther Simon wrote:

>On August 24, 2001 at 07:15:30, Uri Blass wrote:
>
>>On August 24, 2001 at 07:06:51, Uri Blass wrote:
>>
>>>Here are the results from the
>>>ELOStat program.
>>>
>>>You can see that Shredder is only the 3rd-placed micro based on performance.
>>>Shredder is the World Micro Champion by definition, but Tiger and Rebel had a
>>>better performance.
>>>
>>>
>>>1 Deep Junior 7                  : 2745  228 281     9    88.9 %   2384   22.2 %
>>>2 Quest (DeepFritz)              : 2550  266 169     9    66.7 %   2430   44.4 %
>>>3 Chess Tiger 14.6 Gambit Tiger  : 2499  291 229     9    55.6 %   2461   22.2 %
>>>4 Crafty 18.10X                  : 2467  291 165     9    55.6 %   2428   44.4 %
>>>5 Rebel                          : 2466  291 229     9    55.6 %   2428   22.2 %
>>>6 Shredder                       : 2466  266 249     9    66.7 %   2346   22.2 %
>>>7 Goliath                        : 2421  291 165     9    55.6 %   2382   44.4 %
>>>8 Gromit 3.9.5                   : 2364  278 201     9    61.1 %   2285   33.3 %
>>>9 Ferret                         : 2359  291 229     9    55.6 %   2320   22.2 %
>>>10 Gandalf 5.0                   : 2310  291 229     9    55.6 %   2271   22.2 %
>>>11 ParSOS                        : 2256  291 229     9    55.6 %   2217   22.2 %
>>>12 Diep                          : 2227  165 291     9    44.4 %   2265   44.4 %
>>>13 IsiChess X                    : 2166  201 278     9    38.9 %   2245   33.3 %
>>>14 Tao                           : 2165  229 291     9    44.4 %   2203   22.2 %
>>>15 Ruy Lopez                     : 2118  366 266     9    33.3 %   2238    0.0 %
>>>16 Pharaon                       : 2082  169 266     9    33.3 %   2202   44.4 %
>>>17 SpiderGirl                    : 2014  213 255     9    27.8 %   2180   33.3 %
>>>18 XiNiX                         : 1724  400 108     9     5.6 %   2216   11.1 %
>>>
>>>Congratulations also to the Deep Junior team for winning the event convincingly:
>>>the gap to second place is almost 200 Elo, and the hardware
>>>explains less than 70 Elo of that difference.
>>>
>>>Uri
>>
>>I can add that I think it may be a better idea to use ELOStat to decide
>>the world champion in the future.
>>
>>I know that a lot of people are going to disagree, but it is my opinion.
>>I prefer a complicated method that does more justice to a simple one.
>>
>>Uri
>
>
>Sorry Uri - but this is really nonsense.
>You can't use ELO-Stat on a Swiss tournament with 9 rounds, as
>it is described by the author. ELO-Stat is designed to calculate
>ratings from a pool of programs with unknown ratings and a very large
>number of games.
>Therefore, if you take a closer look at your table, you will see that
>the error margin is at least 435! pts (Pharaon) and at most 632!! (RuyLopez).
>And would you really believe Parallel SOS to be at 2256? :))
>You must be a strong Tiger fan to post this very uncharacteristic post, as it
>is diametrically opposed to all your previous posts about stats?!
>(Btw, hasn't Shredder won against Tiger, or is my memory failing me?)
>Finally, I want to mention that I had wished for a better result for the
>Tiger too - maybe the style of pure CT would have done better than
>a mix with the GT?
>
>Günther

What a coincidence! Last night I had typed in a message very similar to Uri's
above and was about to submit it to the board. I had also run the pgn file through
ELOStat 1.1b (also with an average Elo of 2300) and had seen that Tiger's
performance was better than Shredder's. But then I hesitated to send the message
because of the exact concerns mentioned by Günther.

I decided instead that today I would ask the CCC board whether it is really
correct to use ELOStat to determine performance ratings from a Swiss tournament.
I agree that performance would be a better measure than just looking at the raw
score, especially if some players face tougher competition and others
face much easier competition on average. The rules of Swiss pairing try to
minimize this problem, but they aren't perfect, as this tournament's
results show.

However, normally when I've seen performance ratings discussed, it is from human
tournaments where the actual ratings of the players are well known. Then it
makes some sense to calculate a player's "performance" based on how well they
played against those established ratings.
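For reference, the usual calculation can be sketched as follows. This is only a minimal sketch assuming the standard logistic Elo expectancy curve; the function name `performance_rating` is made up for illustration, and ELOStat's own procedure may differ in detail. The sample figures are taken from the score and average-opponent columns of the table quoted above.

```python
import math

def performance_rating(avg_opponent_elo, score_fraction):
    """Performance rating under the logistic Elo model (an assumed model).

    avg_opponent_elo: average rating of the opponents faced.
    score_fraction:   points scored divided by games played, strictly in (0, 1).
    """
    if not 0.0 < score_fraction < 1.0:
        # A 0 % or 100 % score has no finite rating difference in this model.
        raise ValueError("score fraction must be strictly between 0 and 1")
    # Invert the expectancy curve: expected score -> rating difference.
    diff = 400.0 * math.log10(score_fraction / (1.0 - score_fraction))
    return avg_opponent_elo + diff

# Deep Junior 7 in the quoted table: 8/9 against an average opponent of 2384.
print(round(performance_rating(2384, 8 / 9)))  # 2745, as listed in the table
```

The result is only as meaningful as the opponent ratings fed into it, which is exactly the concern here.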

It seems there is a difference between trying to estimate Elo ratings versus
estimating performance ratings. As Günther points out, to estimate Elo ratings
you need a lot of games to have confidence in the predictions. ELOStat will
estimate ratings based on no known data on the players, but if there are only a
few games, the uncertainties will be large.
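To give a feel for how large, here is a rough sketch of the uncertainty in a rating difference after only a few games. It assumes a simple normal approximation to the score and the logistic Elo curve; this is not ELOStat's actual error calculation, and the names `elo_diff` and `elo_interval` are invented for the example. The 9-game record used (4 wins, 2 draws, 3 losses) matches the 55.6 % score and 22.2 % draws listed for Tiger in the table.

```python
import math

def elo_diff(score_fraction):
    """Rating difference implied by a score fraction (logistic model, assumed)."""
    # Clamp so that extreme scores do not produce an infinite difference.
    p = min(max(score_fraction, 0.001), 0.999)
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(wins, draws, losses, z=1.96):
    """Approximate interval (low, high) for the rating difference.

    Uses a normal approximation to the mean score; z=1.96 is roughly 95 %.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score p.
    var = (wins * (1.0 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return elo_diff(p - z * se), elo_diff(p + z * se)

# Same score percentage, 9 games versus 900 games.
low9, high9 = elo_interval(4, 2, 3)
low900, high900 = elo_interval(400, 200, 300)
print(round(high9 - low9), round(high900 - low900))
```

With nine games the interval spans several hundred Elo points; with a hundred times as many games at the same score it shrinks to a few dozen.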

To estimate performance, on the other hand, you need established Elo ratings for
all players so you can benchmark performances against those ratings. In the case
of this tournament, neither condition holds: nine rounds is far too few games,
and the programs have no established ratings.

I'm very curious to hear what others think about estimating performance from
tournament results like this.
