Computer Chess Club Archives



Subject: Re: " C E G T - 40/40 Rating List " Update December 14.

Author: Ryan B.

Date: 16:32:20 12/14/05



On December 14, 2005 at 16:32:09, Heinz van Kempen wrote:

>On December 14, 2005 at 15:46:06, Uri Blass wrote:
>
>>On December 14, 2005 at 11:47:55, Wilhelm Hudetz wrote:
>>
>>>Hi all,
>>>
>>>CEGT 40/40 rating lists and downloads are updated.
>>>
>>>Rybka with a different setting (very tactical) and more games is now close to
>>>the default version. There are also 204 games for the 64-bit version in the
>>>all-versions rating table.
>>>
>>>http://kd.lab.nig.ac.jp/chess/cegt/ (BayesELO, download, additional stats)
>>>
>>>http://kd.lab.nig.ac.jp/chess/cegt.0/ (previous version)
>>>
>>>http://www.husvankempen.de/nunn/rating.htm (EloStat and individual performance)
>>>
>>>Best regards
>>>Wilhelm
>>
>>I am disappointed that the "all versions" list does not include all versions.
>>
>>Gambit Fruit had one game in the previous list, and the first thing I wanted to
>>see was Gambit Fruit's place in the new one.
>>
>>At first I thought that Gambit Fruit had dropped down the list, maybe thanks to
>>some bad results in the few additional games it played, but then I discovered
>>that the latest Movei is also not in the list.
>>
>>I guess that not all versions are included because of the comment by Eduard
>>Nemeth, who complained that Gambit Fruit appeared in the all-versions list, and
>>it seems that his comment was counterproductive.
>>
>>I prefer to see a rating list with all versions, even if some version has only
>>a single game.
>>
>>I do not jump to conclusions from a rating list about which version is better;
>>I simply prefer to see more information rather than less.
>>
>>You could also decide to drop programs with fewer than x games from the list,
>>for some x, because they may change the ratings of other programs, but even in
>>that case it would be nice to have a separate performance table for programs
>>with fewer than x games.
>>
>>Note that if you exclude programs with fewer than x games from the rating list,
>>then removing their games may push other programs below x games, so you need
>>to repeat the process until no remaining program has fewer than x games.
>>
>>I do not know whether excluding programs with fewer than x games can give a
>>better rating list for the programs with at least x games.
>>
>>The disadvantage is of course that the list is going to be based on fewer games,
>>but the advantage is that programs with a small number of games cannot distort
>>the ratings of their opponents.
>>
>>Uri
>
>Hi Uri,
>
>apparently it does not matter what we do. When we have an engine ranked in
>position 2 with one game, people come and tell us that this must be a joke. On
>the other hand, there is no real need for this, as we can quickly give Gambit
>Fruit, let us say, at least 30 games against different opponents.
>
>What concerns us more is that there may be distortions from other things, and
>this is valid not only for CEGT but for all rating lists. An experiment showed
>that when we take out, for example, all versions of Fruit and Toga except Fruit
>2.2 and 2.2.1 combined, the rating for Fritz 9 goes up by no less than 14 Elo.
>This is mainly because of the catastrophic result Fritz 9 had against Toga
>1.0, not only for us but also for Kurt, for example. So I think we can at least
>offer one "undistorted" rating list once a month including only the best
>version. It may happen that with many games the distortions balance out again,
>because those doing badly against many Fruit and Toga versions may do better
>against the Chessmasters or against many Fritz, Shredder, Junior versions and so
>on. But an engine that has just three or four of these as nemesis opponents is
>of course punished multiple times.
>
>The whole complicated rating issue is currently being discussed in the CEGT
>forum. Some are not happy with BayesELO only; others claim that EloStat gives
>less reliable values, and so on. On the other hand, it is not our goal to have
>ten different rating lists. It is not easy in any case, and one has to ask
>questions about any rating list, including SSDF.
>
>Best Regards
>Heinz


Well, Eduard Nemeth simply does not like Gambit Fruit and would claim the results
to be flawed even if it had over 1000 games, if it was listed above Toga.  The
style of Gambit Fruit is different enough that it often plays well or badly
against different opponents than Toga does.  For example, Toga does very well
against Fritz 9 and Scorpio, but Gambit Fruit plays its worst against these
opponents.  I think not testing Gambit Fruit can also give a distorted view of
where it ranks.  Some may think it is better than it is; some may think it is
worse.  Not that it really matters, as there seems to be a popular myth that
Toga 1.1 is somehow stronger than Fruit 2.2, despite Fruit 2.2 being rated
higher on the two most credible rating lists I know of.  Sometimes even solid
statistics from a reliable rating list cannot sway perception closer to the
truth.
Ryan
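
Uri's quoted point about a cutoff (drop every program with fewer than x games, then repeat, because removing those games can push other programs under x) can be sketched as a small loop. This is only an illustration in Python: the function name prune_sparse_engines, the (engine_a, engine_b, score_a) game format, and the engines A to E are all made up here and are not taken from how CEGT actually builds its lists.

```python
from collections import Counter


def engines_in(games):
    """Return the set of engine names that appear in a list of games."""
    return {engine for a, b, _ in games for engine in (a, b)}


def prune_sparse_engines(games, min_games):
    """Repeatedly remove engines with fewer than min_games games.

    games is a list of (engine_a, engine_b, score_a) tuples (hypothetical
    format). Returns (kept_games, dropped_engines).
    """
    kept = list(games)
    while True:
        # Count how many games each remaining engine has.
        counts = Counter()
        for a, b, _ in kept:
            counts[a] += 1
            counts[b] += 1
        sparse = {engine for engine, n in counts.items() if n < min_games}
        if not sparse:
            break
        # Removing these games may push other engines below the cutoff,
        # which is why the loop runs again.
        kept = [g for g in kept if g[0] not in sparse and g[1] not in sparse]
    dropped = engines_in(games) - engines_in(kept)
    return kept, dropped


# Invented data: D only passes the cutoff thanks to its one game against E.
example_games = (
    [("A", "B", 0.5)] * 3
    + [("A", "C", 1.0)] * 3
    + [("B", "C", 0.0)] * 3
    + [("C", "D", 1.0)] * 2
    + [("D", "E", 0.5)]
)

if __name__ == "__main__":
    kept, dropped = prune_sparse_engines(example_games, min_games=3)
    print(len(kept), sorted(dropped))
```

With a cutoff of 3 in this invented data, E goes in the first pass, and D (left with only 2 games) goes in the second, which is the cascade Uri describes. The dropped engines could still be shown in a separate performance table, as he suggests.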




Last modified: Thu, 15 Apr 21 08:11:13 -0700
