Author: Heinz van Kempen
Date: 13:32:09 12/14/05
On December 14, 2005 at 15:46:06, Uri Blass wrote:

>On December 14, 2005 at 11:47:55, Wilhelm Hudetz wrote:
>
>>Hi all,
>>
>>CEGT 40/40 rating lists and downloads are updated.
>>
>>Rybka with different setting (very tactical) and more games now close to default
>>version. Also 204 games for the 64bit version in the all version rating table.
>>
>>http://kd.lab.nig.ac.jp/chess/cegt/ (BayesELO, download, additional stats)
>>
>>http://kd.lab.nig.ac.jp/chess/cegt.0/ (previous version)
>>
>>http://www.husvankempen.de/nunn/rating.htm (EloStat and individual performance)
>>
>>Best regards
>>Wilhelm
>
>I am disappointed that "all versions" does not include all versions.
>
>Gambit Fruit had one game in the previous list and the first interesting
>information was to see the place of Gambit Fruit.
>
>First I thought that Gambit Fruit dropped in the list, maybe thanks to some bad
>results in a few more games that it played, but then I discovered that the latest
>Movei is also not in the list.
>
>I guess that not including all versions is because of the comment of Eduerd
>Nemeth, who complained about the fact that Gambit Fruit appeared in all versions,
>and it seems that his comment was counterproductive.
>
>I prefer to see a rating list with all versions even if some version has only one
>single game.
>
>I do not jump to conclusions based on a rating list about which version is better
>and I only prefer to see more information and not less information.
>
>You can also decide to drop out of the list programs with less than x games for
>some x because they may change the rating of other programs, but even in this
>case it is going to be nice to have a special table of performance of programs
>with less than x games.
>
>Note that if you do not include programs with less than x games in the rating
>list for some x, then the number of programs with less than x games may increase
>and you need to repeat the process until no program has less than x games.
>
>I do not know if not including programs with less than x games for some x can
>help to have a better rating list for the programs with at least x games.
>
>The disadvantage is of course that the list is going to be based on fewer games,
>but the advantage is that programs with a small number of games cannot distort the
>rating of their opponents.
>
>Uri

Hi Uri,

apparently we can do what we want. When we have an engine ranked in position 2 with one game, people come and tell us that this must be a joke. On the other hand there is no real need for this, as we can quickly give, let us say, at least 30 games to Gambit Fruit against different opponents.

What concerns us more is that there may be distortions from other things, and this is not only valid for CEGT, but for all rating lists. An experiment showed that when taking out, for example, all other versions of Fruit and Toga except Fruit 2.2 and 2.2.1 combined, the rating for Fritz 9 goes up by no less than 14 Elo. This is mainly because of the catastrophic result Fritz 9 had against Toga 1.0, not only for us, but also for Kurt, for example. So I think we can at least offer one "undistorted" rating list once a month including only the best version.

It may happen that with many games the distortions balance out again, because those doing badly against many Fruit and Toga versions maybe do better against the Chessmasters or against many Fritz, Shredder, Junior versions and so on. But an engine that has just three or four of these as its nemesis opponent is of course punished multiple times.

The whole complicated rating stuff is currently being discussed in the CEGT forum. Some are not happy with BayesELO only, others claim that EloStat gives not so reliable values, and so on. On the other hand it is not our goal to have ten different rating lists. It is anyway not easy, and one has to ask questions about any rating list, including SSDF.

Best Regards
Heinz
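Uri's remark that the cut has to be repeated until no program has less than x games can be sketched roughly as below. This is only an illustration, not how CEGT actually builds its lists: the function name `prune_sparse_engines`, the `(white, black, result)` game format and the engine names A to D are all made up for the example.

```python
from collections import Counter


def prune_sparse_engines(games, min_games):
    """Repeatedly drop engines that have fewer than min_games games.

    games: list of (white, black, result) tuples (hypothetical format;
    the result field is carried along but not used for the pruning).
    Returns (kept_games, removed_engines).
    """
    kept = list(games)
    removed = set()
    while True:
        # Count how many games each engine still has in the current list.
        counts = Counter()
        for white, black, _ in kept:
            counts[white] += 1
            counts[black] += 1
        sparse = {name for name, n in counts.items() if n < min_games}
        if not sparse:
            return kept, removed
        removed |= sparse
        # Removing an engine removes its games, which can push its
        # opponents below the threshold, so the loop runs again.
        kept = [g for g in kept if g[0] not in sparse and g[1] not in sparse]


# Made-up data: D has a single game, and C only reaches 3 games thanks to D.
games = (
    [("A", "B", "1-0")] * 3
    + [("A", "C", "1/2-1/2")] * 2
    + [("C", "D", "0-1")]
)
kept, removed = prune_sparse_engines(games, min_games=3)
print(len(kept), sorted(removed))
```

With x = 3, the first pass removes only D, but that leaves C with two games, so a second pass removes C as well. Only the three A against B games remain, which is exactly the cascade described in the quoted post.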
Last modified: Thu, 15 Apr 21 08:11:13 -0700