Computer Chess Club Archives


Subject: Re: The big drop in the rating of my Fruit personality

Author: Roger D Davis

Date: 10:37:02 10/17/05


On October 17, 2005 at 10:57:35, Heinz van Kempen wrote:

>On October 17, 2005 at 10:10:58, Uri Blass wrote:
>
>>I am now surprised by the big drop in the CEGT rating of my Fruit personality.
>>
>>It was already 2806 after 92 games and now it is 2748 after 223 games.
>>
>>I also remember a possible error of 61 Elo after 92 games, but even if the real
>>rating is 61 Elo lower than 2806, I still do not expect the rating to change
>>so fast.
>>
>>This is also surprising because earlier results that I read, not from CEGT,
>>supported my personality.
>>
>>I wonder whether the real error is higher than the published error.
>>
>>I wonder what the reason for the big drop is, and whether there was a problem in
>>the matches against Spike and Jonny, which seem to be the main reason for the
>>drop of my personality (did the same tester play these matches?).
>>
>>Possible sources of mistakes in the results:
>>
>>1) Testing on different hardware relative to the previous Fruit.
>>
>>The claim of the CEGT is that they test with hardware that is equivalent to a
>>2 GHz PIV, but the problem is that there is no true equivalence: it is possible
>>that one program favors one processor and not another.
>>
>>2) Testing different positions, not the same positions that were tested with
>>the earlier version.
>>
>>3) Testing against different opponents.
>>
>>Uri
>
>Hi Uri,
>
>okay we had the following....
>
>after 51 games---2823 ELO
>after 93 games---2806 ELO
>after 130 games---2760 ELO
>after 223 games---2747 ELO
>
>One thing often happens with EloStat. In the beginning you get very high ratings
>that in 90% of all cases cannot hold.
>
>The general opinion of CEGT testers is that most settings do not give the same
>good results with longer time controls as they do in Blitz. For Eccentric, for
>example, we can't completely reproduce the good results, because we do not use
>a special book or learning. Any test suite of around 100 Blitz games posted
>here can only be an indication that it might be worth a try, but in many cases
>we will see again and again that it will not hold with longer time controls.
>So for the moment CEGT testers are getting a bit tired of testing settings,
>except for Chessmaster, where we have seen improvement with some personalities.
>
><1)testing in different hardware relative to previous fruit.>
>
>No, the first games come mainly from Christian, Charles and me, on the same
>hardware we have used over the past months. Christian tested on an Intel
>Celeron and Charles and I on Athlons, so the same as we have always had.
>
><2)testing different positions and not the same positions that were tested by
>earlier version.>
>
>The same shorter books, like Sedat's Perfect books, 8-move books, etc., that
>were also used for Fruit default.
>
><3)testing against different opponents>
>
>We started with some of the opponents that Fruit WCCC'05 also had and would
>have continued with all the others, but if the rating does not improve by the
>time we have 500 games, I do not think that CEGT testers will want to test more
>settings for now.
>
>Okay, I understand that people are happy when they get good results based on a
>position test or 100 Blitz games, but the chances that those results will also
>hold at longer time controls do not seem to be good.
>
>The Jonny match is being run by me. Current standing: 17,5:10,5 for Fruit.
>The Spike match was run by Christian. He has already given it up, as he
>generally does not like personality testing, and has continued with Loop List
>matches.
>
>Best Regards
>Heinz


"One thing often happens with EloStat. In the beginning you get very high
ratings that in 90% of all cases cannot hold."
----------------

This seems to say that the first batch of games is somehow different from
subsequent batches, because of the way EloStat calculates.
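
One way to probe that is a small simulation. The following is only a rough Python sketch under my own assumptions, not EloStat's actual algorithm: it uses the plain logistic Elo formula, a single hypothetical opponent rated 2700, and made-up win/draw/loss odds that never change during the run. The game counts 51 and 223 are taken from the figures quoted above. It counts how often the rating after the first 51 games is above the rating after all 223.

```python
import math
import random

def performance_elo(score, games, opponent_elo):
    """Performance rating from the plain logistic Elo formula (an assumption)."""
    p = score / games
    # Clamp so that a 0% or 100% score does not give an infinite rating.
    p = min(max(p, 0.5 / games), 1 - 0.5 / games)
    return opponent_elo - 400 * math.log10(1 / p - 1)

def early_vs_final(rng, early=51, total=223, opponent_elo=2700):
    """Play `total` games at constant strength; rate the first `early` and all of them."""
    # Hypothetical odds: 40% wins, 35% draws, 25% losses in every game.
    results = rng.choices([1.0, 0.5, 0.0], weights=[40, 35, 25], k=total)
    early_elo = performance_elo(sum(results[:early]), early, opponent_elo)
    final_elo = performance_elo(sum(results), total, opponent_elo)
    return early_elo, final_elo

rng = random.Random(1)
runs = [early_vs_final(rng) for _ in range(2000)]
higher = sum(1 for early_elo, final_elo in runs if early_elo > final_elo)
print(f"early rating above final rating in {higher / len(runs):.0%} of runs")
```

If the games really are interchangeable, I would expect a figure near one half here, not 90%, so anything like 90% would have to come from something other than sampling noise.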

Say I run 200 games. I randomly draw 100 of the 200, which splits them into two
batches of 100, then calculate a separate Elo rating for each batch.

Are you saying that the Elo ratings for the two batches of 100, taken
separately, will almost always be higher than the rating for the 200 taken
together?

The games have already been played... the opponents' ratings and the outcomes
of the games are fixed.
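
To make the thought experiment concrete, here is a minimal Python sketch. It is not EloStat's method: I am assuming the plain logistic Elo formula, one fixed opponent rating (2700, hypothetical), and invented results. It shuffles 200 finished games, cuts them into two batches of 100, and rates each batch and the full set.

```python
import math
import random

def performance_elo(score, games, opponent_elo):
    """Performance rating from the plain logistic Elo formula (an assumption)."""
    p = score / games
    # Clamp so that a 0% or 100% score does not give an infinite rating.
    p = min(max(p, 0.5 / games), 1 - 0.5 / games)
    return opponent_elo - 400 * math.log10(1 / p - 1)

def split_and_rate(results, opponent_elo, rng):
    """Shuffle the finished games, cut them in half, rate both halves and the whole."""
    shuffled = results[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    def rate(batch):
        return performance_elo(sum(batch), len(batch), opponent_elo)

    return rate(shuffled[:half]), rate(shuffled[half:]), rate(shuffled)

rng = random.Random(2005)
# Hypothetical data: 200 results (1 = win, 0.5 = draw, 0 = loss).
results = rng.choices([1.0, 0.5, 0.0], weights=[40, 35, 25], k=200)
first, second, combined = split_and_rate(results, 2700, rng)
print(f"batch 1: {first:.0f}  batch 2: {second:.0f}  all 200: {combined:.0f}")
```

Under these assumptions the score over all 200 games is the average of the two batch scores, so the combined rating always lies between the two batch ratings. Both batches cannot come out higher than the whole.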

Roger




Last modified: Thu, 15 Apr 21 08:11:13 -0700
