Computer Chess Club Archives

Subject: Re: The big drop in the rating of my Fruit personality

Author: Heinz van Kempen

Date: 07:57:35 10/17/05


On October 17, 2005 at 10:10:58, Uri Blass wrote:

>I am now surprised by the big drop in the CEGT rating of my Fruit personality.
>
>It was already 2806 after 92 games and now it is 2748 after 223 games.
>
>I also remember possible error of 61 elo after 92 games but even if the real
>rating is 61 elo lower than 2806 then I still do not expect the rating to change
>so fast.
>
>This is surprising also because results that I read earlier not by CEGT
>supported my personality.
>
>I wonder if the real error is not higher than the error that is written
>
>I wonder what is the reason for the big drop and if there was no problem in the
>matches against spike and Jonny that seem to be the main reason for the drop in
>my personality(did the same tester play these matches?).
>
>Possible sources of mistakes in the results:
>
>1)testing in different hardware relative to previous fruit.
>
>The claim of the CEGT is that they test with hardware that is equivalent to 2
>ghz PIV but the problem is that there is no equivalence and it is possible that
>one program likes more one processor and not another processor.
>
>2)testing different positions and not the same positions that were tested by
>earlier version.
>
>3)testing against different opponents.
>
>Uri

Hi Uri,

Okay, we had the following:

after 51 games---2823 ELO
after 93 games---2806 ELO
after 130 games---2760 ELO
after 223 games---2747 ELO

One thing often happens with EloStat: in the beginning you get very high ratings
that, in 90% of all cases, do not hold.
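
As a rough illustration of why ratings based on few games move so much, here is
a minimal sketch of how the error margin shrinks with the number of games. It is
not EloStat's actual calculation: it assumes the usual logistic Elo curve, a
normal approximation, and a hypothetical per-game standard deviation of 0.5
(the worst case, with no draws). The function names are made up for this
example.

```python
import math


def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_margin(games, score=0.5, per_game_sd=0.5, z=1.96):
    """Approximate +/- Elo margin after `games` games.

    Assumptions (not taken from EloStat): independent games, a normal
    approximation for the mean score, and per_game_sd as the standard
    deviation of a single game result (0.5 is the no-draw worst case;
    draws make it smaller).
    """
    # Standard error of the average score over all games.
    se_score = per_game_sd / math.sqrt(games)
    # Slope of the Elo curve at this score (delta method).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * se_score * slope


if __name__ == "__main__":
    for n in (51, 93, 130, 223, 500):
        print(f"{n:4d} games: about +/- {elo_margin(n):.0f} Elo")
```

Under these assumptions the margin falls only with the square root of the number
of games, so going from about 90 to about 220 games still leaves an uncertainty
of several tens of Elo points.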

The general opinion of CEGT testers is that most settings do not give results
as good at longer time controls as they do in Blitz. For Eccentric, for example,
we can't completely reproduce the good results, because we do not use a special
book or learning. Any test run of around 100 Blitz games posted here can only be
an indication that a setting might be worth a try, but in many cases we see again
and again that it does not hold at longer time controls. So for the moment CEGT
testers are getting a bit tired of testing settings, except for Chessmaster,
where we see improvement with some personalities.

>1)testing in different hardware relative to previous fruit.

No, the first games come mainly from Christian, Charles and me, on the same
hardware we have used over the past months. Christian tested on an Intel Celeron
and Charles and I on Athlons, so the same as we always had.

>2)testing different positions and not the same positions that were tested by
>earlier version.

The same shorter books, like Sedat's Perfect books, 8-move, etc., that were also
used for Fruit default.

>3)testing against different opponents.

We started with some of the opponents that Fruit WCCC'05 also had, and would
have continued with all the others, but if the rating does not improve by the
time we have 500 games, I do not think that CEGT testers will want to test more
settings for now.

Okay, I understand that people are happy when they get good results based on a
position test or 100 Blitz games, but the chances that these will also hold at
longer time controls do not seem good.

The Jonny match is run by me. Current standing: 17.5:10.5 for Fruit.
The Spike match was run by Christian. He has already dropped it, as he generally
does not like personality testing, and has continued with Loop List matches.

Best Regards
Heinz