Computer Chess Club Archives



Subject: Re: The big drop in the rating of my Fruit personality

Author: Heinz van Kempen

Date: 02:42:45 10/18/05



On October 18, 2005 at 02:09:42, billiau wrote:

>Hi,
>
>I noticed that Fruit 2.2 standard did not play against the strong Deep Fritz8.
>
>I agree, we need a lot of games against a lot of opponents because of the way
>CEGT does the tests (different opponents, different hardware...).
>
>I am a bit surprised by the Spike match result (compared with blitz results).
>The other ones do not seem so bad for the moment.
>
>I think it's too early to reject this setting.
>This setting should not be treated like the other ones.
>We only changed the history pruning threshold (that's all).
>
>I know it's a lot of work to test these programs.
>Please, continue this good work to be sure we don't lose something great.
>
>G. Billiau

Hi,

you are right, it is much too early to draw conclusions, yet it is done again
and again.

Even worse, when there are surprising results there are insinuations that the
test conditions might be flawed, that something is wrong when it isn't.
People just do not really understand that many games are needed.

We have another example. I continued the Spike match and this time Fruit 2.2 Uri
is leading by 7.5 to 0.5. No use asking why, it just happens. These are the
usual statistical fluctuations. Run ten matches between Spike and Fruit and you
will get all kinds of results. It is that simple.
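
To make the point about fluctuations concrete, here is a rough sketch in Python.
It assumes, purely for illustration, two engines of equal strength and made-up
per-game probabilities (35% win, 30% draw, 35% loss); these numbers are not
measured values for Spike or Fruit. The function names are hypothetical too.
It plays ten simulated 8-game matches and also computes the exact chance of a
lopsided score under those assumptions.

```python
import random


def simulate_match(n_games, p_win, p_draw, rng):
    """Return engine A's score over n_games (win = 1, draw = 0.5, loss = 0)."""
    score = 0.0
    for _ in range(n_games):
        r = rng.random()
        if r < p_win:
            score += 1.0
        elif r < p_win + p_draw:
            score += 0.5
    return score


def score_distribution(n_games, p_win, p_draw):
    """Exact distribution of A's score, keyed by half-points (score * 2)."""
    p_loss = 1.0 - p_win - p_draw
    dist = {0: 1.0}
    for _ in range(n_games):
        nxt = {}
        for half_points, prob in dist.items():
            for gain, p in ((2, p_win), (1, p_draw), (0, p_loss)):
                key = half_points + gain
                nxt[key] = nxt.get(key, 0.0) + prob * p
        dist = nxt
    return dist


if __name__ == "__main__":
    # Assumed, illustrative probabilities for two equally strong engines.
    P_WIN, P_DRAW, GAMES = 0.35, 0.30, 8

    rng = random.Random(2005)  # fixed seed so the run is repeatable
    scores = [simulate_match(GAMES, P_WIN, P_DRAW, rng) for _ in range(10)]
    print("Ten simulated 8-game matches, score of engine A:", scores)

    dist = score_distribution(GAMES, P_WIN, P_DRAW)
    # Chance that either side scores 6 points or more out of 8.
    lopsided = sum(p for hp, p in dist.items() if hp >= 12 or hp <= 4)
    print("Chance of a 6-2 or more one-sided result: %.1f%%" % (100 * lopsided))
```

Changing GAMES from 8 to a few hundred shows how the spread of the score
percentage narrows, which is why many games are needed.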

Best Regards
Heinz




Last modified: Thu, 15 Apr 21 08:11:13 -0700
