Computer Chess Club Archives


Subject: Re: The big drop in the rating of my Fruit personality

Author: Uri Blass

Date: 03:01:19 10/18/05



On October 18, 2005 at 05:42:45, Heinz van Kempen wrote:

>On October 18, 2005 at 02:09:42, billiau wrote:
>
>>Hi,
>>
>>I noticed that Fruit 2.2 standard did not play against the strong Deep Fritz8.
>>
>>I agree, we need a lot of games against a lot of opponents because of the way
>>CEGT does the tests (different opponents, different hardware...).
>>
>>I am a bit surprised by the Spike match result (compared with the blitz results).
>>The other ones do not seem so bad for the moment.
>>
>>I think it's too early to reject this setting.
>>This setting should not be considered like the other ones.
>>We only changed the history pruning threshold (that's all).
>>
>>I know it's a lot of work to test these programs.
>>Please, continue this good work to be sure we don't lose something great.
>>
>>G. Billiau
>
>Hi,
>
>you are right, it is much too early to draw conclusions, and yet it is done
>again and again.
>
>Even worse, when there are surprising results there are insinuations that the
>test conditions might be flawed, that something is wrong when there isn't.
>People just do not really understand that many games are needed.

I understand it, but that does not mean that I cannot suspect that there may be
a problem.

The problem may also be in the first results, and I do not claim that my
personality has to be stronger.

I know from experience that in at least one case with the SSDF results I was
right, and you should not take it personally (it was not a personal attack
against the testers, who do good work).


>
>We have another example. I continued the Spike match and this time Fruit 2.2 Uri
>is leading by 7.5 to 0.5. No use asking why, it just happens. These are the usual
>statistical fluctuations. Run ten matches between Spike and Fruit and you will
>get all kinds of results. So easy.
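
As a rough illustration of the fluctuation argument quoted above, here is a minimal sketch that computes the exact score distribution of a short match. It assumes, purely for illustration, that the games are independent and that each game has fixed win/draw/loss probabilities; the function names and the 30%/40%/30% split for two equally strong engines are hypothetical choices, not figures from the CEGT tests.

```python
def score_distribution(games, p_win, p_draw):
    """Return {half_points: probability} for one side's total over `games` games.

    Scores are counted in half points (win = 2, draw = 1, loss = 0) so the
    dictionary keys stay integers. Games are assumed to be independent.
    """
    p_loss = 1.0 - p_win - p_draw
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for half_points, prob in dist.items():
            for gain, p in ((2, p_win), (1, p_draw), (0, p_loss)):
                key = half_points + gain
                nxt[key] = nxt.get(key, 0.0) + prob * p
        dist = nxt
    return dist


def prob_at_least(games, score, p_win, p_draw):
    """Probability that one side scores at least `score` points in `games` games."""
    dist = score_distribution(games, p_win, p_draw)
    return sum(p for hp, p in dist.items() if hp >= 2 * score)


if __name__ == "__main__":
    # Assumed (hypothetical) model of two equal engines: 30% win, 40% draw, 30% loss.
    for target in (5.0, 6.0, 7.5):
        p = prob_at_least(8, target, p_win=0.30, p_draw=0.40)
        print(f"P(score >= {target} out of 8) = {p:.5f}")
```

Under these assumptions the script shows how often a given lead appears by chance alone in an 8-game match; changing the assumed probabilities or the number of games shows how quickly the spread narrows as more games are played.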

I do not accept that there is no reason to ask why.

There must be a reason for it, and I guess that it may be interesting to find
that reason.

The reason does not have to be mistakes in testing but there must be a reason.

Examples for possible reasons:

1) In the first match there was an opening that Fruit did not like.
2) In the first match there was hardware that Fruit did not like relative to
Spike. We know that not all testers use the same hardware, and I remember
that the first match was done by a tester who later decided to stop testing.

Best Regards,
Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700
