Computer Chess Club Archives



Subject: Re: CEGT: testing and presentation of results

Author: Uri Blass

Date: 13:17:09 10/20/05



On October 20, 2005 at 10:05:09, Heinz van Kempen wrote:

>Hi David,
>
>yes, some people like you indeed understand weird statistics and also that the
>output from EloStat for versions with few games is not the best, but the
>majority does not.
>
>An engine starting like the new superstar and then dropping like a stone
>just afterwards simply gives most people the impression that the testers either
>told lies or did something wrong.
>
>Best Regards
>Heinz

Some comments:
1) I never claimed that testers told lies.

2) Hardware problems happen, and I did not claim that CEGT is the only group that
may have problems that are not the result of statistical error.

3) If an engine starts like a superstar and then drops like a stone, or the
opposite, it increases the probability that something is wrong with the results.

It does not mean that the problem is in the last games; it is also possible that
something was wrong in the first games.

The only question is what result justifies checking whether there is a problem.
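To put numbers on "starts like a superstar and drops like a stone", one can compute a performance rating for each consecutive block of games. Below is a minimal Python sketch, assuming the standard logistic Elo formula (performance = average opponent rating + 400 * log10(s / (1 - s)) for score fraction s) and a hypothetical block size of 50 games. EloStat's own calculation may differ in detail, and the function names are illustrative only.

```python
import math


def performance_rating(points: float, games: int, avg_opponent: float) -> float:
    """Logistic (Elo-style) performance rating for one block of games.

    A 100% or 0% score has no finite estimate, so +inf / -inf is returned.
    """
    score = points / games
    if score >= 1.0:
        return math.inf
    if score <= 0.0:
        return -math.inf
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


def segment_performances(results, avg_opponent, block=50):
    """Split per-game scores (1, 0.5, 0) into consecutive blocks and rate each block."""
    perfs = []
    for start in range(0, len(results), block):
        chunk = results[start:start + block]
        perfs.append(performance_rating(sum(chunk), len(chunk), avg_opponent))
    return perfs


# Hypothetical example: 70% in the first 50 games, 30% in the next 50,
# both against an average opponent rating of 2650.
results = [1] * 30 + [0.5] * 10 + [0] * 10 + [1] * 10 + [0.5] * 10 + [0] * 30
print(segment_performances(results, 2650))  # roughly 2797 and 2503
```

With only 50 games per block, each of these performances has a wide error margin, which is why early numbers for a new version can swing a lot even when nothing is wrong.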



Let us imagine some extreme cases that never happened:

a) If an engine scores 100% in the first 50 games against an average rating of
2650 and 0% in the next 50 games against an average rating of 2650, then you can
be practically sure that there is a problem in testing and the result is wrong.

b) If an engine scores 99% in the first 50 games against an average rating of
2650 and 1% in the next 50 games against an average rating of 2650, then you can
also be practically sure that there is a problem and the result is wrong.
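One way to quantify "practically sure" is a two-proportion z-test comparing the scoring rate of the first block with that of the second. This is a minimal Python sketch under stated assumptions: games are independent, both blocks face comparable opposition (as in cases a and b), and the split point is fixed in advance rather than chosen after looking at the results. It uses the binomial variance p(1 - p) per game; since a game score lies between 0 and 1, its true variance is at most p(1 - p), so with draws this overstates the standard error and the test is conservative. The `worth_checking` threshold of 3 is a hypothetical choice, not a standard.

```python
import math


def split_z_test(points1, games1, points2, games2):
    """Two-proportion z-test for a change in scoring rate between two blocks.

    Returns (z, two_sided_p_value). Uses the pooled binomial variance, which is
    an upper bound on the per-game variance when draws occur (conservative).
    """
    p1 = points1 / games1
    p2 = points2 / games2
    pooled = (points1 + points2) / (games1 + games2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / games1 + 1.0 / games2))
    if se == 0.0:
        # All wins or all losses in both blocks: no evidence of any change.
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def worth_checking(z, threshold=3.0):
    """Hypothetical rule: flag the games for inspection when |z| exceeds threshold."""
    return abs(z) > threshold


# Case a): 50/50 points, then 0/50.  Case b): 49.5/50, then 0.5/50.
z_a, p_a = split_z_test(50, 50, 0, 50)      # z close to 10
z_b, p_b = split_z_test(49.5, 50, 0.5, 50)  # z close to 9.8
print(z_a, p_a, worth_checking(z_a))
print(z_b, p_b, worth_checking(z_b))
```

Both extreme cases land near 10 standard errors, far beyond any reasonable threshold; a less extreme split gives a smaller z, and where to draw the line is a judgment call.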



What actually happened was, of course, less extreme than a) and b), but please
understand that it was enough to increase the suspicion that something is wrong.
The question is how much suspicion is enough to justify checking the games to
find out whether something is wrong.

I do not suspect problems only in other people's testing; I also suspected them
in my own testing, back when I ran some tests with Movei versions.

I remember that in the past I got some strange results when testing two versions
of Movei at a very fast time control.

I suspected that something was wrong, and I was right.
It turned out that, by accident, the two versions did not use the same hash
size, and clearing the hash tables between moves caused the version with more
hash to lose most of the games.
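A mismatch like this can be caught before a match starts by comparing the engine settings that should be identical on both sides. The sketch below is a hypothetical pre-match check, assuming each engine's settings are available as a dictionary of UCI-style option names (`Hash` and `Ponder` are standard UCI options; which options an engine actually exposes varies). It is not part of any real tournament manager.

```python
def mismatched_options(opts_a, opts_b, must_match=("Hash", "Ponder")):
    """Return the names of options that should be equal for a fair match but differ.

    An option missing from both dicts counts as equal (both None).
    """
    return [name for name in must_match if opts_a.get(name) != opts_b.get(name)]


# Hypothetical settings for two versions of the same engine.
version_1 = {"Hash": 64, "Ponder": False}
version_2 = {"Hash": 128, "Ponder": False}
print(mismatched_options(version_1, version_2))  # ['Hash']
```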

Suspecting that something is wrong is normal behaviour, not a personal attack.
I can only regret that people take it personally.

Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.