Computer Chess Club Archives



Subject: Re: CEGT: testing and presentation of results

Author: Uri Blass

Date: 22:53:14 10/19/05



On October 20, 2005 at 01:23:10, Kurt Utzinger wrote:

>On October 20, 2005 at 00:04:43, Heinz van Kempen wrote:
>
>>Hi all :-),
>>
>>currently a few CEGT testers are disappointed about what was written here in
>>some threads.
>
>      I fully understand them. Most people who do criticise
>      have never thought about what real testing means :-)
>      Kurt
>
>>
>>Since we do not feel responsible for statistically absurd fluctuations, we really
>>think it is better, from Thursday evening onwards, to give rating lists only for
>>engine versions, settings, etc. with more than 300 games. It simply leads to
>>disappointment and frustration when there is a fantastic start followed by a
>>cruel drop, as happens over and over again. It may be better to inform the
>>authors about the progress of a big test via email or in our private forum.
>
>     A good idea. It's simply true that no conclusions whatsoever
>     should be made about an engine before 300-500 games have been
>     played.

I know that there is statistical error, but it seems that the possible error is
larger than that because of possible hardware problems.
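To put the 300 to 500 game figure in perspective, here is a rough Python sketch of the purely statistical error bars on an Elo estimate. It assumes independent games, the usual logistic Elo formula, and the binomial variance p(1-p)/N, which ignores draws; since draws reduce the variance, the margins it gives are on the conservative side. It does not capture any error caused by hardware problems.

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval(score: float, games: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval (in Elo) for a match score.

    Uses the binomial variance p(1-p)/N, i.e. it ignores draws,
    which would make the interval somewhat narrower.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    # Clamp so the Elo formula stays finite near 0% or 100%.
    low = min(max(score - z * se, 1e-9), 1.0 - 1e-9)
    high = min(max(score + z * se, 1e-9), 1.0 - 1e-9)
    return elo_from_score(low), elo_from_score(high)

for n in (50, 300, 500):
    low, high = elo_interval(0.5, n)
    print(f"{n:4d} games at 50%: {low:+.0f} to {high:+.0f} Elo")
```

At a 50% score this gives roughly ±100 Elo after 50 games, about ±40 after 300 and about ±30 after 500.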

I think it may be better to test with fewer games, provided there is
verification that no errors occurred.

This is not criticism aimed specifically at the CEGT; none of the tests run
with verification for errors.

It is annoying to see games without any way to verify whether something went
wrong in them. Even if you know the time control, the program may not search
the same number of nodes again if you repeat the game, and the fact that you
cannot reproduce a bug does not prove that there was a mistake in the game.

I think that programmers should also work on verifying that no errors occurred.
One idea I can think of is for programmers to write the exact number of nodes
and the times to a logfile, and to provide a function that verifies games based
on a PGN that includes the exact number of nodes and the exact time for every
move.
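One possible shape for the PGN side of this idea, as a minimal Python sketch: the `{nodes=... time=...}` comment format, the `LoggedMove` type and the `parse_annotated_movetext` function are all hypothetical, not an existing standard or any engine's actual output.

```python
import re
from dataclasses import dataclass

# Hypothetical annotation: every move is followed by {nodes=<int> time=<seconds>}.
MOVE_WITH_STATS = re.compile(
    r"([A-Za-z][A-Za-z0-9+#=\-]*)\s*\{\s*nodes=(\d+)\s+time=([0-9.]+)\s*\}"
)

@dataclass
class LoggedMove:
    san: str        # move in SAN as written in the PGN
    nodes: int      # exact number of nodes searched for this move
    seconds: float  # time used for this move

def parse_annotated_movetext(movetext: str) -> list[LoggedMove]:
    """Extract (move, nodes, time) triples from annotated PGN movetext.

    Move numbers and results start with a digit and are not matched;
    moves without an annotation are skipped.
    """
    return [
        LoggedMove(san, int(nodes), float(secs))
        for san, nodes, secs in MOVE_WITH_STATS.findall(movetext)
    ]
```

For example, `1. e4 {nodes=1523401 time=2.41} e5 {nodes=998877 time=1.90}` yields two records whose node counts a verifier could then feed back to the engine.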

Note that Movei writes the exact number of nodes searched to a logfile, but it
has no function to verify games based on the PGN and the number of nodes and
times (in the verification process it is logical to get a slightly different
time for the same number of nodes, but not a significantly different time).
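The verification step itself might look like the following Python sketch. Everything here is hypothetical: `Recorded`, `verify_game`, the FEN-keyed records, and especially the `search` callback, which stands in for an engine that can search a given position for exactly a given number of nodes. It assumes such a fixed-node search is reproducible (for example single-threaded, with hash tables in the same state as during the game), and the default time tolerances are arbitrary illustrative values.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Recorded:
    fen: str        # position before the move (hypothetical record field)
    move: str       # move the engine played in the game
    nodes: int      # exact node count logged for that move
    seconds: float  # time logged for that move

# Hypothetical engine hook: search `fen` for exactly `nodes` nodes and
# return (best_move, seconds_used).
FixedNodeSearch = Callable[[str, int], tuple[str, float]]

def verify_game(record: list[Recorded], search: FixedNodeSearch,
                rel_tol: float = 0.25, abs_tol: float = 0.05) -> list[str]:
    """Replay every logged move; an empty result means the game verified."""
    problems = []
    for ply, rec in enumerate(record, start=1):
        move, seconds = search(rec.fen, rec.nodes)
        if move != rec.move:
            problems.append(f"ply {ply}: replay chose {move}, game has {rec.move}")
        # The same node count should take about the same time; small drift is normal.
        allowed = max(abs_tol, rel_tol * rec.seconds)
        if abs(seconds - rec.seconds) > allowed:
            problems.append(
                f"ply {ply}: {seconds:.2f}s on replay vs {rec.seconds:.2f}s logged")
    return problems
```

The move check catches play that cannot be reproduced, while the time check encodes the "slightly different but not significantly different" rule; the tolerance is a parameter because the right value depends on the machine.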

Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.