Computer Chess Club Archives


Subject: Re: What is the reason that the CEGT prefers testing CM personalities?

Author: Heinz van Kempen

Date: 03:04:02 10/18/05


Hi Uri,

In general I see no reason why CEGT has to justify itself here, and I think you
started this because you are not happy with the start of the Fruit 2.2 Uri test.
But because you have always given a lot of feedback on our testing, I will
explain a bit.

<<I do not know if you are correct and I doubt if you have enough games against
different opponents to prove it(I explain later in this post why I doubt if it
can be correct).>>

As for the CM settings, I can only say that you can ignore them if you do not
like them and doubt their worth. They have a lot of fans, including our own
testers, and this is justification enough, even if the results do not
convincingly show that the settings are much better. The fun of experimenting
is a factor here.

<<Unfortunately CEGT is not very interested in comparisons between different
time controls, and I see only one Chessmaster at the 40/4 time control, so we do
not even have evidence that all these personalities improve relative to the
default when the time control is 40/40 compared with 40/4.>>

CEGT is young. Not even one year old. The 40/4 games were started only about two
or three months ago. So what do you expect?


<<It is not related.

I also did not suggest that the CEGT will stop testing.
I did not claim that no testing is better than testing but only that I do not
understand the choice of the CEGT.>>

<<Of course the CEGT, like the SSDF, is free to test what it wants, and if the
SSDF also prefers to test 10 different personalities of one program, that is its
right and I will not suggest that they stop testing because of it.

It is not the first time that I do not understand the choice of CEGT.

I also did not understand the choice to play a small number of blitz games
relative to long time control.>>

Any tester, and anyone interested in engines, would like to see other matches,
other conditions, other engines. This is normal, and it is the most difficult
thing for a team to agree on. There has to be a unified approach, otherwise the
tests do not give statistically reliable results with many games. And you are
right, we are free to do what was agreed in the team. These are all experienced
testers and I am sure that a lot of useful things will be done in the future,
unless some people come and destroy it all with unsound criticism.
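
To illustrate why the number of games matters for statistically reliable results, here is a minimal sketch of how an Elo difference and a rough 95% error margin can be estimated from a match result. It assumes the standard logistic Elo model and a normal approximation for the score; the function names are made up for this example and this is not necessarily how CEGT computes its lists.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    # Clamp to avoid division by zero or log of zero at 0% or 100% scores.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.

    Assumption: games are independent, and the mean score is roughly
    normally distributed; z=1.96 gives an approximate 95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return elo_diff(score), elo_diff(score - z * se), elo_diff(score + z * se)

if __name__ == "__main__":
    # Same win/draw/loss proportions, ten times as many games.
    for result in [(30, 40, 30), (300, 400, 300)]:
        elo, low, high = elo_with_margin(*result)
        print(result, f"Elo {elo:+.1f}  interval [{low:+.1f}, {high:+.1f}]")
```

With the same proportions of wins, draws and losses, the interval shrinks roughly with the square root of the number of games, which is why games have to be pooled under common conditions before small differences between settings can be trusted.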

<<The choice of 40/4 for blitz also seemed not very good to me; I thought that
testers would prefer 40/2 for comparison with 40/40, but I read that some
testers even preferred a slower time control for the blitz games, which simply
goes against the whole idea of blitz games.

The idea of blitz games is to compare between long time control and blitz to see
if there are programs that are probably better in blitz.

It may be possible to try to speculate from it about longer time control.

As far as I know we usually see a relatively small difference between 40/4 and
40/40, which may suggest that the ratio between time controls has to be more
than 1:10 to see a big difference. So if there is no significant difference
between CM default and another CM personality at 40/40, then I do not think
there is going to be a significant difference between them at a time control
that is slower by a factor of 2 or 3.>>

Blitz has no priority in CEGT and is not accepted by many as a measurement. I
even had difficulty getting 40/4 started, because others wanted more time even
for blitz. And as I said, it has only just started. We can also drop it again.

Best Regards
Heinz



Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.