Computer Chess Club Archives


Subject: Re: 4 games! and my comment to these "games"

Author: KarinsDad

Date: 10:19:00 09/03/99


On September 03, 1999 at 09:22:47, Thorsten Czub wrote:

[snip]
>
>>5) These tests of Harald's do not prove ANYTHING about chess program strength.
>>You and I are in agreement on that. However, that does not mean that you cannot
>>glean information from this type of test. Think about it.
>
>you can get information from it. the information that it is nonsense to test
>a broken program. and the information that somebody who is testing a broken
>program, he is not allowed to download, is doing illegal and senseless job.

I agree with you here. Remember, as I noted in my previous message, that you
posted that the program was "broken" only after I had posted my original
message on this.

However, if someone takes an "unbroken" version of CSTal and runs a series of
tests at exactly 3 minutes per move per side versus other programs with
pondering turned off, then you can gain some information. Note: by a series of
tests, I mean running at least a hundred games between each pair of opponents
in order to gain any reliable information. One cannot gain any real information
from the handful of games posted in this thread.
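
To put a rough number on why a handful of games says so little, here is a minimal Python sketch that estimates the uncertainty of a match score. It is only an illustration: the function name `score_margin`, the example results, and the use of a normal approximation (which is itself crude for very small samples) are all assumptions, not anything taken from Harald's tests.

```python
import math

def score_margin(wins, draws, losses, z=1.96):
    """Return (score, margin) for a match result.

    score  : mean points per game (win = 1, draw = 0.5, loss = 0)
    margin : approximate 95% half-width, using a normal approximation
             (an assumption; it is unreliable for very few games)
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return score, margin

# Hypothetical results: the same 75% score over 4 games and over 100 games.
few = score_margin(3, 0, 1)
many = score_margin(75, 0, 25)
print("4 games:   %.2f +/- %.2f" % few)
print("100 games: %.2f +/- %.2f" % many)
```

With these made-up numbers, the 4-game margin comes out around 0.42 while the 100-game margin is around 0.08, so the short match is compatible with almost any true result.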

This type of test is NOT nonsense (you really should lose this word since it is
relatively derogatory without being informative), but it is non-standard. Also,
it is in no manner a test of engine strength under standard tournament conditions.

By comparing this type of test to one where the exact 3 minutes per move is not
a restriction, one can gain, for example, general information on whether the
time management software for Hiarcs helps more or less than the time
management software for CSTal.
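
As a sketch of how such a comparison might be quantified, the Python below takes one program's results against the same opponent under two conditions (exact time per move versus the program's own time management) and reports the shift in score together with a combined standard error. The function names, the result tuples, and the normal-approximation error estimate are all hypothetical choices made for illustration.

```python
import math

def mean_and_se(wins, draws, losses):
    """Mean points per game and its standard error for one match."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    return mean, math.sqrt(var / n)

def condition_shift(fixed, managed):
    """Score change from the fixed-time match to the managed-time match.

    Each argument is a (wins, draws, losses) tuple from one program's
    point of view. Returns (difference, combined standard error),
    treating the two matches as independent samples.
    """
    m_fixed, se_fixed = mean_and_se(*fixed)
    m_managed, se_managed = mean_and_se(*managed)
    diff = m_managed - m_fixed
    se = math.sqrt(se_fixed ** 2 + se_managed ** 2)
    return diff, se

# Hypothetical 100-game matches for one program against one opponent.
diff, se = condition_shift(fixed=(40, 20, 40), managed=(50, 20, 30))
print("score shift: %+.3f (standard error %.3f)" % (diff, se))
```

A shift that is small relative to its standard error would not support any conclusion about either program's time management, which is one reason the number of games matters so much.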

And, of course, other tests can be performed such as allowing pondering, testing
for lesser or greater amounts of time, testing different parameters for each
program, etc. However, it is important to note that this HAS to be done in a
controlled manner, and one has to be careful about the conclusions drawn from
the tests.

Granted, Harald's test MERELY showed that this particular version of CSTal
(broken or otherwise) happened to be beaten multiple times in a small subset of
games. It did not illustrate anything else such as standard tournament engine
strength. I am in agreement with you on that.

But do not discard this type of test as not being viable just because this
example may not have been. Testing different things (and not just engine
strength) is how discoveries are made.

KarinsDad :)



Last modified: Thu, 15 Apr 21 08:11:13 -0700
