Computer Chess Club Archives


Subject: Re: Chessfun and Nunn1 Tests

Author: Mogens Larsen

Date: 04:15:14 05/10/00


On May 09, 2000 at 21:57:02, Eelco de Groot wrote:

>
>Hello Mogens,

Hello Eelco,

>I don't want to restart this whole discussion and I haven't followed more than
>half of it but I think some of your criticisms were a little harsh.

That is probably true. But if you read some of my posts you'll see that a lot of
the so-called criticism is formulated as questions. Most of the questions were
either addressed superficially or not at all by the tester. That made my remarks
and questions more rash and harsh than intended, which is indeed unfortunate. If
feelings got trampled on I'm truly sorry, but I believe that most of my remarks
were justified. I also believe that the study could have been conducted much
better considering the unique hardware available for Chessfun.

>In practically any experiment there are disturbing influences and I think there
>were some here too. The biggest influence I could see, one that possibly could
>have been avoided is that in the beginning some matches were played with
>booklearning on. If I am mistaken here I hope that somebody can correct me. I
>know for Rebel that booklearning can be disabled, for Crafty this can be done
>with the command learn=0. I don't know exactly if those commands can be used in
>the Hiarcs interface or in winboard for Crafty and if booklearning can be
>disabled for Fritz 6a too but especially for a repeated Nunn-like test it would
>be desirable. Okay, I think that is clear.

There are more questions of a similar nature, but they were not addressed
either. Autoplayer is the main culprit in my opinion, especially if you want to
compare ponder on with ponder off.

>Apart from that I think using the Nunn positions was a good idea from Chessfun,
>if the object was to see how a. timecontrol or b. pondering on one or two
>machines affects the strength of an engine combined with use of the timing
>algorithms involved. I think any not too imbalanced early middlegame position
>could be used for these experiments if each engine gets to play both colours. In
>practice of course also the opening books affect the strength of a program (as
>opposed to engine) but since bookmoves can be played very fast just starting
>from a Nunn-position does not make much difference for the timing algorithm. The
>big down side I see in using books and learners is that the books also have a
>big randomizing effect on the results and secondly if the two learners in a
>match don't cancel each other out that can mean that the results don't stabilize
>even after large numbers of games. They are just big noise generators if you
>want to look at the effect of pondering or timecontrol. Even if you would
>consider both books and both learners equally good you need many more games to
>determine differences in engine strength this way.
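
To put a rough number on the noise argument above, here is a minimal sketch (not part of the original tests) that turns a match result into an Elo difference with an approximate 95% margin. The function name and the sample scores are made up for illustration; it assumes the usual logistic Elo formula and a normal approximation of the score, which is crude for short matches.

```python
import math

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result; hypothetical helper."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def to_elo(p):
        # Clamp to avoid dividing by zero on 0% or 100% scores.
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        return -400.0 * math.log10(1.0 / p - 1.0)

    return to_elo(score), to_elo(score - z * se), to_elo(score + z * se)

# Same 55% score, short match versus a very long one (invented numbers).
short = elo_with_margin(8, 6, 6)        # 20 games
long_ = elo_with_margin(800, 600, 600)  # 2000 games
print("20 games:   %+.0f Elo (%+.0f to %+.0f)" % short)
print("2000 games: %+.0f Elo (%+.0f to %+.0f)" % long_)
```

With only 20 games the interval straddles zero, so a 55% score says little about which engine is stronger. Any extra randomness from books and learners widens the interval further.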

Nunn positions are okay for testing I guess, but can't be used for accurate
strength assessment IMO. They might favor one engine over another or they might
not. A common book, or a special book for each engine, is to be preferred, since
it's the strength of the program we're interested in, which is why learning
should be included as well. Nunn positions aren't the ten commandments, nor are
they written in stone :o). Try looking at the Nunn tests at Chessfun's site and
reassure me that they are better.

>Comparing tests on one machine with results on two machines is interesting too
>to see if there is a discernible effect of how limited resources get divided
>etc.

That's true if you have control over the other parameters. I'm not entirely sure
if that was the case.

>There could have been some influence from using other programs on just the
>Windows 98 computer but I don't think that can have had much influence on the
>results. I think it is easy to test how much a program gets slowed down if you
>use it with other programs running, by looking at times needed to reach a
>certain plydepth.

I don't think it's that important either, assuming that Chessfun's measurements
are correct, but the other running programs should have been mentioned and
their relevance examined.
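
The time-to-depth check Eelco describes could be summarised like this. It is only a sketch: the function name and all timings are invented, and it assumes you have noted, for the same positions and the same ply depth, the seconds needed on an idle machine and on a machine with the other programs running.

```python
from statistics import geometric_mean

def slowdown_factor(idle_seconds, loaded_seconds):
    """Average slowdown across positions; hypothetical helper.

    Both lists hold the time to reach the same fixed ply depth,
    position by position. A result of 1.10 means roughly 10% slower.
    """
    if len(idle_seconds) != len(loaded_seconds):
        raise ValueError("need one idle and one loaded time per position")
    ratios = [loaded / idle for idle, loaded in zip(idle_seconds, loaded_seconds)]
    # Geometric mean, so one slow position does not dominate the average.
    return geometric_mean(ratios)

# Invented timings for three positions, in seconds.
idle = [12.0, 30.0, 45.0]
loaded = [13.2, 33.0, 49.5]
print("slowdown factor: %.2f" % slowdown_factor(idle, loaded))
```

A factor close to 1.0 would support the view that the extra programs hardly mattered.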

I'm a little disappointed that the testing stopped and I think I'm to blame for
that. So if Chessfun decides to resume her testing, I promise not to make
remarks about its validity.

Sincerely,
Mogens




Last modified: Thu, 15 Apr 21 08:11:13 -0700
