Computer Chess Club Archives



Subject: Re: Chessfun and Nunn1 Tests

Author: Chessfun

Date: 21:59:13 05/10/00



On May 10, 2000 at 19:43:52, Eelco de Groot wrote:

>
>
> Why the Nunn test
>
>>A common book, or a special book for each engine, is to be preferred, since
>>it's the strength of the program we're interested in,
>
>That was just exactly my argument, the strength of the program as a whole (for
>the sake of simplicity I'll consider that now as a combination of engine+timing
>algorithm+book+learner) was not really what Chessfun and Christophe a.o. were
>interested in.


Initially the interest was in the result Jouni posted, Crafty 17-10
beating F6a. I asked for the scores and the parameters; none were forthcoming,
though it is known that it was with ponder=off.
So I decided to run the games myself, since I do NOT believe Crafty 17-10 can
beat Fritz 6a at Nunn 1. Nobody questioned Jouni's results or parameters
the way mine were questioned when my results were posted.

This interest developed into the effect of ponder=off v ponder=on when it was
claimed that Crafty performed better with ponder=on.


>At least that's how I understand it. Also because to measure that
>you would a. need a lot more games because of the noise introduced by opening
>books and learners and b. it's not very significant because only the two
>programs, Crafty and Fritz were tested. That's rather a small pool. If that had
>been the object I'm sure Chessfun would have let more programs play.


The object initially, as stated above, was to see whether, as claimed, Crafty
could beat F6a in Nunn 1. Since the time controls were unknown I ran a number
of different controls. Although it can be claimed that hash and other settings
are unknown, they are just variables, and in my games none of them is
of significance.


>Like I said I saw the object more to look at the influence of a. timecontrol and
>b. pondering on the strength of a typical program but since these (a. and b.)
>mainly influence the combination of 1. engine and 2. timing algorithm it pays
>off to limit the influence of 3. book openings and 4. learners. Hence the Nunn
>test.


I am not sure learning has any influence. Playing 1/0, then 2/0, then 3/0,
then 5/0 etc., what is there to be learned for the 5/0 games from first playing
the 1/0, 2/0 and 3/0 games, since those time controls are slower? Even if that
were the case, again none of these questions were asked about Jouni's results.

They only became issues as lame arguments, just as I became "the crafty hater"
for simply playing these games, and that is in fact how the arguments
degenerated. My results didn't sit well, therefore they are to be questioned.
Jouni's suited, therefore let's all take them at face value and nitpick at
small variables in Sarah's games, such as that she posted at ccc while games
were running...ha...as I said, bunk.

Thousands of games and results are posted here at ccc; never have I seen so many
questions about variables as with mine. Since no such questions are asked about
the other games, one can only wonder whether the posters of those questions are
truly sincere or whether they just want to see their name on a post.

The evidence is that the variables are minimal and, as I said before, IMO the
testing is done similarly to the SSDF's. As recent posts have pointed out, they
have posted here while playing, and as recently as this week there were
questions on Junior's book. These SSDF tests are results that will be used and
studied by all, and yet it is of more interest to question the effect of using
ccc while playing on my results than on the SSDF's.

As also stated, I checked each engine using taskinfo2000. I was asked questions
about virus software: is she running it on one computer, as it is on the net,
and not on the other? The same thing applies to the SSDF, but I'm not sure
anyone ever asked them. That is how far they tried to dig to make something out
of nothing.

I say this for the last time. I have two IDENTICAL computers, except that one
is win 95 and the other win 98. Ram same, CPU same, Fritzmark same, rebel bench
same, crafty bench same. Therefore they are the same. Arguments about virus
software or typing in notepad are pointless and just show how petty these
people are in trying to make a mountain out of a molehill.

Oh that is all naturally IMHO.

>But other opening positions, for instance ones present in both books like
>Christophe suggested, or early middlegame positions would have served too.
>Jeroen Noomen did also prepare a set of reasonably balanced opening positions,
>if somebody would want to carry out more tests like this I'm sure Jeroen would
>want to e-mail them to interested parties. It's the principle involved, not the
>particular positions.


That idea is fine, but it wasn't the intention of the initial testing.
It is in fact similar to the idea I suggested for the match Chess Tiger
v Diep, which was then taken up in that match: use the opening books, then
from the exact position where the programs left book, reverse colors for game 2.
However, with any testing there are variables; as long as those variables are
kept to a minimum, that is the best that can be achieved.
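
To make the reversed-colors idea concrete, here is a minimal sketch in Python.
It assumes a plain list of starting positions (position labels or FEN strings)
and two engine names; the function name and the labels are hypothetical and do
not come from any autoplayer or GUI. Each starting position is simply played
twice, with colors swapped for the second game.

```python
from typing import List, Tuple


def reversed_color_schedule(
    positions: List[str], engine_a: str, engine_b: str
) -> List[Tuple[str, str, str]]:
    """Return a list of (position, white, black) games.

    Every starting position is scheduled twice, once with each engine
    playing White, so neither side keeps whatever edge the position gives.
    """
    games = []
    for pos in positions:
        games.append((pos, engine_a, engine_b))  # game 1: engine_a is White
        games.append((pos, engine_b, engine_a))  # game 2: colors reversed
    return games


if __name__ == "__main__":
    # Hypothetical labels; real use would pass the actual start positions.
    for game in reversed_color_schedule(["pos1", "pos2"], "Crafty", "F6a"):
        print(game)
```

With N starting positions this gives 2N games, and any advantage built into a
position is handed to each program once.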
>
> Nunn positions may favour one of the programs
>
>You could argue that because there are only a limited number of starting
>positions that the program might never play with its own book this might
>disadvantage one of the engines. True but if you think about it that is of
>course irrelevant for what you wanted to find out here.


Since again the object was to see if Crafty could win at Nunn 1, whether it
was at a disadvantage positionally or tactically from the starting position
is irrelevant, since Jouni had posted that Crafty had beaten F6a at Nunn 1.
The objective was simple enough: to see whether similar results to Jouni's
could be obtained.

> Autoplayer
>
>You brought up that the autoplayer can also be a disturbing factor and that is
>true of course. I didn't read all the messages so I don't really know if
>indications of autoplayer problems came up in the threads.

There were problems with the autoplayer at 5/0 and 25/0.

For others that have recently posted incorrect statements about the 5/0
games here are the FACTS:
The games looped overnight and played two sets of 20 games.
Set one was won by F6a 15-5 and set two by crafty 11-9.

On review of both sets on both computers using Crafty's and F6a's evals,
I determined that F6a's eval depths were clearly lower than they should have
been, based on the time control and on trying a few test positions.
Crafty's evals were also checked and found to be fine.
The game scores with the Crafty evals were then discarded, as the match scores
were being saved on the computer running Crafty. The other computer, running
F6a, still has the game scores with its evals. The point is that the games were
checked for depth using both F6a's and Crafty's evals.

The match was replayed and the original 15-5 score in favor of F6a stood up.
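
As a rough illustration of how much a single 20-game set can say, here is a
minimal sketch in Python. It uses the standard logistic Elo formula to turn a
match score into a rating difference, plus a normal-approximation margin on the
score fraction. The function names are hypothetical. Since the win/draw/loss
split of these sets is not given, the margin treats every game as a win or a
loss, which can only overstate the spread, and with 20 games the normal
approximation is itself crude.

```python
import math


def elo_diff_from_score(points_for: float, points_against: float) -> float:
    """Elo difference implied by a match score (logistic Elo model)."""
    total = points_for + points_against
    if total <= 0:
        raise ValueError("no games played")
    p = points_for / total
    if p <= 0.0 or p >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / p - 1.0)


def score_margin_95(points_for: float, points_against: float) -> float:
    """Approximate 95% margin on the score fraction.

    Assumption: each game is treated as a win or a loss (draw counts are
    unknown), which gives an upper bound on the variance for a given score.
    """
    n = points_for + points_against  # one point is shared out per game
    if n <= 0:
        raise ValueError("no games played")
    p = points_for / n
    return 1.96 * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # The two overnight 5/0 sets, seen from the winner's side of each.
    for label, won, lost in [("F6a 15-5", 15, 5), ("Crafty 11-9", 11, 9)]:
        p = won / (won + lost)
        print(label, round(elo_diff_from_score(won, lost)),
              f"score {p:.2f} +/- {score_margin_95(won, lost):.2f}")
```

Running it on the 15-5 and 11-9 sets shows how wide the margin is at 20 games,
which is one reason a replay is a sensible check.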

At 25/0, after an initial score of 9-0 to F6a, it was found that starting in
game 3 or 4 Crafty's depth compared to its analysis was too short. F6a's depths
were then also checked and found to be OK.


>So I basically tried to illustrate my reasoning about the downside of learners
>and opening books and I hope you can follow my argument a bit.
>I wouldn't know if feelings got hurt, I'm sure you didn't mean to do that.
>Surely Chessfun isn't discouraged so easily!


I am not sure whether for some reason it was believed that I had stopped the
games. However, they continue. But playing 20 games at 120'/40 60'/20 30
takes some time, as do the 1 hour games. Both are now complete and all that
is left are the 25/0 games, which are currently running, score 8-3 to F6a.

Thanks.






Last modified: Thu, 15 Apr 21 08:11:13 -0700
