Computer Chess Club Archives

Subject: Re: Proposal: New testing methods for SSDF (1)

Author: Jeroen Noomen
Date: 13:33:31 04/13/98

On April 13, 1998 at 12:57:07, Dirk Frickenschmidt wrote:

Hello Dirk,

Thanks for your reaction. I know, you are right: the test methods I
mentioned have plenty of advantages and disadvantages. But I think
every method will have drawbacks. I only wanted to give two examples of
different test methods, as alternatives to the normal 40/2 games played
right now by SSDF.

>first of all, congratulations on one of the most substantial posts on a
>critical issue I have read here in a long time!
>
>Both of the testing methods you propose are worth thinking about.
>
>1. Concerning the latter, with 50 opening positions, I think this way of
>testing is the most interesting. But I see 3 problems:
>
>a) 50 positions are too many (though this number is surely adequate for
>the variety of modern chess), consuming *much* too much time, since 100
>games have to be played in each program-versus-program match. I think
>the practical limit for this kind of test is 20 positions. I know it
>will be hard to find a good, representative test set then.

This is true. 100 games at, let's say, 6 games per day will take more
than two weeks. On the other hand, many different positions are
necessary. Maybe 20 positions can be done, or 25.
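
To make the arithmetic concrete, here is a rough sketch of how long one
match takes for the set sizes discussed. It assumes every position is
played once with each colour and uses the 6 games per day guessed above;
the function name match_days is only for illustration.

```python
import math

def match_days(positions: int, games_per_day: int = 6, colours: int = 2) -> int:
    """Whole days needed to play every position once with each colour."""
    games = positions * colours
    return math.ceil(games / games_per_day)

for n in (20, 25, 50):
    print(f"{n} positions: {n * 2} games, {match_days(n)} days")
```

With these assumptions, 50 positions need 17 days per match, 25 positions
need 9 days and 20 positions need 7 days.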

>b) The 50 positions you posted here (which I regard as very generous
>from you as a professional tester and bookwriter, not just posting the
>results of a hobby) are certainly well chosen.
>The only problem is that they - like many of the Nunn positions - end
>too soon compared to modern opening books (in computers as well as in
>human brains).

This is a main problem: do you take positions right after the opening,
or do you take 8-10 moves to see how the chess program would proceed?
Maybe both types should be taken.

>In fact I think this does not mirror the strength of modern programs,
>which normally are in book at least until around move 12-15, and in some
>cases (whether this makes sense is another question) up to much higher
>move numbers.
>
>So I think a compromise would be useful: taking a set of slightly more
>developed positions (kings castled, and substantial pawn structures and
>piece placement for the beginning middlegame on the board; I guess this
>would mean about three or four moves more on average than your average
>was). Again, I know this is not easily done: the more specific the chosen
>positions are, the more important it becomes that your choice still
>covers *relevant* positions, with the games and results giving some
>insight into the playing strength and style the programs offer.

I agree. This can be done by just adding moves to the given lines.
Perhaps we should extend 25 lines, leaving the other 25 as they are.

>c) For me as a user it would be a pity and a real drawback seeing none
>of the SSDF games (still seeing few of them anyway)! What interests me
>most are often not the pure results but to see *how* a program plays a
>given position against a certain opponent.

Absolutely. But you can also play the games first, then present the
results, and post the games. Although the test positions would be
useless from that point on.

>2. Concerning your first method, although it is perhaps not as
>attractive as the latter, I think in general it would be easier to
>handle for the SSDF.
>(Just by the way :-) One question will be: will they finally kill the
>doubles, as you and I and others have been hoping for a long time? Will
>they finally admit at all that something like a double can be defined?)

I agree on the point of the 'doubles': it is crazy to count them. This
way the program that is best at copying won games gets the best
results, especially versus older programs that do not have any defence
against this. People call this an improvement, but IMO this is clearly
not an improvement in playing strength.
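
As a minimal sketch of how doubles could be left out of the count: the
definition used here (same white, same black, identical move list) is
only my assumption, not anything SSDF has adopted, and the names Game
and drop_doubles as well as the sample games are made up for
illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple

class Game(NamedTuple):
    white: str
    black: str
    moves: Tuple[str, ...]   # moves in any consistent notation
    result: str              # "1-0", "0-1" or "1/2-1/2"

def drop_doubles(games: Iterable[Game]) -> List[Game]:
    """Keep only the first game for each (white, black, moves) combination."""
    seen = set()
    kept = []
    for game in games:
        key = (game.white, game.black, game.moves)
        if key in seen:
            continue          # assumed definition of a double: skip it
        seen.add(key)
        kept.append(game)
    return kept

# Hypothetical sample: the second game repeats the first move for move.
games = [
    Game("Program A", "Program B", ("e4", "e5", "Nf3"), "1-0"),
    Game("Program A", "Program B", ("e4", "e5", "Nf3"), "1-0"),
    Game("Program B", "Program A", ("d4", "d5"), "1/2-1/2"),
]
kept = drop_doubles(games)
print(len(games), "played,", len(kept), "counted")
```

A stricter or looser definition (for example, identical only up to a
certain move number) would change only the key.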

>But even in this case there remain problems to be solved:
>
>a) which kind of database compiled by whom should be taken as basis?

500,000 games from any database. I don't mind if it is ChessBase or
TascBase or any other database. The main point is that all use the same
book and that a strong player, preferably an IM or IGM, builds the book.

>b) How will SSDF testers, most of whom are to this day not able to save
>their test games in a common format and publish the games, ever be able
>to handle the technical aspects of this procedure? (Having some kind of
>extra program play out an opening choice by chance, then setting up this
>position on both computers, and still getting all the more or less
>working auto232s to do their job as required? Or converting the big book
>into all available formats without the help of the programmers? How can
>it be done?)

Good point, this is really a problem. Each program should be able to
start an autoplayer session with given opening lines, stored in a
database. But I admit that this is not easy to achieve...
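
The scheduling part, at least, is simple. Below is a sketch of the
"opening choice by chance" step: draw lines at random from a shared
list and play each one twice with colours reversed. It does not touch
auto232 or any real book format; the function schedule, the fixed seed
and the four sample lines are assumptions for illustration only.

```python
import random

def schedule(lines, programs, n_lines, seed=0):
    """Pick n_lines opening lines at random; pair each with both colour assignments."""
    rng = random.Random(seed)          # fixed seed so a test run can be repeated
    chosen = rng.sample(lines, n_lines)
    first, second = programs
    games = []
    for line in chosen:
        games.append({"white": first, "black": second, "opening": line})
        games.append({"white": second, "black": first, "opening": line})
    return games

# Hypothetical stand-in for the shared opening database.
lines = [
    "1.e4 e5 2.Nf3 Nc6 3.Bb5",
    "1.d4 d5 2.c4 e6",
    "1.e4 c5 2.Nf3 d6",
    "1.c4 e5 2.Nc3 Nf6",
]
games = schedule(lines, ("Program A", "Program B"), n_lines=2)
for g in games:
    print(g["white"], "vs", g["black"], "from", g["opening"])
```

The hard part remains what Dirk describes: getting each program to
start from the chosen line on both machines.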

>c) where will the openings be cut off and with which kind of strange
>effects in different openings (I observed some of these problems in the
>Fritz5 powerbook which is *very* broad but in some variations not as
>deep as human theory or some computer books)?

I'll leave that decision to the strong player who will make the book.

>3. Concerning the auto232 device.
>
>I had the opportunity to use the chessbase autoplayer and observe the
>results. I noticed no special effect of it at all and have come to the
>conclusion that it works just like any well known auto232 device except
>for the nice feature that it switches between white and black games (so
>you don't have to play a whole series with one colour before using the
>other). Until now I have never seen any effect that makes me think of
>something manipulative in it. And, frankly, I am convinced that someone
>like Matthias Wuellenweber would never try to use such a technical
>device as a kind of cheating device even if that would be technically
>possible (I still have not yet heard any plausible argument concerning
>this possibility).

There has been a lot of speculation, but I trust your view and I can
imagine this autoplayer is used as a defensive weapon against
outbooking.

>The main problem is the more and more absurd "book wars", of which you
>have been a victim yourself at times.
>
>Although I don't like the Chessbase reaction as a user, I must admit I
>understand it from Chessbase's view: they simply want to avoid the new
>kind of killing book (I call it like that no matter what others think of
>it) where pre-played autoplayer games become part of a new book which
>then plays these wins as "openings" against the chosen targets in the
>SSDF list.

True. But it would have been wiser to state this in advance, before
Fritz 5 was sent to Sweden. Now it all remains a mystery to many people.
There are many unanswered questions and no statement from Sweden. This
is a pity, because I think everyone would like to know the reaction from
SSDF.

>As far as I know this is the only reason why chessbase refuses to make
>their autoplayer available to everybody: it seems to be no secret
>cheating device, but a simple auto232 player that prevents the program
>from being booked by others (not by you, as I know from your fair and
>attractive way of book programming).
>
>Perhaps there are solutions for these problems?
>
>Your innovative article will surely encourage others as well as me not
>to give up too easily looking for some.

Thanks. In the first place this article was meant to give food for
thought and to get people thinking about other methods.

>Thanks and kind regards
>from Dirk

Greetings and Happy Easter,
Jeroen