Author: Robert Hyatt
Date: 09:00:27 04/13/98
Go up one level in this thread
On April 13, 1998 at 10:42:39, Jeroen Noomen wrote:

>There are however a few points that should be cleared: If a neutral
>organisation like SSDF is testing chessprograms, the conditions for
>these tests should be as equal as possible. That means: If you present
>results based on Pentium 200 MMX, these should be the SAME Pentiums,
>with the same configuration.

One thing I see that I don't like in the SSDF testing is "program A on a
P5/200mmx vs program B on a P5/90". I don't believe that makes a lot of
sense. Programs are quite sensitive to "time advantages", and this can
be a big one... I'd prefer to see how programs fare on equal machines,
not how badly one on a faster machine stomps one on a calculator...

>Another point: the ChessBase autoplayer. One can ask the following
>questions:
>
>1. Why does a commercial organisation make an investment in a project
>   (like f.e. an autoplayer) if something similar already exists?
>2. Why consume a lot of time and money to build a piece of software,
>   instead of only writing an AUTO232 driver, which is far easier and
>   costs far less?

One possible answer is that the auto232 system is highly unreliable. It
is subject to timing issues that will cause it to hang. I.e., Crafty
will hang when using tablebases, simply because it moves *too quickly*
and the auto232 driver can't handle that, for unknown reasons. Writing
your own makes sense if you want to avoid wasting a *lot* of time
getting the thing to work, only to find that it fails when you run on a
faster machine...

>3. As I graduated in commercial economics at the Highschool for
>   Business Economics, I have learned that a commercial organisation
>   only makes an investment if they expect profit out of it: It should
>   pay itself back, or better: The investment should make more profits.
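The timing fragility described above can be illustrated with a toy simulation. Everything here is my own construction, not the actual AUTO232 protocol: assume a driver that polls for at most one pending move per cycle and keeps only a one-slot buffer. An engine that answers faster than the driver polls (e.g. an instant tablebase reply) gets a move silently overwritten, the two sides fall out of sync, and the driver hangs waiting for a move that will never come.

```python
# Hypothetical sketch of a polling-driver timing race (names and the
# one-slot-buffer assumption are mine, not the real AUTO232 design).
def run_driver(moves, replies_per_poll):
    """Simulate a driver that polls a one-slot move buffer.

    moves: the moves the engine sends, in order.
    replies_per_poll: how many moves the engine emits between two
    polls of the driver.  Returns the moves the driver actually saw.
    """
    seen = []
    buf = None  # one-slot buffer: a second arrival overwrites the first
    i = 0
    while i < len(moves):
        # The engine emits a burst of moves before the driver polls again.
        for _ in range(replies_per_poll):
            if i >= len(moves):
                break
            buf = moves[i]  # overwrite: an unread move is lost here
            i += 1
        if buf is not None:
            seen.append(buf)
            buf = None
    return seen

# A slow engine (one move per poll) loses nothing:
slow = run_driver(["e4", "e5", "Nf3"], replies_per_poll=1)
# A tablebase-fast engine emits two moves per poll; "e4" is overwritten
# before the driver ever reads it:
fast = run_driver(["e4", "e5", "Nf3"], replies_per_poll=2)
```

The point of the sketch is only that a fixed-rate polling design bakes in a speed assumption, which is exactly the kind of failure that shows up when you move to a faster machine.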
>A lot of speculating has been done on this subject, but coming to the
>main point of this topic I would like to ask all the programmers on CCC
>the following questions:
>
>a) Is it possible that an autoplayer can influence other programs? I.e.
>   overruling data, stopping some necessary routines etc.

Not from the protocol I have seen, unless CD has done some underhanded
programming, which I don't believe. I.e., he would have to do something
like put a hack in the auto232 drivers, so that when sent a special
"message" the auto232 driver starts consuming large chunks of CPU time
to slow the opponent down, or whatever. Otherwise all the auto232 system
can do is stuff commands into the program's input buffer... which could
be seen and detected easily. I don't think this is a problem at all...
it is more of a "reliability" issue than anything else...

>b) Is it possible to make an autoplayer that is advantageous for one
>   side? I.e. makes the results for one side better than they really
>   are.
>
>I want to point out that I am absolutely no expert on this subject,
>therefore I ask the authors of chess programs to give answers to these
>questions.

The third question: yes, if you write *both ends*. But if all you can do
is send stuff over the serial port, there's nothing you can force on the
program on the other end, except that which the other end's auto232
driver allows.

>c) If the answer to a) and b) is 'YES', would you agree to have your
>   program tested against such autoplayers?
>
>I have the feeling that it was wrong from SSDF to accept this way of
>testing, WITHOUT consulting other programmers if they agree or not. A
>great risk is that now programmers are removing the AUTO232 software,
>or 'worse': are starting to make their own autoplayer. This leads to 6
>different autoplayers (or more) and the results of SSDF would be
>getting more and more unclear.
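Since the only channel into the engine is text stuffed into its input buffer, the "could be seen and detected easily" point is simple to demonstrate. A minimal sketch, with an illustrative command grammar of my own (not the real AUTO232 or any engine's actual protocol): log every incoming line and flag anything outside the expected set.

```python
import re

# Hypothetical whitelist of protocol commands an engine expects from an
# autoplayer; the grammar here is illustrative only.
ALLOWED = re.compile(
    r"^(move [a-h][1-8][a-h][1-8][qrbn]?"  # a move in coordinate form
    r"|time \d+"                           # own clock, centiseconds
    r"|otim \d+"                           # opponent clock
    r"|new"                                # start a new game
    r"|result .*)$"                        # game result report
)

def audit(lines):
    """Return every received line that is outside the expected protocol."""
    return [ln for ln in lines if not ALLOWED.match(ln.strip())]

suspicious = audit(["move e2e4", "time 30000", "eat_cpu 95"])
# Any out-of-protocol injection (here the made-up "eat_cpu 95") stands
# out immediately in the engine's log.
```

So a hostile autoplayer could not influence the opponent covertly through this channel: anything it sends is plain text the receiving program can record and inspect.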
>Furthermore: Instead of taking time and effort to improve the playing
>strength and features of a program, all programmers have to spend a
>huge amount of time on autoplayers, booktuning, booklearners and so on.
>Inevitable conclusion: The real playing strength improvements are
>becoming less and less; instead a lot of 'statistical improvements' are
>made. F.e. who is the best at copying won games to gain more
>Elo-points. (Note that older programs do not have any defence against
>such treatments. If you take program A without a learner and the very
>same program with a learner, then the last one will win, although it is
>the same program and absolutely not stronger at all. But statistically
>everybody will say that the second program is stronger.)

Here, the "old program" really is worse, because humans will do this all
day long at a tournament, if you give them the chance. So from that
perspective, learning does improve overall play...

>To make an end to speculations, the above-mentioned developments and
>the recent 'bookwar' (as I call it myself), I would like to come up
>with two proposals for new test methods that could be used to determine
>the real differences in playing strength between chess programs. I do
>not want to imply these tests are the 'coup de grace', but still they
>might be useful as an alternative, to prevent the influence of
>killerlines, booklearners and so on.
>
>Proposal 1: The universal SSDF openingbook
>------------------------------------------------------------------
>
>* Out of - let's say 500,000 - grandmaster games the SSDF makes a
>  universal openingbook.
>* This openingbook will not be published or made available to the chess
>  programmers.

This is already dead. How can this ever be "enforced"? And how can it be
"proven" that this hasn't happened, when some upstart pops to the top of
the list?

>* All tested programs will have to use the universal book; all
>  testgames and matches will be played only with this book.
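The kind of result-driven book learning being argued about here can be sketched in a few lines, which supports the "weekend at most" estimate further down. The weighting scheme and numbers below are my own invention, not any particular program's: after each game, nudge the weight of every book move on the line that was played, up for a win and down for a loss, so losing lines get avoided and winning lines get repeated.

```python
# Hypothetical result-driven book learner (scheme and constants are
# illustrative, not taken from any real program).
def update_book(book, line, result):
    """book: {position_key: {move: weight}}.
    line: [(position_key, move), ...] as played from the book.
    result: +1 win, 0 draw, -1 loss, from the learner's point of view."""
    for pos, move in line:
        weights = book.setdefault(pos, {})
        # Nudge the weight by the result; never drop below 1 so the
        # move stays known (just deprioritized).
        weights[move] = max(1, weights.get(move, 10) + 3 * result)

def pick_move(book, pos):
    """Play the highest-weighted book move, if the position is in book."""
    weights = book.get(pos)
    return max(weights, key=weights.get) if weights else None

book = {"start": {"e4": 10, "d4": 10}}
update_book(book, [("start", "e4")], result=-1)  # lost a game with e4
# "d4" (still 10) now outweighs "e4" (7), so the losing line is avoided.
```

This is exactly the "copying won games" effect: the program is not searching any better, it is just steering the opening toward lines with good recorded results.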
>* Each match consists of a predefined number of games, let's say 40.
>  All matches should consist of 40 played games.
>* Doubles are not counted.

I disagree here too. If a program doesn't learn, it ought not be
playing. The algorithms are well-known. Heck, source code is freely
available that implements book learning already... so this isn't a
year-long effort, it's a weekend at most...

>* The configuration should be exactly the same for both opponents. A
>  standard can be agreed upon, f.e. Pentium 200 MMX with 64 MByte of
>  memory.

I agree 100% there...

>Under these conditions older programs can participate as well, as the
>effects of improved openingbooks, bookkilling, and booklearners are
>ruled out. Although a disadvantage could be that the new programs -
>with learners - could learn which lines in the new book are suitable
>and which are not.
>
>One can argue that a universal openingbook violates the fact that the
>openingbook is a part of the chessprogram. I agree with that, but I
>want to point out that this testmethod is meant to compare playing
>strength of chess programs, not opening books.
>
>Proposal 2: 50 openingpositions, each program plays one game with White
>and

This only leads to further "hand tuning"... now you know *exactly* what
your opponent will play. Look out...
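For what it's worth, the book construction in Proposal 1 is mechanically simple. A toy sketch under stated assumptions: toy move lists stand in for the 500,000 grandmaster games, and the position key, depth limit, and frequency threshold are all my choices, not SSDF's. Count which move follows each position across the collection and keep only moves seen often enough.

```python
from collections import Counter, defaultdict

# Hypothetical universal-book builder (keys, depth, and threshold are
# illustrative; a real build would parse PGN and hash real positions).
def build_book(games, min_count=2, max_depth=8):
    """games: list of games, each a list of moves from the start position.
    Returns {position_key: {move: count}} keeping only frequent moves."""
    counts = defaultdict(Counter)
    for game in games:
        pos = ()  # position key = the move sequence so far
        for move in game[:max_depth]:
            counts[pos][move] += 1
            pos = pos + (move,)
    # Keep only moves (and positions) that clear the frequency threshold.
    return {pos: {m: n for m, n in c.items() if n >= min_count}
            for pos, c in counts.items()
            if any(n >= min_count for n in c.values())}

games = [["e4", "e5", "Nf3"], ["e4", "e5", "Bc4"], ["d4", "d5"]]
book = build_book(games)
# "e4" appears twice from the start, so it survives the threshold;
# "d4" (seen once) is pruned from the book.
```

The hard part of the proposal is not building such a book, it is keeping it secret, which is the objection raised above.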