Computer Chess Club Archives



Subject: Re: Book learning and rating bias

Author: Bruce Moreland

Date: 13:12:40 05/01/98

On May 01, 1998 at 13:58:29, Don Dailey wrote:

>Hi Everyone,
>
>I've been thinking a lot about the super book phenomenon we are seeing
>and all the issues involved, and would like to post some observations
>and opinions.  This seems like a good forum to do so.
>
>Originally I started thinking about testing procedures to neutralize
>the effect of heavily booked programs.  But the more I thought about
>it, the more I realized this would be impossible to do fairly.  After
>all, each program has its own strengths and weaknesses and should play
>openings compatible with its own playing style (the same as we humans
>do!)  This implies that opening preparation is an integral part of how
>each computer performs.  This is also how it works with humans.

Right.  Some feel that only part of a chess program is interesting.  I
take the program as a whole, including the book.  If someone takes the
time to figure out what their program plays well, they should reap the
benefits.

The goal is not to narrow the domain even more, by specifying some
particular starting position, or demanding that some algorithms and
heuristics be used and some be considered to be illegal.  The goal is to
take a computer, put some software and data on it, and get it playing
great chess.

>But then you get into the issue of a computer playing the same game
>over and over.  But just like us humans, if you allow yourself to get
>beaten the same way over and over again, then shame on you!  Artificial
>techniques to prevent this abound, but I'm thinking they should not be
>applied.  Looked at another way, why should I be penalized for playing
>a move I know wins?  YOU should be penalized for letting me do this!

Yes.  This moves beyond the single-game domain to the more challenging
multiple-game (match) domain.  I think this is a fine thing to do: the
problems are harder, the chess is better, and the end users are more
thoroughly entertained.

>One very important factor is book learning, and I do not know how this
>is handled by the raters; hopefully it is handled correctly.  The issue
>is that if I have a program that learns from its mistakes (which I
>think is a very good thing), then that program should never be "reset"
>by the testing procedure.  As an example, if I were a biased tester, I
>could simply reset the learning mechanism frequently and affect the
>results (perhaps) significantly.  I might move the program from machine
>to machine or whatever it takes to defeat the learning mechanism.
>
>Having several testers testing the same program on different machines
>creates the same problem.  I argue that the more computers you use to
>test a program on, the more of a handicap you give to that program if
>it utilizes learning mechanisms.  I don't know the magnitude of the
>error, but it certainly would be a factor to consider.  The only
>solution I am aware of is to use the same machine to test the program
>on.  If you use other machines, you must consider them separate
>identities.

Absolutely.  What you want to do here is specify a domain and stick with
it.  If the domain is the single game, learners are illegal, and it's
perfectly fine to play one game each on twenty computers or twenty games
on one computer.  If the domain is multiple games, you have to be more
careful.

One idea is to declare that all matches are X games, and that fresh
installs are made at the start of all matches.

If you demand a fresh install, the order in which a program encounters
opponents matters, and ideally you'd want to have each program on its
own *single* permanently dedicated computer, which is probably very
impractical.
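
To make the point concrete, here is a minimal sketch of result-based
book learning.  It is not any particular program's scheme; the file
name, the statistics kept, and the cutoff rule are all made up for
illustration.  The learned results live in a file on the machine the
program runs on, so a fresh install, or moving the program to another
computer, silently throws the accumulated knowledge away.

import json
import os

BOOK_FILE = "book.json"  # hypothetical on-disk learning store


def load_book():
    # map from opening line (a space-separated move string)
    # to [wins, losses, draws] from the program's point of view
    if os.path.exists(BOOK_FILE):
        with open(BOOK_FILE) as f:
            return json.load(f)
    return {}


def record_result(opening_line, result):
    """Update the learned statistics; result is 1.0, 0.5 or 0.0."""
    book = load_book()
    wins, losses, draws = book.get(opening_line, [0, 0, 0])
    if result == 1.0:
        wins += 1
    elif result == 0.0:
        losses += 1
    else:
        draws += 1
    book[opening_line] = [wins, losses, draws]
    with open(BOOK_FILE, "w") as f:
        json.dump(book, f)


def line_is_playable(opening_line):
    """Drop a book line once it has clearly lost more than it has scored."""
    wins, losses, draws = load_book().get(opening_line, [0, 0, 0])
    return losses < 2 or wins + 0.5 * draws >= losses

Resetting this store between test games, or spreading the games over
several machines that each keep their own book.json, is exactly the
handicap Don describes.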

>The other problem, which I believe is a pretty big factor, is opponent
>selection.  From experiments I have done, this can have a large effect
>on the results.  I suspect it may be the single greatest source of
>error the raters must face.  I want to mention that I do not know how
>they make these decisions and know very little about their testing
>methodology, and I am not criticizing them.  I just bring this up as a
>potential problem.
>
>A possible solution to this problem is to have a deterministic
>selection procedure that does not involve human judgement or decision
>making.  Here is a simple outline of how this might be done without
>too much hassle:
>
>1. Each program is registered to run on a given computer, OS and
>configuration to start with.  Everything is specified and published in
>advance.  It never changes from this configuration.  I'll call each of
>these program/hardware/configuration combinations a "PLAYER"; a given
>program may of course be registered as more than one player.

It would be improper to allow multiple entries, because this creates
more chances of one of them getting a rating that is higher than it
should be.

We'd be right back where we were in the '80s, when these companies
entered several machines in a tournament and touted the results of the
one that did best.
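
The bias is easy to see with a back-of-the-envelope simulation; the
numbers below are invented for illustration, not real SSDF data.  If
each entry's measured rating is the true strength plus independent
measurement noise, then quoting only the best of several identical
entries systematically overstates the program.

import random

TRUE_RATING = 2400   # assumed true strength (illustrative)
NOISE_SD = 50        # assumed per-entry measurement error in Elo points
TRIALS = 10_000


def measured_rating():
    # one entry's measured rating: true strength plus measurement noise
    return random.gauss(TRUE_RATING, NOISE_SD)


for entries in (1, 3, 5):
    avg_best = sum(max(measured_rating() for _ in range(entries))
                   for _ in range(TRIALS)) / TRIALS
    print(f"{entries} entries: best measured rating averages {avg_best:.0f}")

# Typical output (it will vary): roughly 2400, 2440 and 2460 -- the more
# identical entries you field, the higher the best one's rating looks.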

bruce

>2. When a new "identity" is registered, it is given an initial rating
>based on 2 games with EVERY program in the current registry (all
>active programs).
>
>3. Rating is done by performing a large swiss event with each PLAYER
>being a participant.  A standard pairing program is used to pair each
>round; this decision is not to be done by hand.  Each ROUND of the
>swiss tournament should be composed of several games; I suggest 20
>games, but the exact number is not so important, as long as it's
>consistent for everyone.  This is a good mechanism to ensure that the
>opponents are being picked fairly and deterministically.  The
>tournaments have no special meaning; they are just a mechanism for
>opponent selection.  I believe that no one could make a reasonable
>claim that the testing was biased in the opponent selection process.
>
>This is one of many possible schemes to ensure that the testing is
>done fairly and without bias.  I'm sure there are many improvements
>possible to this scheme; I just present it as a possibility.
>
>- Don
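
For what it's worth, here is a minimal sketch of the kind of
deterministic opponent selection Don outlines in point 3.  The pairing
rule is deliberately simple (pair adjacent players in the current
standings); the player names, the round count, and the play_match
callback are illustrative only, not any rater's actual procedure.

def swiss_pairings(players, scores):
    """Pair adjacent players in the current standings.  Any fixed,
    published tie-break works, as long as it is the same for everyone."""
    standings = sorted(players, key=lambda p: (-scores[p], p))
    # With an odd number of players, the last one sits the round out.
    return [(standings[i], standings[i + 1])
            for i in range(0, len(standings) - 1, 2)]


def run_event(players, play_match, rounds=7, games_per_round=20):
    """play_match(a, b, n) plays an n-game match between two registered
    PLAYERs and returns a's total score (wins plus half the draws)."""
    scores = {p: 0.0 for p in players}
    for _ in range(rounds):
        for a, b in swiss_pairings(players, scores):
            a_score = play_match(a, b, games_per_round)
            scores[a] += a_score
            scores[b] += games_per_round - a_score
    return scores

# Example with a dummy match function that calls every match a draw:
# run_event(["P1", "P2", "P3", "P4"], lambda a, b, n: n / 2)

The individual game results would still feed the rating calculation as
usual; the swiss structure only decides, without human judgement, who
plays whom.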


