Computer Chess Club Archives




Subject: Book learning and rating bias

Author: Don Dailey

Date: 10:58:29 05/01/98

Hi Everyone,

I've been thinking a lot about the super book phenomenon we are seeing
and all the  issues involved and would like  to post some observations
and opinions.   This seems like a  good forum to do so.

Originally I started thinking about testing procedures to neutralize
the effect of heavily booked programs.  But the more I thought about
it, the more I realized this would be impossible to do fairly.  After
all, each program has its own strengths and weaknesses and should play
openings compatible with its own playing style (the same as we humans
do!).  This implies that opening preparation is an integral part of how
each computer performs, just as it is for humans.

But then you get  into the issue of a  computer playing the  same game
over and over.  But just like us humans, if you allow yourself to get
beaten the same way over and over again then shame on you!  Artificial
techniques to prevent this abound, but I think they should not be
applied.  Looked at another way, why should I be penalized for
playing a move I know wins?  YOU should be penalized for letting me do it.

One very important factor is book learning.  I do not know how this
is handled by the raters; hopefully it is handled correctly.  The
issue is that if I have a program that learns from its mistakes
(which I think is a very good thing), then that program should never
be "reset" by the testing procedure.  As an example, if I was a biased
tester,  I could simply   reset the learning mechanism frequently  and
affect the results (perhaps) significantly.  I  might move the program
from machine to machine or whatever it takes to defeat the learning.

Having several testers testing the same program on different machines
creates the same problem, since each copy learns only from the games
played on its own machine.  I argue that the more computers you use to
test a program, the greater the handicap you give to that program if
it utilizes learning mechanisms.  I don't know the magnitude of the
error but it  certainly would  be a  factor   to  consider.  The  only
solution I am aware of is to use the  same machine to test the program
on.  If you use other machines you must consider them separate.

The other problem, which I believe is a pretty big factor, is opponent
selection.  From experiments I have done, this can have a large effect
on the results.  I  suspect it may  be  the single greatest source  of
error the raters must face.  I want to mention that I  do not know how
they make these  decisions and I know  very little about their testing
methodology and  am not criticizing them.   I just bring  this up as a
potential problem.

A possible solution to this problem is to have a deterministic
selection procedure that does not involve human judgement or decision
making.  Here is a simple outline of how this might be done without
too much hassle:

1.  Each program  is registered to  run  on a  given computer,  OS and
configuration to start with.  Everything is specified and published in
advance.  It never changes from this configuration.  I'll call each of
these program/hardware/configuration combinations a "PLAYER".  A given
program may of course be registered as more than one PLAYER.

2. When a new PLAYER is registered, it is given an initial rating
based on 2 games with EVERY other PLAYER in the current registry (all
active PLAYERS).

3. Rating is done by performing a large Swiss event with each PLAYER
being a participant.  A standard pairing program is used to pair each
round; this decision is not to be made by hand.  Each ROUND of the
Swiss tournament should be composed of several games.  I suggest 20
games, but the exact number is not so important as long as it's
consistent for everyone.  This is a good mechanism to ensure that the
opponents  are   being  picked   fairly and   deterministically.   The
tournaments have  no  special meaning,  they are just  a mechanism for
opponent selection.  I  believe  that no  one could make  a reasonable
claim that the testing was biased in the opponent selection process.
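
As a rough illustration of point 3, here is a minimal Python sketch of
opponent selection that involves no human choice.  It is NOT a standard
Swiss pairing program: it is a simplified greedy pairing (no colour
balancing, no backtracking).  The names Player, pair_round and
record_match, the sample ratings, and the tie-break order (score, then
rating, then name) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    """One registered program/hardware/configuration combination."""
    name: str
    rating: float
    score: float = 0.0                      # points collected so far
    opponents: set = field(default_factory=set)  # names already played


def pair_round(players):
    """Return (pairings, bye) for the next round.

    The result depends only on scores, ratings and names, never on
    the order of the input list or on any human decision.
    """
    # Assumed ranking: highest score first, then rating, then name.
    unpaired = sorted(players, key=lambda p: (-p.score, -p.rating, p.name))
    pairings = []
    while len(unpaired) > 1:
        a = unpaired.pop(0)
        # Highest-ranked opponent that `a` has not met yet; if every
        # remaining opponent was already played, allow a rematch with
        # the next player in line.
        idx = next(
            (i for i, b in enumerate(unpaired) if b.name not in a.opponents),
            0,
        )
        b = unpaired.pop(idx)
        pairings.append((a.name, b.name))
    bye = unpaired[0].name if unpaired else None
    return pairings, bye


def record_match(a, b, a_points, games=20):
    """Record one ROUND between a and b (20 games, as suggested above)."""
    a.score += a_points
    b.score += games - a_points
    a.opponents.add(b.name)
    b.opponents.add(a.name)


if __name__ == "__main__":
    # Hypothetical registry with made-up ratings.
    registry = [Player("A", 2400), Player("B", 2300),
                Player("C", 2200), Player("D", 2100)]
    print(pair_round(registry))
```

Because the ranking key is a total order over the PLAYERS, two testers
running this on the same results get the same pairings, which is the
property the scheme is after.  A real implementation would use an
established pairing program rather than this greedy shortcut.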

This is one of many possible schemes to ensure that the testing is
done fairly and without bias.  I'm sure there are many improvements
possible to this scheme; I just present it as a possibility.

- Don
