Author: Don Dailey
Date: 10:58:29 05/01/98
Hi Everyone,

I've been thinking a lot about the super book phenomenon we are seeing and all the issues involved, and would like to post some observations and opinions. This seems like a good forum to do so.

Originally I started thinking about testing procedures to neutralize the effect of heavily booked programs. But the more I thought about it, the more I realized this would be impossible to do fairly. After all, each program has its own strengths and weaknesses and should play openings compatible with its own playing style (the same as we humans do!). This implies that opening preparation is an integral part of how each computer performs. This is also how it works with humans.

But then you get into the issue of a computer playing the same game over and over. But just like us humans, if you allow yourself to get beaten the same way over and over again, then shame on you! Artificial techniques to prevent this abound, but I think they should not be applied. Looked at another way, why should I be penalized for playing a move I know wins? YOU should be penalized for letting me do this!

One very important factor is book learning, and I do not know how this is handled by the raters; hopefully it is handled correctly. The issue is that if I have a program that learns from its mistakes (which I think is a very good thing), then that program should never be "reset" by the testing procedure. As an example, if I were a biased tester, I could simply reset the learning mechanism frequently and affect the results, perhaps significantly. I might move the program from machine to machine, or whatever it takes to defeat the learning mechanism. Having several testers test the same program on different machines creates the same problem. I argue that the more computers you use to test a program, the more of a handicap you give to that program if it utilizes learning mechanisms. I don't know the magnitude of the error, but it certainly would be a factor to consider. The only solution I am aware of is to always test the program on the same machine. If you use other machines, you must consider them separate identities.

The other problem, which I believe is a pretty big factor, is opponent selection. From experiments I have done, this can have a large effect on the results. I suspect it may be the single greatest source of error the raters must face. I want to mention that I do not know how they make these decisions and I know very little about their testing methodology, so I am not criticizing them. I just bring this up as a potential problem.

A possible solution to this problem is to have a deterministic selection procedure that does not involve human judgement or decision making. Here is a simple outline of how this might be done without too much hassle:

1. Each program is registered to run on a given computer, OS and configuration to start with. Everything is specified and published in advance. It never changes from this configuration. I'll call each of these program/hardware/configuration combinations a "PLAYER"; a given program may, of course, be registered as more than one PLAYER.

2. When a new "identity" is registered, it is given an initial rating based on 2 games with EVERY program in the current registry (all active programs).

3. Rating is done by performing a large swiss event with each PLAYER being a participant. A standard pairing program is used to pair each round; this decision is not to be made by hand. Each ROUND of the swiss tournament should be composed of several games; I suggest 20 games, but the exact number is not so important, as long as it's consistent for everyone.

(Two rough sketches of how steps 2 and 3 might look in practice follow below.)
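For step 2, here is a minimal sketch in Python of one way an initial rating could be computed from a double round-robin against every registered PLAYER. The linear "+400 per net win" performance-rating formula, and all of the names and numbers below, are illustrative choices on my part, not anything specified in the post:

# Illustrative sketch only; the +400 linear performance-rating
# formula is one common convention, not a prescribed method.
def initial_rating(results, registry):
    """Compute a provisional rating from a double round-robin.

    results:  list of (opponent, score) pairs, two per registered
              opponent; score is 1.0 win, 0.5 draw, 0.0 loss.
    registry: dict mapping PLAYER name -> current rating.
    """
    games = len(results)
    avg_opponent = sum(registry[opp] for opp, _ in results) / games
    wins = sum(1 for _, s in results if s == 1.0)
    losses = sum(1 for _, s in results if s == 0.0)
    # Average opposition plus 400 points per net win, spread
    # over all games played.
    return avg_opponent + 400.0 * (wins - losses) / games

# Hypothetical example: a new PLAYER scores 2.5/4 against two
# registered PLAYERs rated 2400 and 2500.
registry = {"Engine A/P90/cfg1": 2400, "Engine B/P200/cfg2": 2500}
results = [("Engine A/P90/cfg1", 1.0), ("Engine A/P90/cfg1", 0.5),
           ("Engine B/P200/cfg2", 0.0), ("Engine B/P200/cfg2", 1.0)]
print(initial_rating(results, registry))  # 2450 + 400*(2-1)/4 = 2550.0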
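And for step 3, a rough sketch of a deterministic pairing loop in which each pairing plays a 20-game match. Python's stable sort breaks ties by registration order, which keeps the pairing fully reproducible; a real pairing program would also avoid rematches and handle byes more carefully, which is omitted here for brevity. The play_match function is a stand-in for actually running the two engines:

# Illustrative sketch only; rematch avoidance and proper bye
# handling are deliberately left out.
GAMES_PER_ROUND = 20

def pair_round(points, players):
    """Pair adjacent PLAYERs after sorting by match points.

    players: names in registration order (the tie-break, since
             Python's sort is stable).
    points:  dict mapping name -> accumulated match points.
    Returns pairs (1v2, 3v4, ...); with an odd count the last
    PLAYER sits out this round.
    """
    order = sorted(players, key=lambda p: -points[p])
    return [(order[k], order[k + 1]) for k in range(0, len(order) - 1, 2)]

def run_event(players, play_match, rounds):
    """players:    PLAYER names in registration order.
    play_match: function (a, b, n) -> a's score out of n games;
                a stand-in for running the engines themselves.
    """
    points = {p: 0.0 for p in players}
    for _ in range(rounds):
        for a, b in pair_round(points, players):
            score_a = play_match(a, b, GAMES_PER_ROUND)
            points[a] += score_a
            points[b] += GAMES_PER_ROUND - score_a
    return points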
This is a good mechanism to ensure that the opponents are being picked fairly and deterministically. The tournaments have no special meaning; they are just a mechanism for opponent selection. I believe that no one could make a reasonable claim that the testing was biased in the opponent selection process. This is one of many possible schemes to ensure that the testing is done fairly and without bias. I'm sure many improvements to this scheme are possible; I just present it as a possibility.

- Don