Author: Bruce Moreland
Date: 13:12:40 05/01/98
On May 01, 1998 at 13:58:29, Don Dailey wrote:

>Hi Everyone,
>
>I've been thinking a lot about the super book phenomenon we are seeing
>and all the issues involved and would like to post some observations
>and opinions. This seems like a good forum to do so.
>
>Originally I started thinking about testing procedures to neutralize
>the effect of heavily booked programs. But the more I thought about
>it, the more I realized this would be impossible to do fairly. After
>all, each program has its own strengths and weaknesses and should play
>openings compatible with its own playing style (the same as we humans
>do!) This implies that opening preparation is an integral part of how
>each computer performs. This is also how it works with humans.

Right. Some feel that only part of a chess program is interesting. I
take the program as a whole, including the book. If someone takes the
time to figure out what their program plays well, they should reap the
benefits.

The goal is not to narrow the domain even more, by specifying some
particular starting position, or demanding that some algorithms and
heuristics be used and some be considered illegal. The goal is to take
a computer, put some software and data on it, and get it playing great
chess.

>But then you get into the issue of a computer playing the same game
>over and over. But just like us humans, if you allow yourself to get
>beaten the same way over and over again, then shame on you! Artificial
>techniques to prevent this abound, but I'm thinking they should not be
>applied. Looked at another way, why should I be penalized for playing
>a move I know wins? YOU should be penalized for letting me do this!

Yes. This moves beyond the single-game domain to the more challenging
multiple-game (match) domain. I think this is a fine thing to do: the
problems are harder, the chess is better, and the end users are more
thoroughly entertained.

>One very important factor is book learning, and I do not know how this
>is handled by the raters; hopefully it is handled correctly. The issue
>is that if I have a program that learns from its mistakes (which I
>think is a very good thing), then that program should never be "reset"
>by the testing procedure. As an example, if I were a biased tester, I
>could simply reset the learning mechanism frequently and affect the
>results (perhaps) significantly. I might move the program from machine
>to machine or whatever it takes to defeat the learning mechanism.
>
>Having several testers testing the same program on different machines
>creates the same problem. I argue that the more computers you use to
>test a program on, the more of a handicap you give to that program if
>it utilizes learning mechanisms. I don't know the magnitude of the
>error, but it certainly would be a factor to consider. The only
>solution I am aware of is to use the same machine to test the program
>on. If you use other machines you must consider them separate
>identities.

Absolutely. What you want to do here is specify a domain and stick with
it. If the domain is the single game, learners are illegal, and it's
perfectly fine to play one game each on twenty computers or twenty
games on one computer. If the domain is multiple games, you have to be
more careful. One idea is to declare that all matches are X games, and
that fresh installs are made at the start of all matches. If you demand
a fresh install, the order in which a program encounters opponents
matters, and ideally you'd want each program on its own *single*
permanently dedicated computer, which is probably very impractical.
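A match runner along those lines might look like the sketch below. It
is only a sketch of the protocol, not anyone's actual harness: Engine
objects with a reset_learning() method and a play_game() driver are
hypothetical names made up for illustration.

# Every match is a fixed number of games, learned state is wiped at the
# start of the match (the "fresh install"), and learning persists only
# across games within the match.

GAMES_PER_MATCH = 20

def run_match(engine_a, engine_b, games=GAMES_PER_MATCH):
    engine_a.reset_learning()  # fresh install: nothing carries between matches
    engine_b.reset_learning()
    score = 0.0
    for game in range(games):
        # Alternate colors; learning accumulated inside the match stays.
        if game % 2 == 0:
            white, black = engine_a, engine_b
        else:
            white, black = engine_b, engine_a
        result = play_game(white, black)  # 1.0 white win, 0.5 draw, 0.0 loss
        score += result if white is engine_a else 1.0 - result
    return score  # engine_a's points out of `games`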
>The other problem, which I believe is a pretty big factor, is opponent
>selection. From experiments I have done, this can have a large effect
>on the results. I suspect it may be the single greatest source of
>error the raters must face. I want to mention that I do not know how
>they make these decisions, and I know very little about their testing
>methodology, and am not criticizing them. I just bring this up as a
>potential problem.
>
>A possible solution to this problem is to have a deterministic
>selection procedure that does not involve human judgement or decision
>making. Here is a simple outline of how this might be done without too
>much hassle:
>
>1. Each program is registered to run on a given computer, OS and
>configuration to start with. Everything is specified and published in
>advance. It never changes from this configuration. I'll call each of
>these program/hardware/configuration combinations a "PLAYER"; a given
>program may, of course, be registered as more than one player.

It would be improper to allow multiple entries, because this creates
more chances of one of them getting a rating that is higher than it
should be. We'd be right back where we were in the 80's, when these
companies entered several machines in a tournament and touted the
results of the one that did best.

bruce

>2. When a new "identity" is registered, it is given an initial rating
>based on 2 games with EVERY program in the current registry (all
>active programs.)
>
>3. Rating is done by performing a large swiss event with each PLAYER
>being a participant. A standard pairing program is used to pair each
>round; this decision is not to be done by hand. Each ROUND of the
>swiss tournament should be composed of several games; I suggest 20
>games, but the exact number is not so important, as long as it's
>consistent for everyone. This is a good mechanism to ensure that the
>opponents are being picked fairly and deterministically. The
>tournaments have no special meaning; they are just a mechanism for
>opponent selection. I believe that no one could make a reasonable
>claim that the testing was biased in the opponent selection process.
>
>This is one of many possible schemes to ensure that the testing is
>done fairly and without bias. I'm sure there are many improvements
>possible to this scheme; I just present it as a possibility.
>
>- Don
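To put a rough number on the multiple-entry problem I mention above: if
each registered entry's measured rating is the true rating plus noise,
then registering several copies and touting the best one systematically
overstates strength. A quick simulation sketch; the 100-point standard
deviation is an assumption, just to show the shape of the effect.

import random

TRUE_RATING = 2400
NOISE_SD = 100      # assumed per-entry measurement error, in Elo points
TRIALS = 100000

def best_of(k):
    # Average, over many trials, of the best of k noisy measurements.
    total = 0.0
    for _ in range(TRIALS):
        total += max(random.gauss(TRUE_RATING, NOISE_SD) for _ in range(k))
    return total / TRIALS

print(best_of(1))  # about 2400: a single entry is unbiased
print(best_of(3))  # about 2485: the best of three overstates by ~85 points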
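Don's step (2) amounts to computing a performance rating from the 2
games against every active player. He doesn't say how the number would
be derived, so the linear approximation below (average opponent rating
plus 400 * (wins - losses) / games, counting a draw as half a win and
half a loss) is my assumption, not his method.

def initial_rating(results):
    # results: list of (opponent_rating, score), score 1.0 / 0.5 / 0.0
    games = len(results)
    avg_opp = sum(r for r, _ in results) / games
    points = sum(s for _, s in results)
    # 2 * points - games equals wins minus losses when draws score 0.5.
    return avg_opp + 400.0 * (2.0 * points - games) / games

# Example: 2 games each against three registered players.
print(initial_rating([(2500, 0.5), (2500, 0.0),
                      (2300, 1.0), (2300, 1.0),
                      (2450, 0.5), (2450, 0.5)]))  # about 2483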
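And a minimal sketch of the deterministic pairing in step (3): sort by
score, break ties by rating and then by name so that no human judgement
enters, and pair neighbors while avoiding rematches. Real Swiss pairing
rules (color allocation, floats) are more involved; this only shows
that opponent selection can be made fully mechanical.

def pair_round(players, played):
    # players: list of (name, rating, score); played: set of frozensets
    # of names that have already met. Sort order is fully deterministic.
    order = sorted(players, key=lambda p: (-p[2], -p[1], p[0]))
    pairings, used = [], set()
    for i, a in enumerate(order):
        if a[0] in used:
            continue
        for b in order[i + 1:]:
            if b[0] not in used and frozenset((a[0], b[0])) not in played:
                pairings.append((a[0], b[0]))
                used.update((a[0], b[0]))
                break
    return pairings  # each pairing then plays a fixed block of games

print(pair_round([("A", 2500, 2.0), ("B", 2450, 2.0),
                  ("C", 2400, 1.0), ("D", 2350, 1.0)],
                 played={frozenset(("A", "B"))}))
# [('A', 'C'), ('B', 'D')]: A avoids the rematch with B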