Computer Chess Club Archives



Subject: Re: rebel 10~!! super strong on amd k62 500

Author: Ratko V Tomic

Date: 14:28:35 07/30/00



RG> There is no substitute for an objective determination.
>
PK> You make a good point, and I do agree that the GM would struggle in
PK> the scenario you described.

I think you are conceding the point way too early. I do agree that
if you took a GM, sprang on her a game between players of 1400 and
1600 strength who are officially unrated, and asked her to tell you
their ratings, she would have a hard time telling you the difference
on the spot.

But these estimates need not be spontaneous, made just from feelings.
A GM can prepare and train herself for such a task, for example by
developing a move-scoring system (like those seen in various
tutorials where several top moves are scored: the best one gets
100 points, the others get fewer). Then she would go through the
game, evaluate each position and its several best moves down to
the move actually played, and score it. She would keep adding
these points, and after a hundred or more scores were added she would
have as much input information to work with as an ELO calculator would
using only the results of a hundred or more games.
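
To make the bookkeeping concrete, here is a rough sketch of such a
scoring scheme in Python. The point values per rank are made up for
illustration only; a real table would have to be calibrated:

# Hypothetical point scale: the evaluator's top choice earns 100 points,
# lower-ranked alternatives earn progressively fewer (values are assumed).
POINTS_BY_RANK = {0: 100, 1: 80, 2: 60, 3: 40, 4: 20}

def score_game(ranked_candidates_per_position, moves_played):
    """Accumulate points for the moves actually played.

    ranked_candidates_per_position: one list of candidate moves per
    position, best move first (as the GM or engine would rank them).
    moves_played: the move actually chosen in each position.
    Returns the average points per move over the whole game.
    """
    total = 0
    for candidates, played in zip(ranked_candidates_per_position, moves_played):
        if played in candidates:
            total += POINTS_BY_RANK.get(candidates.index(played), 0)
        # moves outside the candidate list score nothing
    return total / len(moves_played)

Over a hundred or more positions this running total is the raw input
the rating estimate would be built from.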

After all, we have all seen sets of pre-evaluated test positions
used to estimate fairly accurately the ELO of chess programs. The
only difference here is that the evaluator, be it a GM, a strong
player, or a strong program, would not pick the test positions;
the test positions would come out of the game itself. It doesn't
really matter where the hundred or more test positions come from,
as long as someone can evaluate the alternatives and assign scores
to them.
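
As a sketch of how the test positions could come straight out of the
game itself, here is how one might pull every position and the move
played from a PGN file with the python-chess library (the library
choice is mine, not something from the original setup):

import chess.pgn

def positions_from_game(pgn_path):
    """Yield (position, move_played) pairs for every ply of the game,
    so each position can serve as a test position for the evaluator."""
    with open(pgn_path) as f:
        game = chess.pgn.read_game(f)
    board = game.board()
    for move in game.mainline_moves():
        yield board.copy(), move
        board.push(move)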

For large-scale evaluations, such as evaluating all the strong chess
programs, GM evaluations would be impractical and expensive. But
one could do the next best thing and deploy a strong program, or a set
of programs, with long evaluations, rigged to produce and evaluate the
top moves in each position of each game, from the top one down to the
move actually played. They would use the difference between the score
of the selected move and that of the top move, together with an
empirically tuned lookup table (calibrated from games of known-ELO
players), to fetch the points to be added to the player's ply-by-ply
score.
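
A minimal sketch of that per-move scoring step, using python-chess to
drive a UCI engine. The depth, the multipv width, and especially the
loss-to-points table are placeholder assumptions that would have to be
tuned on the calibration games:

import chess
import chess.engine

# Placeholder table: centipawn loss relative to the engine's top move,
# mapped to points for the player's ply-by-ply score. The cutoffs and
# point values below are invented; they would be calibrated from games
# of players with known ELO.
LOSS_TO_POINTS = [(0, 100), (20, 90), (50, 70), (100, 40), (200, 10)]

def points_for_move(board, played, engine, depth=18, multipv=4):
    """Score one played move against the engine's top choices."""
    infos = engine.analyse(board, chess.engine.Limit(depth=depth),
                           multipv=multipv)
    top_cp = infos[0]["score"].relative.score(mate_score=100000)
    played_cp = None
    for info in infos:
        if info["pv"][0] == played:
            played_cp = info["score"].relative.score(mate_score=100000)
            break
    if played_cp is None:
        # Played move fell outside the multipv window: evaluate it directly.
        board.push(played)
        reply = engine.analyse(board, chess.engine.Limit(depth=depth))
        played_cp = -reply["score"].relative.score(mate_score=100000)
        board.pop()
    loss = max(0, top_cp - played_cp)
    for cutoff, points in reversed(LOSS_TO_POINTS):
        if loss >= cutoff:
            return points
    return 0

# engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # any UCI engine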

For greater objectivity and to reduce style bias, several distinct
engines may be deployed on each game and the actual score would be
obtained as a weighted average of the different engine scores. The
averaging would weight each engine by how well it approximated on
its own (using this method) the well-established ELO strengths from
the calibration games. The evaluator driver would also supply to the
engines/individual evaluators the information on when each opponent
left its opening book and when tablebase positions became reachable
(knowing which tablebases are available to the programs).
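
One simple way to do that weighting (my choice of scheme; the exact
formula is not pinned down above) is to weight each engine by the
inverse of its average error on the calibration games:

def combined_rating(per_engine_estimates, calibration_errors):
    """Weighted average of per-engine rating estimates.

    per_engine_estimates: {engine_name: estimated rating for this player}
    calibration_errors:   {engine_name: mean absolute ELO error the engine
                           made when rating the calibration games}
    Engines that came closer to the known calibration ratings get more
    weight; inverse-error weighting is just one reasonable choice.
    """
    weights = {name: 1.0 / max(err, 1.0)
               for name, err in calibration_errors.items()}
    total = sum(weights.values())
    return sum(per_engine_estimates[name] * w
               for name, w in weights.items()) / total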

In principle this method is perfectly feasible technically at present
and could be used to produce rating charts for programs at least as
accurate as the SSDF's, while playing 50-100 times fewer games between
the programs than the SSDF does (or much more accurate ratings with a
number of games comparable to the SSDF matchups). Of course, the
evaluation of each game may take 2-3 times as long as the game itself
(since the evaluators would run slower than the programs being
evaluated, but they could all be started in parallel on a network).
That would still retain a time edge of at least 10-20 times over plain
ELO. Any new program could be placed accurately on the rating list (at
tournament time controls) within days instead of months.
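
Starting the evaluators in parallel is straightforward; a sketch with
Python's standard process pool, where rate_game stands for whatever
wrapper around the per-move scoring above returns one estimate per
game:

from concurrent.futures import ProcessPoolExecutor

def rate_games_in_parallel(rate_game, pgn_paths, engine_path, workers=8):
    """Run one evaluator per game concurrently.

    rate_game(pgn_path, engine_path) is assumed to open its own engine,
    score every move of the game, and return a single rating estimate.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(rate_game, pgn_paths,
                             [engine_path] * len(pgn_paths)))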

While there may not be sufficient incentive to develop such an
evaluator for stand-alone use, it may be worthwhile for the software
manufacturers to have it for in-house tuning of their programs,
giving them a great edge in cutting down the duration of the
tuning cycle compared to the simple auto-play plus ELO presently used.
Because of this competitive edge they may not wish to sell it
as a stand-alone evaluator.


