Author: Stephen A. Boak
Date: 11:07:23 01/29/06
On January 29, 2006 at 12:27:26, Uri Blass wrote:

>The main question is which program win more when the programs disagree.
>
>Uri

Food for thought (a rambling mix of related ideas):

There is also the other side of the coin to consider: when the programs agree, which program relatively wins more, and by how much?

-- Both programs agree the Fritz side is better: what are the actual (relative) results?
-- Both programs agree the Shredder side is better: what are the actual (relative) results?
-- Both programs agree the position is equal: what are the actual (relative) results?

Bottom line: final results alone indicate (on average, over many games) which program handled the prior game positions best overall. As long as the winner of a chess game is defined as the opponent who made the next-to-last mistake, this will always be true.

IM Larry Kaufman (I believe) and some others have done studies to determine the relative worth of a pawn, a bishop or knight, the bishop pair, an exchange, etc., compared to scoring percentage, i.e. relative Elo worth. Even this type of analysis is very general (although interesting and worthwhile). It could be refined; however, I don't suggest perfection is ever attainable. :)

Example: if Fritz is up a pawn against Shredder, how often does Shredder win?
Example: if Fritz is down a pawn against Shredder, how often does Shredder win?

If the relative scoring percentage indicates that a particular program handles difficult (or advantageous) positions better, on average, than its opponents do, then that program is also likely to be stronger in Elo after adequate match tests within a similar pool.

In Elo terminology: what is the expected scoring average (the win-expectancy value, expressed as a percentage of the total available game points) based on the current relative Elos? And then, does the program perform on average above or below that expected scoring average?
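The expected scoring average under the standard Elo model can be sketched in a few lines of Python. The function and all the ratings/scores below are illustrative, not taken from any actual match:

```python
# Sketch of the Elo win-expectancy idea. The ratings and match score
# below are made-up numbers for illustration only.

def win_expectancy(elo_a: float, elo_b: float) -> float:
    """Expected score for A vs. B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

# Hypothetical example: a program rated 2850 against one rated 2800.
expected = win_expectancy(2850, 2800)

# Suppose the higher-rated program actually scored 60 of 100 points:
actual = 60 / 100
surplus = actual - expected  # positive => performed above Elo expectation
```

A positive `surplus` in some class of positions (say, "down a pawn") is exactly the kind of above-expectation performance the post is asking about.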
This can be applied to positions where the A and B evals are equal, where the A eval is higher than the B eval, and where the A eval is lower than the B eval.

Don't forget, as one example, that the strongest GMs know the best times to sacrifice a pawn to gain an attack or spring loose some ineffective piece(s). In effect, if they pick and choose intentionally when to shed a pawn, and do it only when they see the necessary compensation, then such GMs would perform above the average expectation for positions a pawn down.

GRAPHICAL COMPARISON IDEA

I can envision a 3-dimensional chart, where one dimension is the relative Elo difference, one dimension is the relative eval difference, and the remaining dimension is the relative scoring percentage. This chart can be compiled to learn the averages, overall for a pool, based on many games. It can also be compiled for individual players within the pool, to see whether an individual player performs better or worse than the pool averages in similar situations.

The problem with this approach is that all programs which perform better or worse than expectation have their Elos automatically adjusted up or down over time, until their Elos are in exact sync (subject to variation alone) with their abilities in positions ranging from a pawn down to a pawn up, etc. It might lead to a better understanding, however, of which programs are killers (big winners) once they get 1 point up in eval, or which programs are tremendous defenders when they get 1 point down in eval (yet hold the draw or even win).

But all in all, at the bottom line, such strengths and weaknesses are already averaged or netted out in a general Elo rating, measured by and large against a large pool. Still, the investigation is interesting and could provide useful ideas for varying the style of a program depending on the style of a particular opponent. If your opponent is great at sacking pawns, attacking, and winning, it might be better if your program plays solidly and prevents useful pawn-sack ideas by the opponent.
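The 3-dimensional chart above could be compiled by bucketing game results along the first two dimensions and averaging scores in each cell. A minimal sketch, where the bucket widths and the game records are invented placeholders:

```python
# Sketch of the "graphical comparison" chart: bucket results by
# (relative Elo difference, relative eval difference) and average the
# score in each cell. All game records below are made-up examples.
from collections import defaultdict

def bucket(value: float, width: float) -> float:
    """Floor a difference to the nearest bucket boundary below it."""
    return int(value // width) * width

# Each record: (elo_diff, eval_diff_in_pawns, score), from side A's view.
games = [
    (50, 1.0, 1.0),    # A was 50 Elo stronger, a pawn up, and won
    (50, 1.0, 0.5),    # similar situation, but only drew
    (-50, -1.0, 0.0),  # A weaker and a pawn down, lost
    (0, 0.0, 0.5),     # equal ratings, equal eval, draw
]

cells = defaultdict(list)
for elo_diff, eval_diff, score in games:
    cells[(bucket(elo_diff, 100), bucket(eval_diff, 0.5))].append(score)

# The third dimension: scoring percentage per (Elo-diff, eval-diff) cell.
chart = {cell: sum(s) / len(s) for cell, s in cells.items()}
```

Comparing one program's per-cell averages against the pool's per-cell averages is the "better/worse than the pool in similar situations" comparison described above.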
If your opponent (say, Robo-Topalov) is great at sacking the exchange and winning, your program's parameters should be adjusted to investigate the follow-ups after potential exchange sacks, to avoid surprises in its games against Robo-Topalov.

If enough games and analysis time existed, it would be possible to take 25-50 interesting types of positions (up a pawn, down a pawn, equal position, up an exchange, down an exchange, same-side castling, opposite-wing castling, knight vs bishop, queen vs rook and minor, etc.) and rate a lot of programs for their relative Elo performance above or below the pool average in such positions. If one took only certain simple situations, exchange up and exchange down, like IM Kaufman, one could investigate and learn some interesting things about the overall pool's handling of such positions (from either side) versus individual program performance in handling them. It would help to isolate program strengths and weaknesses in certain positions.

--Steve
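The per-position-class rating idea above can be sketched as a simple comparison of one program's average score in a class against the pool average for that class. The class name and every score below are invented for illustration:

```python
# Hedged sketch: compare one engine's results in a position class
# (here, a hypothetical "down_a_pawn" class) against the pool average.
# All result lists are invented placeholder data.

pool_results = {"down_a_pawn": [0.0, 0.5, 0.0, 0.5, 0.0, 1.0]}  # whole pool
program_results = {"down_a_pawn": [0.5, 0.5, 1.0, 0.0]}         # one engine

def average(scores: list[float]) -> float:
    return sum(scores) / len(scores)

for cls in pool_results:
    pool_avg = average(pool_results[cls])
    prog_avg = average(program_results[cls])
    delta = prog_avg - pool_avg  # positive => handles this class better

# Repeating this over the 25-50 position classes suggested above would
# profile each program's strengths and weaknesses class by class.
```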