Author: Stephen A. Boak
Date: 11:07:23 01/29/06
On January 29, 2006 at 12:27:26, Uri Blass wrote:

>The main question is which program win more when the programs disagree.
>
>Uri

Food for thought (a rambling mix of related ideas):

There is also the other side of the coin to consider: when the programs agree, which program relatively wins more, and by how much?

-- Both programs agree the Fritz side is better: what are the actual (relative) results?
-- Both programs agree the Shredder side is better: what are the actual (relative) results?
-- Both programs agree the position is equal: what are the actual (relative) results?

Bottom line: final results alone indicate (on average, over many games) which program handled the prior game positions best overall. As long as the winner of a chess game is defined as the opponent who made the next-to-last mistake, this will always be true.

IM Larry Kaufman (I believe) and some others have done studies to determine the relative worth of a pawn, a bishop or knight, the bishop pair, an exchange, etc., compared to scoring percentage, i.e. relative Elo worth. Even this type of analysis is very general (although interesting and worthwhile). It could be refined; however, I don't suggest perfection is ever attainable. :)

Example: if Fritz is up a pawn against Shredder, how often does Shredder win?
Example: if Fritz is down a pawn against Shredder, how often does Shredder win?

If the relative scoring percentage indicates that a particular program handles difficult (or advantageous) positions better, on average, than its opponents do, then that program is also likely to be stronger in Elo after adequate match tests within a similar pool.

In Elo terminology: what is the expected scoring average (the win-expectancy value, expressed as a percentage of the total available game points) based on the current relative Elos? And then, does the program perform on average above or below that expected scoring average?
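The expected scoring average under the standard Elo model can be sketched in a few lines of Python. The function and all the ratings/scores below are illustrative, not taken from any actual match:

```python
# Sketch of the Elo win-expectancy idea. The ratings and match score
# below are made-up numbers for illustration only.

def win_expectancy(elo_a: float, elo_b: float) -> float:
    """Expected score for A vs. B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

# Hypothetical example: a program rated 2850 against one rated 2800.
expected = win_expectancy(2850, 2800)

# Suppose the higher-rated program actually scored 60 of 100 points:
actual = 60 / 100
surplus = actual - expected  # positive => performed above Elo expectation
```

A positive `surplus` in some class of positions (say, "down a pawn") is exactly the kind of above-expectation performance the post is asking about.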
This can be applied to positions where the A and B evals are equal, where the A eval is higher than the B eval, and where the A eval is lower than the B eval.

Don't forget, as one example, that the strongest GMs know the best times to sacrifice a pawn to gain an attack or spring loose some ineffective piece(s). In effect, if they pick and choose intentionally when to shed a pawn, and do it only when they see the necessary compensation, then such GMs would perform above the average expectation for positions a pawn down.

GRAPHICAL COMPARISON IDEA

I can envision a 3-dimensional chart, where one dimension is the relative Elo difference, one dimension is the relative eval difference, and the remaining dimension is the relative scoring percentage. This chart can be compiled to learn the averages, overall for a pool, based on many games. It can also be compiled for individual players within the pool, to see whether an individual player performs better or worse than the pool averages in similar situations.

The problem with this approach is that all programs which perform better or worse than expectation have their Elos automatically adjusted up or down over time, until their Elos are in exact sync (subject to variation alone) with their abilities in positions ranging from a pawn down to a pawn up, etc. It might lead to a better understanding, however, of which programs are killers (big winners) once they get 1 point up in eval, or which programs are tremendous defenders when they get 1 point down in eval (yet hold the draw or even win).

But all in all, at the bottom line, such strengths and weaknesses are already averaged or netted out in a general Elo rating, measured by and large against a large pool. Still, the investigation is interesting and could provide useful ideas for varying the style of a program depending on the style of a particular opponent. If your opponent is great at sacking pawns, attacking, and winning, it might be better if your program plays solidly and prevents useful pawn-sack ideas by the opponent.
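The 3-dimensional chart above could be compiled by bucketing game results along the first two dimensions and averaging scores in each cell. A minimal sketch, where the bucket widths and the game records are invented placeholders:

```python
# Sketch of the "graphical comparison" chart: bucket results by
# (relative Elo difference, relative eval difference) and average the
# score in each cell. All game records below are made-up examples.
from collections import defaultdict

def bucket(value: float, width: float) -> float:
    """Floor a difference to the nearest bucket boundary below it."""
    return int(value // width) * width

# Each record: (elo_diff, eval_diff_in_pawns, score), from side A's view.
games = [
    (50, 1.0, 1.0),    # A was 50 Elo stronger, a pawn up, and won
    (50, 1.0, 0.5),    # similar situation, but only drew
    (-50, -1.0, 0.0),  # A weaker and a pawn down, lost
    (0, 0.0, 0.5),     # equal ratings, equal eval, draw
]

cells = defaultdict(list)
for elo_diff, eval_diff, score in games:
    cells[(bucket(elo_diff, 100), bucket(eval_diff, 0.5))].append(score)

# The third dimension: scoring percentage per (Elo-diff, eval-diff) cell.
chart = {cell: sum(s) / len(s) for cell, s in cells.items()}
```

Comparing one program's per-cell averages against the pool's per-cell averages is the "better/worse than the pool in similar situations" comparison described above.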
If your opponent (say, Robo-Topalov) is great at sacking the exchange and winning, your program's parameters should be adjusted to investigate the follow-ups after potential exchange sacks, to avoid surprises in its games against Robo-Topalov.

If enough games and analysis time existed, it would be possible to take 25-50 interesting types of positions (up a pawn, down a pawn, equal position, up an exchange, down an exchange, same-side castling, opposite-wing castling, knight vs bishop, queen vs rook and minor, etc.) and rate a lot of programs for their relative Elo performance above or below the pool average in such positions. If one took only certain simple situations, exchange up and exchange down, like IM Kaufman, one could investigate and learn some interesting things about the overall pool's handling of such positions (from either side) versus individual program performance in handling them. It would help to isolate program strengths and weaknesses in certain positions.

--Steve
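The per-position-class rating idea above can be sketched as a simple comparison of one program's average score in a class against the pool average for that class. The class name and every score below are invented for illustration:

```python
# Hedged sketch: compare one engine's results in a position class
# (here, a hypothetical "down_a_pawn" class) against the pool average.
# All result lists are invented placeholder data.

pool_results = {"down_a_pawn": [0.0, 0.5, 0.0, 0.5, 0.0, 1.0]}  # whole pool
program_results = {"down_a_pawn": [0.5, 0.5, 1.0, 0.0]}         # one engine

def average(scores: list[float]) -> float:
    return sum(scores) / len(scores)

for cls in pool_results:
    pool_avg = average(pool_results[cls])
    prog_avg = average(program_results[cls])
    delta = prog_avg - pool_avg  # positive => handles this class better

# Repeating this over the 25-50 position classes suggested above would
# profile each program's strengths and weaknesses class by class.
```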