Computer Chess Club Archives


Subject: Why is this a hard question to answer?

Author: KarinsDad

Date: 07:26:20 01/06/99




>Your results aren't necessarily invalid...  nor are they necessarily a positive
>answer to the question posed...  it's _not_ an easy question to answer...

Robert,

I guess I'm missing the point (us old guys do that all of the time, which is why
I can't play chess well).

This seems like it would be an extremely easy question to answer.

Isn't there a set of benchmarks of the approximate strength of programs over the
last 10 years running on given systems (such as the SSDF list and Eugene's
testing)? Aren't these sets of benchmarks used within this club all of the time
to indicate one thing or another (usually the strength of one computer versus
another)?

Couldn't these tournament results be used to estimate the ELO strength of each
engine in a given computer environment (within statistical tolerance levels)?

Couldn't a few select engines (maybe 6 of the most powerful ones) then be set to
various depths and run against these other engines at those depths in a "super"
tournament? The 6 programs would be used at varying depths while the rest of the
"known" programs served as the control, and the variant-depth programs would not
be run against each other, only against the control engines. Then, for each
engine/depth combination, you would have an approximate idea of its ELO rating.
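To make the idea concrete, here is a minimal sketch of how a rating could be
backed out of results against control engines of known rating, using the
standard Elo expected-score formula. The opponent rating (2400) and the game
scores below are made-up illustration values, not results from any actual
tournament:

```python
# Minimal sketch: estimate an engine's rating from its results against
# control engines of known rating, using the standard Elo expected-score
# formula E = 1 / (1 + 10^((R_opp - R) / 400)).
# The opponent ratings and scores below are made-up illustration values.

def expected_score(r_player, r_opp):
    """Elo expected score of a player rated r_player against r_opp."""
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_player) / 400.0))

def performance_rating(results, lo=0.0, hi=4000.0):
    """Find the rating whose total expected score matches the actual score.

    results: list of (opponent_rating, actual_score) pairs,
             where score = 1 for a win, 0.5 for a draw, 0 for a loss.
    Solved by bisection, since expected score is monotonic in rating.
    """
    actual = sum(score for _, score in results)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, r_opp) for r_opp, _ in results)
        if expected < actual:
            lo = mid   # scoring better than expected -> rating is higher
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical example: 3 wins, 2 draws, 1 loss vs. 2400-rated controls
results = [(2400, 1), (2400, 1), (2400, 1), (2400, 0.5), (2400, 0.5), (2400, 0)]
rating = performance_rating(results)  # roughly 2520 for a 4/6 score
```

With enough games per engine/depth pairing, the statistical tolerance on such
an estimate shrinks, which is exactly the "within statistical tolerance levels"
caveat above.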

From this data, couldn't you extrapolate (or estimate, as it were) the rating
differential not only for those 6 test engines, but also for chess programs in
general (realizing that for programs in general, it would be a less exacting
estimate)?

Obviously, when dealing with the performance capabilities of programs as complex
as chess engines, it is important to understand that changing even one variable
such as depth will also change the overall search path taken and hence will
"skew" the results somewhat (from the purist's point of view). However, if we
are just talking about strength, you can either use the data points you have
acquired and accept that they are an approximation (and probably a darn good
one), or you can take the purist's point of view and disregard all of the data
because none of it is based on only one variable being changed in the experiment
(i.e. how do you know that engine x's rating on a different P2 200MHz system
is the same as on the original SSDF P2 200MHz system?).

It would take a lot of work to set up such an extensive experiment (i.e.
tournament), but you could get a reasonable approximation of an answer to the
question, and in a lot of scientific fields we do not have answers, just
reasonable approximations.

The only thing that makes the question difficult (IMO) is that you may have to
increase the amount of time the variant-depth engines search (in order to give
them a reasonable amount of time at each depth) as compared to the control
engines (which would be running at a "control" time). The two sets of engines
would also have to be set (if using an increasing time differential for the
depth-modified engines) not to think on the opponent's time (i.e. pondering
turned off).

The increase in time may have to be exponential, but determining a "fair" time
increment may be difficult (or maybe not for you guys who have been doing this
for years). Of course, the most "controlled" environment would be one in which
all of the programs ran within the same time constraints, but I think that would
skew the data even more (I doubt giving Crafty a 20 ply depth instead of a 19
ply depth would increase its playing strength much if it only has 3 minutes per
move in both cases).
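A quick back-of-the-envelope sketch of why the time increase would likely be
exponential: if an engine's effective branching factor is b, searching one ply
deeper multiplies the search time by roughly b. The numbers below (b = 3, 30
seconds at depth 8) are illustrative assumptions only, not measurements from
any particular engine:

```python
# Rough sketch of the exponential time growth per ply of search depth.
# If the effective branching factor is b, each extra ply multiplies the
# search time by roughly b.  Base values here are assumptions for
# illustration: 30 seconds to finish depth 8, branching factor 3.

def time_for_depth(depth, base_depth=8, base_time=30.0, branching=3.0):
    """Estimated seconds to complete `depth`, extrapolated geometrically."""
    return base_time * branching ** (depth - base_depth)

for d in range(8, 13):
    print(f"depth {d:2d}: ~{time_for_depth(d):6.0f} s")
# Under these assumptions, depth 12 already needs ~2430 s per move,
# which is why a fixed time control would starve the deeper settings.
```

This is exactly the scenario in the Crafty example above: at a fixed 3 minutes
per move, the engine simply never gets to use the extra ply.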

Keep plugging :)

KarinsDad




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.