Computer Chess Club Archives



Subject: Re: Why is this a hard question to answer?

Author: Robert Hyatt

Date: 13:53:08 01/06/99



On January 06, 1999 at 10:26:20, KarinsDad wrote:

>
>>
>>Your results aren't necessarily invalid...  nor are they necessarily a positive
>>answer to the question posed...  it's _not_ an easy question to answer...
>
>Robert,
>
>I guess I'm missing the point (us old guys do that all of the time, that's why I
>can't play good chess).
>
>This seems like it would be an extremely easy question to answer.
>
>Isn't there a set of benchmarks done on the approximate strength of programs for
>the last 10 years running on given systems (such as the ssdf and Eugene's
>testing)? Aren't these sets of benchmarks used within this club all of the time
>to indicate one thing or another (usually strength of one computer versus
>another)?


no...  and that is the problem.. there are lots of things to measure about a
program, from its speed, to the depth it reaches, to how its evaluation works
and so forth.  And lots of decisions within a program were made based on
some common assumptions about hardware speeds, for example.  And when you change
the hardware speed artificially (as in slowing it down by playing at short
time controls) some programs are affected differently than others.


>
>Couldn't these tournament results be used to estimate the ELO strength of each
>engine in a given computer environment (within statistical tolerance levels)?

there aren't enough games...  computer vs computer is one thing, computer vs
human is something different, and the latter has very few games for us to look
at unfortunately...
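
To put a rough number on "not enough games", here is a minimal sketch. It assumes the standard logistic Elo model, independent games, and a normal approximation for the mean score; the function names `elo_diff` and `elo_interval` are made up for illustration and do not come from any rating tool.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic model."""
    # Clamp so the logarithm stays defined for 0% and 100% scores.
    s = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / s - 1.0)

def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, estimate, high) Elo difference for one match result.

    Assumption: games are independent and the mean score is roughly normal.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return elo_diff(score - z * se), elo_diff(score), elo_diff(score + z * se)

# Same 60% score, very different certainty (illustrative counts only).
print(elo_interval(10, 4, 6))        # 20 games
print(elo_interval(1000, 400, 600))  # 2000 games
```

The point estimate is identical in both calls; only the width of the interval changes with the number of games, which is the problem with small computer-vs-human samples.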




>
>Couldn't a few select engines (maybe 6 of the most powerful ones) then be set to
>various depths, run against these other engines at those depths (hence 6 of the
>programs are used with variant depths and the rest of the "known" programs are
>used as the control) in a "super" tournament (i.e. the variant depth programs
>would not be run against each other, only against the control engines) and then
>for each engine/depth, you would have an approximate idea of their ELO rating?


yes...  but the "gain per ply" would only be good for the program you actually
tested against...  and it might be highly exaggerated.  I.e., Crafty at shallow
plies is very bad when playing against non-null-move programs, because it will make
lots of simple mistakes.  But at deeper depths, these things don't show up much
and it behaves differently.  This makes such data very hard to analyze...
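
A minimal sketch of why a single "gain per ply" figure is slippery: given one rating estimate per fixed depth, the gain between consecutive depths need not be constant. The helper name `gain_per_ply` is hypothetical, and the ratings below are placeholder numbers, not measurements of Crafty or any other engine.

```python
def gain_per_ply(rating_by_depth):
    """Map each consecutive depth pair (d1, d2) to the rating gain per ply."""
    depths = sorted(rating_by_depth)
    return {
        (lo, hi): (rating_by_depth[hi] - rating_by_depth[lo]) / (hi - lo)
        for lo, hi in zip(depths, depths[1:])
    }

# Placeholder values only, chosen to show a shrinking gain at deeper plies.
made_up = {4: 1500, 5: 1700, 6: 1850, 7: 1950}
for pair, gain in gain_per_ply(made_up).items():
    print(pair, gain)
```

If the shallow-depth ratings are depressed by simple tactical mistakes, the first few differences are inflated, and averaging them into one number would overstate the gain at realistic depths.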



>
>From this data, couldn't you extrapolate (or estimate as it were) the rating
>differential for not only those 6 test engines, but also for chess programs in
>general (realizing that for programs in general, it would be a less exacting
>estimate)?


with the original question being "how much per ply" this would still be very
dicey to answer.  For a specific version of a specific program, yes you could
do this.  But the data wouldn't apply to other versions of that same program
by any means...




>
>Obviously, when dealing with the performance capabilities of programs as complex
>as chess engines, it is important to understand that just changing one variable
>such as depth will also change the overall search path taken and hence will
>"skew" the results somewhat (from the purist's point of view). However, if we
>are just talking about strength, you can either use the data points that you
>have acquired and realize that they are an approximation (and probably a darn good
>one), or you can take the purist's point of view and disregard all of the data
>because none of it is based on only one variable being changed in the experiment
>(i.e. how do you know that engine x's rating run on a different P2 200MHz system
>is the same as on the original ssdf P2 200MHz system?).
>
>This would take a lot of work to set up such an extensive experiment (i.e.
>tournament), but you could get a reasonable approximation of the answer to the
>question and in a lot of scientific fields, we do not have answers, just
>reasonable approximations.
>
>The only thing that makes the question difficult (IMO) is that you may have to
>increase the amount of time the variant depth engines search (in order to give
>them a reasonable amount of time at each depth) as compared to the control
>engines (which would be running at a "control" time). The two sets of engines
>would also have to be set (if using an increasing time differential for the
>depth modified engines) to not be running on the opponent's time.

the ideal would be to have *no* engine search less than 3 minutes, and take
that time _up_ rather than _down_ so that the shallow searches don't confuse
things.  But that would take a very long time to complete...
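
A back-of-the-envelope sketch of how long that gets. It assumes each extra ply multiplies the search time by a fixed effective branching factor; that factor, the base depth, and the moves-per-game figure are all illustrative assumptions, and the function names are hypothetical.

```python
def time_schedule(base_seconds, base_depth, max_depth, branching_factor):
    """Seconds per move at each depth, scaling the base time *up* only."""
    return {
        depth: base_seconds * branching_factor ** (depth - base_depth)
        for depth in range(base_depth, max_depth + 1)
    }

def total_hours(schedule, moves_per_game, games_per_depth):
    """Rough wall-clock hours for the variant-depth side alone."""
    seconds = sum(t * moves_per_game * games_per_depth
                  for t in schedule.values())
    return seconds / 3600.0

# Assumed: 3 minutes at the shallowest depth, factor of 3 per extra ply.
schedule = time_schedule(180.0, 8, 10, 3.0)
print(schedule)
print(total_hours(schedule, moves_per_game=60, games_per_depth=1))
```

Even with one game per depth and only three depths, the hours add up quickly, and a statistically useful match needs many games at each setting.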





>
>The increase in time may have to be exponential, but determining a "fair" time
>increment may be difficult (or maybe not for you guys that have been doing this
>for years). Of course, the most "controlled" environment would be one in which
>all of the programs ran within the same time constraints, but I think that would
>skew the data even more (I doubt giving Crafty a 20 ply depth over a 19 ply
>depth would increase its playing strength much if it only has 3 minutes per
>move in both cases).
>
>Keep plugging :)
>
>KarinsDad




Last modified: Thu, 15 Apr 21 08:11:13 -0700
