Author: KarinsDad
Date: 07:26:20 01/06/99
> Your results aren't necessarily invalid... nor are they necessarily a positive answer to the question posed... it's _not_ an easy question to answer...

Robert, I guess I'm missing the point (us old guys do that all of the time; that's why I can't play good chess). This seems like it would be an extremely easy question to answer.

Isn't there a set of benchmarks on the approximate strength of programs over the last 10 years running on given systems (such as the SSDF list and Eugene's testing)? Aren't these sets of benchmarks used within this club all of the time to indicate one thing or another (usually the strength of one computer versus another)? Couldn't those tournament results be used to estimate the Elo strength of each engine in a given computer environment (within statistical tolerance levels)?

Couldn't a few select engines (maybe 6 of the most powerful ones) then be set to various depths and run against the other engines at those depths in a "super" tournament? The 6 test programs would be run at variant depths while the rest of the "known" programs served as the control (i.e., the variant-depth programs would not play each other, only the control engines). For each engine/depth combination, you would then have an approximate idea of its Elo rating. From this data, couldn't you extrapolate (or estimate, as it were) the rating differential, not only for those 6 test engines but also for chess programs in general (realizing that for programs in general it would be a less exact estimate)?

Obviously, when dealing with the performance capabilities of programs as complex as chess engines, it is important to understand that changing even one variable such as depth also changes the overall search path taken, and hence will "skew" the results somewhat (from the purist's point of view).
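The "approximate idea of its Elo rating" step above is the standard performance-rating calculation: from an engine's score against control opponents of known strength, invert the logistic Elo expectation formula. A minimal sketch (the function name and the equal-rated opponent pool are illustrative, not from the post):

```python
import math

def performance_rating(opponent_ratings, score):
    """Estimate an engine's Elo performance from its total score
    (win = 1, draw = 0.5) against opponents of known rating.

    Inverts the standard Elo expectation E = 1 / (1 + 10^(-d/400)),
    where d is the rating difference versus the average opponent.
    Score must be strictly between 0 and n (no perfect/zero scores).
    """
    n = len(opponent_ratings)
    avg_opp = sum(opponent_ratings) / n
    p = score / n  # scoring fraction against the control pool
    # Solve p = 1 / (1 + 10^(-d/400)) for the rating difference d
    d = -400.0 * math.log10(1.0 / p - 1.0)
    return avg_opp + d

# A depth-limited engine scoring 3/4 against 2400-rated controls
# performs roughly 191 points above them:
# performance_rating([2400, 2400, 2400, 2400], 3.0) → ~2591
```

Running each fixed-depth configuration against the same control pool and comparing these estimates gives the per-ply rating differential the post asks about, within the usual statistical error bars for small match samples.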
However, if we are just talking about strength, you can either use the data points you have acquired and accept that they are an approximation (and probably a darn good one), or you can take the purist's point of view and disregard all of the data because none of it is based on only one variable being changed in the experiment (i.e., how do you know that engine X's rating on a different P2 200MHz system is the same as on the original SSDF P2 200MHz system?).

It would take a lot of work to set up such an extensive experiment (i.e., tournament), but you could get a reasonable approximation of the answer, and in a lot of scientific fields we do not have answers, just reasonable approximations.

The only thing that makes the question difficult (IMO) is that you may have to increase the amount of time the variant-depth engines are allowed to search (in order to give them a reasonable amount of time at each depth) as compared to the control engines (which would be running at a "control" time). The two sets of engines would also have to be set (if using an increasing time differential for the depth-modified engines) to not think on the opponent's time. The time increase may have to be exponential, but determining a "fair" time increment may be difficult (or maybe not, for you guys who have been doing this for years).

Of course, the most "controlled" environment would be one in which all of the programs ran under the same time constraints, but I think that would skew the data even more (I doubt giving Crafty a 20-ply depth over a 19-ply depth would increase its playing strength much if it only has 3 minutes per move in both cases).

Keep plugging :)

KarinsDad
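The exponential time increase suggested above follows from the geometry of the search tree: each extra ply multiplies search effort by the engine's effective branching factor. A rough sketch of such a time budget (the branching factor of 4 is an assumed placeholder, not a figure from the post; real engines vary widely):

```python
def time_for_depth(base_time, base_depth, target_depth, branching=4.0):
    """Scale a per-move time budget for a fixed-depth search.

    Assumes each additional ply multiplies search effort by a
    constant effective branching factor (here defaulting to an
    assumed 4.0). base_time is the budget (in seconds) needed to
    reach base_depth; returns the budget for target_depth.
    """
    return base_time * branching ** (target_depth - base_depth)

# If 180 seconds suffices for a 10-ply search, reaching 12 ply
# under these assumptions needs 180 * 4**2 = 2880 seconds:
# time_for_depth(180, 10, 12) → 2880.0
```

This is why a flat time control skews the comparison, as the post notes for the Crafty 19-vs-20-ply case: the deeper setting simply runs out of time before completing the extra ply.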
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.