Author: Ratko V Tomic
Date: 12:38:19 05/23/00
> Clearly SSDF needs to recalibrate their rating downward
> by a 200 or so points.

While I agree that SSDF could use recalibration to human ELO, the problem is not that the SSDF ELO is always too high. Compared with human players, the ratings of the top programs are too high and those of the bottom programs are too low. So the problem is that the SSDF list exaggerates the ELO difference between programs (relative to human players).

The reason for this magnified ELO difference is that all the programs play using algorithms with essentially the same primary strengths (short-range tactics, tuned opening books, tablebases) and weaknesses (strategy, long-range tactics). In effect the programs are competing in a space of reduced dimensionality (the dimensions being the various aspects of play), compared to humans, who individually vary in more dimensions.

To continue with this picturesque analogy (with random walks): if program A makes N "improvement steps" relative to B, and all the steps are in 1 dimension (i.e. 1 aspect, say speed), then A will drift N units of strength distance away from B. But if the steps are spread over 2 dimensions, with say N=2, the average distance after 2 steps is no longer 2 strength units. Both steps can fall in the first dimension (distance 2), one in each (distance sqrt(2)), or both in the second (distance 2). Counting these three cases as equally likely gives (2+sqrt(2)+2)/3=1.80, i.e. less than for 2 improvement steps in 1 dimension. Similarly, in 3-D the 2 improvement steps yield a strength distance of 1.707 [i.e. (2+2+2+3*sqrt(2))/6]. So, for a given number of "improvement steps", the more strength aspects that vary between players, the smaller the average strength distance between them.

Perhaps a simple fix SSDF could try is to recalculate their ratings from their existing database of comp-comp results, but using a factor smaller than 400 (200-300 may work better) when calculating the rating difference from the comp-comp results.
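
The step-counting averages quoted above (1.80 in 2-D, 1.707 in 3-D) can be reproduced by brute-force enumeration. The sketch below is only an illustration of that arithmetic. The function names are made up here, and it assumes each improvement step adds one unit along a single dimension. It shows two ways of averaging: over distinct allocations of steps to dimensions (which matches the figures in the post) and over ordered step sequences (a different weighting, shown for comparison).

```python
from itertools import combinations_with_replacement, product
from math import sqrt


def strength_distance(step_dims, dims):
    """Euclidean distance reached when each step adds 1 unit along one dimension."""
    counts = [0] * dims
    for d in step_dims:
        counts[d] += 1
    return sqrt(sum(c * c for c in counts))


def mean_distance_by_allocation(steps, dims):
    """Average over distinct allocations of steps to dimensions, each counted once.

    Assumption: this is the counting behind the 1.80 and 1.707 figures.
    """
    allocations = list(combinations_with_replacement(range(dims), steps))
    return sum(strength_distance(a, dims) for a in allocations) / len(allocations)


def mean_distance_by_sequence(steps, dims):
    """Average over ordered step sequences, each step picking a dimension uniformly."""
    sequences = list(product(range(dims), repeat=steps))
    return sum(strength_distance(s, dims) for s in sequences) / len(sequences)


if __name__ == "__main__":
    for dims in (1, 2, 3):
        print(dims,
              round(mean_distance_by_allocation(2, dims), 3),
              round(mean_distance_by_sequence(2, dims), 3))
```

Under either weighting the average distance for 2 steps falls as the number of dimensions grows, which is the point of the analogy. Only the exact values depend on how the cases are weighted.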
A better factor (than 400) for comp-comp results could then be extracted (via least squares) from the known human-computer results at both ends of the program strength spectrum. Once a good factor for comp-comp scores is found, the SSDF list would stay in sync with human ELO for much longer than under their current scheme. Of course, a drastically new kind of chess program, or the development of a more systematic anti-computer strategy by human players, would require further branching in the correct ELO factors.
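
As a rough sketch of what a smaller factor and a least-squares fit could look like: the code below assumes the usual logistic ELO relation, in which a score fraction s corresponds to a rating difference of factor*log10(s/(1-s)). Whether SSDF computes its list exactly this way is an assumption here. The function names and the shape of the inputs to `fit_factor` (comp-comp score fractions paired with rating differences measured on the human scale) are hypothetical, chosen only to illustrate the idea.

```python
from math import log10


def rating_diff(score, factor=400.0):
    """Rating difference implied by a score fraction (0 < score < 1).

    Assumes the logistic ELO relation; factor=400 is the standard scale.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return factor * log10(score / (1.0 - score))


def expected_score(diff, factor=400.0):
    """Inverse of rating_diff: expected score for a given rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-diff / factor))


def fit_factor(comp_scores, human_scale_diffs):
    """Least-squares factor F for the model diff = F * log10(s / (1 - s)).

    comp_scores: comp-comp score fractions for pairs of programs (hypothetical input).
    human_scale_diffs: rating differences for the same pairs, as estimated
    from human-computer results (hypothetical input).
    """
    xs = [log10(s / (1.0 - s)) for s in comp_scores]
    denom = sum(x * x for x in xs)
    if denom == 0.0:
        raise ValueError("need at least one score different from 0.5")
    # Closed-form solution for a one-parameter fit through the origin.
    return sum(x * d for x, d in zip(xs, human_scale_diffs)) / denom
```

For example, a 75% comp-comp score maps to about 191 points with the factor 400, but only about 119 points with a factor of 250, so a smaller factor compresses the whole list without changing the ordering of the programs.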