Author: Russell Reagan
Date: 00:19:09 07/09/03
In a recent rec.games.chess.computer thread, someone asked what the absolute
strongest settings for Chessmaster were. The poster said he had discovered that
the default settings were weaker by at least 97 Elo points, and cited Surak's
rating list (http://www.grailmaster.com/misc/chess/comp/cm.html).
Surak's rating list contains only ratings of different Chessmaster versions and
settings. Even with different settings, it's all the same engine, so I don't
imagine the rating differences could realistically be almost 100 Elo points (at
least not 100 points above what the author believes to be the strongest).
To illustrate this point, I decided to play a little tournament (still in
progress) between engines that had similar ratings. Here are the results so far:
   Engine    Score        1     2     3     4     5     6     7     8
-----------------------------------------------------------------------
1: Engine G  19.0 / 30  XXXXX ==10. 1===. ===1. ==11. 01==1 11==. 110==
2: Engine D  16.5 / 30  ==01. XXXXX ====. ===== =11== =011. ==1=. 001=.
3: Engine F  15.0 / 30  0===. ====. XXXXX 0==1. =1=== 1=00. 1==== ==10.
4: Engine E  15.0 / 30  ===0. ===== 1==0. XXXXX =000. ===== ==11. ==11.
5: Engine C  15.0 / 30  ==00. =00== =0=== =111. XXXXX 1===. ==10. 0=11.
6: Engine B  14.0 / 30  10==0 =100. 0=11. ===== 0===. XXXXX =0=1. 01==.
7: Engine H  13.0 / 30  00==. ==0=. 0==== ==00. ==01. =1=0. XXXXX =11==
8: Engine A  12.5 / 30  001== 110=. ==01. ==00. 1=00. 10==. =00== XXXXX
-----------------------------------------------------------------------
120 games: +30 =69 -21
  Program      Elo    +    -  Games   Score  Av.Op.   Draws
1 Engine G :  2582  114   74     30  63.3 %    2487  53.3 %
2 Engine D :  2531  127   61     30  55.0 %    2496  63.3 %
3 Engine C :  2501   90   90     30  50.0 %    2501  53.3 %
4 Engine E :  2500   76   76     30  50.0 %    2500  66.7 %
5 Engine F :  2499   76   76     30  50.0 %    2499  66.7 %
6 Engine B :  2482   73  130     30  46.7 %    2505  53.3 %
7 Engine H :  2457   65  124     30  43.3 %    2504  60.0 %
8 Engine A :  2449   87  122     30  41.7 %    2508  43.3 %
This indicates a rating difference of 133 Elo points between first and last
place (2582 versus 2449). The funny thing is, every engine is Crafty, the exact
same binary, using the exact same settings. If this kind of testing can produce
a difference of 133 rating points between copies of the exact same engine, what
does that say about a mere 97 rating point difference between the different
Chessmaster settings?
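As a rough sketch of how much of that gap chance alone can account for, the Python below simulates a round robin between identical engines and converts each score into an Elo difference with the usual logistic formula. Everything in it is my own assumption rather than part of the tournament setup: the function names, the flat 4 games per pairing (the real event has 4 or 5), the 57.5% draw rate (taken from 69 draws in 120 games), and the simplification that each engine is rated against a 50% baseline rather than its actual average opponent.

```python
import math
import random

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def simulate_spread(n_engines=8, games_per_pair=4, draw_rate=0.575, rng=None):
    """Play a round robin between identical engines.

    Returns the Elo gap between the best and worst scorer.
    """
    if rng is None:
        rng = random.Random()
    win_rate = (1.0 - draw_rate) / 2.0  # equal strength: wins and losses equally likely
    points = [0.0] * n_engines
    for a in range(n_engines):
        for b in range(a + 1, n_engines):
            for _ in range(games_per_pair):
                r = rng.random()
                if r < draw_rate:
                    result = 0.5
                elif r < draw_rate + win_rate:
                    result = 1.0
                else:
                    result = 0.0
                points[a] += result
                points[b] += 1.0 - result
    games = games_per_pair * (n_engines - 1)
    ratings = []
    for p in points:
        # Clamp so a 0% or 100% score does not blow up the logarithm.
        frac = min(max(p / games, 0.5 / games), 1.0 - 0.5 / games)
        ratings.append(elo_diff(frac))
    return max(ratings) - min(ratings)

if __name__ == "__main__":
    # Engine G scored 19/30 against opponents averaging 2487.
    print("Engine G performance:", round(2487 + elo_diff(19 / 30)))
    rng = random.Random(2003)
    spreads = sorted(simulate_spread(rng=rng) for _ in range(1000))
    print("median first-to-last gap:", round(spreads[len(spreads) // 2]), "Elo")
```

The first print reproduces Engine G's listed rating from its score and average opponent; the second shows the typical first-to-last gap when every entrant is the same program.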
This tells me that when testing an improvement to an engine, you shouldn't use
head-to-head results over this few games as a good indicator of whether or not
the new version is actually an improvement. Thoughts?
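To put a number on how uncertain a short match is, here is a minimal sketch that turns a win/draw/loss record into an approximate 95% interval in Elo terms. It assumes a normal approximation to the per-game score and the logistic Elo formula; the function names and the clamping of extreme scores are my own choices, and this is not necessarily how any rating tool computes its error bars.

```python
import math

def elo_from_fraction(p):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def match_interval(wins, draws, losses, z=1.96):
    """Return (low, point, high) Elo estimates for a match result.

    Uses the sample variance of per-game scores (1, 0.5, 0) and a normal
    approximation; z=1.96 corresponds to roughly 95% coverage.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    eps = 0.5 / n            # keep fractions away from 0 and 1
    def clamp(x):
        return min(max(x, eps), 1.0 - eps)
    return (elo_from_fraction(clamp(mean - z * se)),
            elo_from_fraction(clamp(mean)),
            elo_from_fraction(clamp(mean + z * se)))

if __name__ == "__main__":
    # Engine G's record so far: +11 =16 -3 in 30 games.
    low, point, high = match_interval(11, 16, 3)
    print(f"{point:+.0f} Elo, interval {low:+.0f} to {high:+.0f}")
```

For a 30-game record like Engine G's (+11 =16 -3), the interval under this approximation is wider than the 97-point difference being argued about.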
Also, what does this say about the controversial issue of whether or not the
default settings of Chessmaster are indeed the strongest? It would seem that
some other form of testing would be needed to demonstrate that, aside from
playing a plethora of Chessmaster versions against one another. One option
would be to hold a tournament between default Chessmaster and a number of other
strong engines (Fritz, Shredder, etc.), and then hold a second tournament
between the proposed "better" Chessmaster settings and the same set of strong
engines (as Kurt Utzinger is doing now).