Computer Chess Club Archives


Subject: Testing who is better, rating lists, and Chessmaster

Author: Russell Reagan

Date: 00:19:09 07/09/03


In a recent rec.games.chess.computer thread, someone asked what the absolute
strongest settings for Chessmaster are. The poster said he had discovered that
the default settings were weaker by at least 97 Elo points, and cited Surak's
rating list (http://www.grailmaster.com/misc/chess/comp/cm.html).

Surak's rating list contains only ratings of different Chessmaster versions and
settings. Even with different settings, it's all the same engine, so I don't
imagine the rating differences could realistically be almost 100 Elo points (at
least not 100 points above what the author believes to be the strongest).

To illustrate this point, I decided to play a little tournament between engines
that had similar ratings (still in progress). Here are the results so far:

                Score         1     2     3     4     5     6     7     8
--------------------------------------------------------------------------
 1: Engine G  19.0 / 30   XXXXX ==10. 1===. ===1. ==11. 01==1 11==. 110==
 2: Engine D  16.5 / 30   ==01. XXXXX ====. ===== =11== =011. ==1=. 001=.
 3: Engine F  15.0 / 30   0===. ====. XXXXX 0==1. =1=== 1=00. 1==== ==10.
 4: Engine E  15.0 / 30   ===0. ===== 1==0. XXXXX =000. ===== ==11. ==11.
 5: Engine C  15.0 / 30   ==00. =00== =0=== =111. XXXXX 1===. ==10. 0=11.
 6: Engine B  14.0 / 30   10==0 =100. 0=11. ===== 0===. XXXXX =0=1. 01==.
 7: Engine H  13.0 / 30   00==. ==0=. 0==== ==00. ==01. =1=0. XXXXX =11==
 8: Engine A  12.5 / 30   001== 110=. ==01. ==00. 1=00. 10==. =00== XXXXX
--------------------------------------------------------------------------
120 games: +30 =69 -21

    Program       Elo    +   -   Games   Score   Av.Op.  Draws
  1 Engine G    : 2582  114  74    30    63.3 %   2487   53.3 %
  2 Engine D    : 2531  127  61    30    55.0 %   2496   63.3 %
  3 Engine C    : 2501   90  90    30    50.0 %   2501   53.3 %
  4 Engine E    : 2500   76  76    30    50.0 %   2500   66.7 %
  5 Engine F    : 2499   76  76    30    50.0 %   2499   66.7 %
  6 Engine B    : 2482   73 130    30    46.7 %   2505   53.3 %
  7 Engine H    : 2457   65 124    30    43.3 %   2504   60.0 %
  8 Engine A    : 2449   87 122    30    41.7 %   2508   43.3 %

This indicates a rating difference of 133 Elo points between first and last
place. The funny thing is, every engine is Crafty: the exact same binary, using
the exact same settings. If this kind of testing can produce a difference of 133
rating points between copies of the exact same engine, what does that say about
a mere 97-point difference between the different Chessmaster settings?
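
For anyone who wants to see how much spread pure chance produces, here is a
minimal Python sketch of that kind of experiment (not the actual tournament).
The assumptions are mine: every game between identical engines is an
independent draw/win/loss with a draw rate of 57.5% (69 of the 120 games
above), colours are ignored, each pair plays 4 games, and the spread is taken
straight from score percentages with the usual logistic Elo formula, without
the average-opponent adjustment a rating tool would apply. The names elo_diff
and simulate_spread are made up for this example.

```python
import math
import random
from itertools import combinations


def elo_diff(score):
    """Rating difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def simulate_spread(players=8, games_per_pair=4, draw_rate=0.575, rng=None):
    """Round robin between identical engines.

    Returns the Elo gap between the best and worst score percentage.
    """
    rng = rng or random.Random()
    points = [0.0] * players
    for a, b in combinations(range(players), 2):
        for _ in range(games_per_pair):
            r = rng.random()
            if r < draw_rate:
                # Draw: half a point each.
                points[a] += 0.5
                points[b] += 0.5
            elif r < draw_rate + (1.0 - draw_rate) / 2.0:
                points[a] += 1.0  # identical engines: wins are a coin flip
            else:
                points[b] += 1.0
    games_each = games_per_pair * (players - 1)
    # Clamp so that a 0% or 100% score does not blow up the logarithm.
    fractions = [min(max(p / games_each, 0.01), 0.99) for p in points]
    return elo_diff(max(fractions)) - elo_diff(min(fractions))


if __name__ == "__main__":
    rng = random.Random(2003)  # fixed seed so a run can be repeated
    spreads = sorted(simulate_spread(rng=rng) for _ in range(2000))
    print("median spread:", round(spreads[len(spreads) // 2]))
    big = sum(s >= 100 for s in spreads) / len(spreads)
    print("share of tournaments with spread >= 100 Elo:", big)
```

Changing games_per_pair shows how slowly the spread shrinks as more games are
played.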

This tells me that when testing an improvement to an engine, you shouldn't use
head-to-head results as a good indicator of whether or not the new version is
actually an improvement. Thoughts?
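
To put a rough number on how uncertain a single match result is, here is a
small Python sketch that turns a win/draw/loss record into an approximate 95%
interval in Elo terms. It assumes independent games and a normal approximation
to the mean score, which is crude for short matches, and it is not necessarily
how the "+" and "-" columns in the table above were computed. The function
names are hypothetical.

```python
import math


def elo_diff(score):
    """Rating difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_interval(wins, draws, losses, z=1.96):
    """Return (low, mid, high) Elo difference for a match result.

    z=1.96 corresponds to roughly 95% under the normal approximation.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Variance of a single game result (1, 0.5 or 0 points) around the mean.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    half_width = z * math.sqrt(var / n)

    def clamp(x):
        # Keep score fractions strictly inside (0, 1) for the logarithm.
        return min(max(x, 0.001), 0.999)

    return (elo_diff(clamp(mean - half_width)),
            elo_diff(clamp(mean)),
            elo_diff(clamp(mean + half_width)))


if __name__ == "__main__":
    # Engine G's record read off the crosstable: 11 wins, 16 draws, 3 losses.
    low, mid, high = match_interval(11, 16, 3)
    print(f"Elo vs. average opponent: {mid:+.0f} (from {low:+.0f} to {high:+.0f})")
```

With only 30 games the interval is very wide, which fits what the Crafty
experiment shows.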

Also, what does this say about the controversial issue of whether or not the
default settings of Chessmaster are indeed the strongest? It would seem that
some other form of testing is needed to demonstrate that, aside from playing a
plethora of Chessmaster versions against one another. One option would be to
hold a tournament between default Chessmaster and a number of other strong
engines (Fritz, Shredder, etc.), and then hold a second tournament between the
proposed "better" Chessmaster settings and the same set of strong engines, as
Kurt Utzinger is doing now.
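
A comparison like that still needs enough games. As a rough sketch of how the
two tournaments could be compared, the Python below computes a z-score for the
difference between two overall scores against the same opponents. The records
in the example are invented purely for illustration and are not anyone's
results, the games are assumed independent, and score_stats and gauntlet_z are
hypothetical names.

```python
import math


def score_stats(wins, draws, losses):
    """Mean score per game and the squared standard error of that mean."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    return mean, var / n


def gauntlet_z(default_wdl, tuned_wdl):
    """z-score of (tuned score - default score); needs decisive or mixed results."""
    m1, se1_sq = score_stats(*default_wdl)
    m2, se2_sq = score_stats(*tuned_wdl)
    return (m2 - m1) / math.sqrt(se1_sq + se2_sq)


if __name__ == "__main__":
    # Invented records: (wins, draws, losses) over 100 games each.
    z = gauntlet_z((30, 40, 30), (40, 40, 20))
    print("z =", round(z, 2), "(|z| above about 1.96 is the usual 95% threshold)")
```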



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.