Author: Dann Corbit
Date: 00:30:03 07/09/03
On July 09, 2003 at 03:19:09, Russell Reagan wrote:

>In recent rec.games.chess.computer, a question was asked regarding what the
>absolute strongest settings for Chessmaster were. The poster said he discovered
>that the default settings were weaker by at least 97 ELO points and cited
>Surak's rating list (http://www.grailmaster.com/misc/chess/comp/cm.html).
>
>Surak's rating list contains only ratings of different Chessmaster versions and
>settings. Even with different settings, it's all the same engine, so I don't
>imagine the rating differences could realistically be almost 100 ELO points (at
>least not 100 points above what the author believes to be the strongest).
>
>To illustrate this point, I decided to play a little tournament between engines
>that had similar ratings (still in progress). Here are the results, so far:
>
>                Score      1     2     3     4     5     6     7     8
>--------------------------------------------------------------------------
> 1: Engine G  19.0 / 30  XXXXX ==10. 1===. ===1. ==11. 01==1 11==. 110==
> 2: Engine D  16.5 / 30  ==01. XXXXX ====. ===== =11== =011. ==1=. 001=.
> 3: Engine F  15.0 / 30  0===. ====. XXXXX 0==1. =1=== 1=00. 1==== ==10.
> 4: Engine E  15.0 / 30  ===0. ===== 1==0. XXXXX =000. ===== ==11. ==11.
> 5: Engine C  15.0 / 30  ==00. =00== =0=== =111. XXXXX 1===. ==10. 0=11.
> 6: Engine B  14.0 / 30  10==0 =100. 0=11. ===== 0===. XXXXX =0=1. 01==.
> 7: Engine H  13.0 / 30  00==. ==0=. 0==== ==00. ==01. =1=0. XXXXX =11==
> 8: Engine A  12.5 / 30  001== 110=. ==01. ==00. 1=00. 10==. =00== XXXXX
>--------------------------------------------------------------------------
>120 games: +30 =69 -21
>
>    Program       Elo    +    -   Games   Score   Av.Op.   Draws
>  1 Engine G  :  2582  114   74     30   63.3 %    2487   53.3 %   2582 - 74 = 2510
>  2 Engine D  :  2531  127   61     30   55.0 %    2496   63.3 %
>  3 Engine C  :  2501   90   90     30   50.0 %    2501   53.3 %
>  4 Engine E  :  2500   76   76     30   50.0 %    2500   66.7 %
>  5 Engine F  :  2499   76   76     30   50.0 %    2499   66.7 %
>  6 Engine B  :  2482   73  130     30   46.7 %    2505   53.3 %
>  7 Engine H  :  2457   65  124     30   43.3 %    2504   60.0 %
>  8 Engine A  :  2449   87  122     30   41.7 %    2508   43.3 %   2449 + 87 = 2536
>
>This indicates a rating difference of 133 ELO points. The funny thing is, every
>engine is Crafty, the exact same binary, using the exact same settings. If this
>kind of testing can produce a difference of 133 rating points between the exact
>same engine, what does that say about a mere 97 rating point difference between
>the different Chessmaster settings?

About what one would expect, depending upon the number of games that have been played.

>This tells me that when testing an improvement to an engine, you shouldn't use
>head to head results as a good indicator of whether or not the new version is
>actually an improvement. Thoughts?

The worst possible opponent is one that is exactly your strength. There, the random walk effect is multiplied. The best possible opponents are a lot stronger or a lot weaker, but not overwhelmingly so. So if you win 10% of the points in a long match or win 90% of the points in a long match, then you have a good indication.

>Also, what does this say about the controversial issue surrounding whether or
>not the default settings of Chessmaster are indeed the strongest? It would seem
>that some other form of testing would be needed to demonstrate that, aside from
>playing a plethora of Chessmaster versions against one another. Maybe holding a
>tournament between default Chessmaster and a number of other strong engines
>(Fritz, Shredder, etc.), and then holding a second tournament between the
>proposed "better" Chessmaster settings and the same set of strong engines (as
>Kurt Utzinger is doing now).

I don't see a better way to find out than the contests that people are running. Once the error bars say that one cluster of settings is better, then we can believe it.
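
To give a feel for how wide the error bars are after only 30 games, here is a minimal Python sketch. It is not the method used by the rating tool in the quoted post; the function names (elo_diff, elo_interval) are made up for illustration, and it assumes the standard logistic Elo curve, independent games, and a normal approximation for the mean score, which is crude for a sample this small.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction, using the logistic Elo curve."""
    # Clamp so that 0% or 100% scores do not divide by zero or take log(0).
    eps = 1e-6
    score = min(max(score, eps), 1.0 - eps)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Return (diff, low, high) in Elo for one player's W/D/L record.

    Assumption: games are independent and the mean score is roughly
    normally distributed; z=1.96 gives an approximate 95% interval.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score p.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * p ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return elo_diff(p), elo_diff(p - z * se), elo_diff(p + z * se)


if __name__ == "__main__":
    # Engine G's row in the quoted crosstable: 11 wins, 16 draws, 3 losses.
    diff, low, high = elo_interval(11, 16, 3)
    print(f"diff {diff:+.0f} Elo, approx. 95% interval {low:+.0f} to {high:+.0f}")
```

For Engine G's 19.0 / 30 this gives roughly +95 Elo over its average opponent, with an interval running from about +15 to about +190. An interval that wide easily covers the spread seen between the identical Crafty copies.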
Last modified: Thu, 15 Apr 21 08:11:13 -0700