Author: Ed Schröder
Date: 10:48:34 10/24/04
On October 24, 2004 at 12:53:10, Mike Byrne wrote:

>On October 24, 2004 at 07:28:56, Ed Schröder wrote:
>
>>I have put this article on my website for discussion and sharing information.
>>
>>http://members.home.nl/matador/testing.htm
>>
>>Ed
>>
>>====================
>>
>>Adventures with Fritz
>>
>>This is an article about testing and some of the problems I encountered during
>>engine-engine matches using FRITZ as base software. It's my understanding this
>>article is a must-read for users who like to play these engine-engine matches
>>with Pro Deo. This article may also be important to my fellow chess
>>programmers because I don't know if these problems also occur when testing
>>their own engines.
>>
>>This article will be put on the CCC discussion board in the hope of creating
>>awareness, receiving useful comments, and asking other testers and chess
>>programmers to either confirm or deny the problem listed below.
>>
>>--------------------------------------------------------------------------------
>>
>>Methodology
>>
>>For 4-5 years I have been using the eng-eng match technique as the final step
>>in testing the changes I make. During the first 3-4 years the eng-eng testing
>>was done under the REBEL DOS interface, but this testing was limited because
>>the engine could only play against itself. Once I had made my engine available
>>to run under other interfaces I thought it would be an improvement to move to a
>>different eng-eng testing environment that allowed me to test against more
>>opponents.
>>
>>From the alternatives I chose the FRITZ software, mainly because of its
>>user-friendly eng-eng match software. I created a set of 100 balanced opening
>>positions and 4 fixed sparring engines (Fritz8, Shredder7, Junior8 and Hiarcs8)
>>and let them play on 4 PCs at various levels, each producing 200 games, thus
>>4x200 = 800 games in total.
>>
>>Testing is done without any learning activated, no opening books, same hash
>>table size, same engine parameters, meaning: exclude all randomness that
>>might influence the progress of a game. Re-running the test should simply
>>produce an equal result or something very close.
>>
>>This procedure was repeated several times to ensure its reliability and
>>without exception all of the replayed 800-game matches stayed within an
>>acceptable error margin of -1% to +1%. It seemed the system was working and I
>>had created a reliable testing environment for myself: make program changes,
>>run the 800-game eng-eng match and see if it produces a higher match score. So
>>far so good.
>>
>>--------------------------------------------------------------------------------
>>
>>Problems
>>
>>Over time I noticed something odd: the match results against Shredder7
>>and Junior8 went down considerably while the match score against
>>Hiarcs8 went up, also considerably, all of this as a pattern. This pattern
>>remained so constant it made me suspicious, so I ran the initial match again
>>and there it was: it produced a -3% match result, meaning a loss of 20 elo
>>points for no good reason. My test environment was not reliable anymore.
>>Houston, there is a problem.
>>
>>I double-checked all the settings I was using that could explain this sudden
>>fluctuation in score and found none. All the conditions were the same, until I
>>noticed there had been a seemingly unimportant change after all: at a
>>certain moment I had set the main engine (the one that is loaded at program
>>start) on all 4 PCs to FRITZ8.
>>
>>I couldn't believe this change could make any difference at all, or else it
>>would mean 1 or 2 of the engines are not correctly loaded, meaning entering the
>>world of bugs. I decided to find out nevertheless; after all, I had no other
>>clue than this.
>>
>>--------------------------------------------------------------------------------
>>
>>The experiment
>>
>>I took an older version (Rebel 12.00.01) and ran 3 identical 4x200=800 game
>>test matches (time control 40/5) with the following exception:
>>
>>Match-1, FRITZ8 loaded at program start.
>>Match-2, own engine loaded at program start (Shredder loaded with
>>         Shredder, Junior with Junior, etc.)
>>Match-3, Pro Deo loaded at program start.
>>
>>It should produce match scores within an error margin of -1% to +1%, or else
>>something serious is wrong with the testing technique itself, which is either
>>related to bugs or to the fact that 800 games is still not enough to ensure a
>>-1% to +1% error margin. The results are telling and leave no room for
>>speculation: there is something wrong with the testing environment.
>>
>>  Match-1, FRITZ8 loaded at program start       38.1%
>>  Match-2, own engine loaded at program start   40.8%
>>  Match-3, Pro Deo loaded at program start      42.8%
>>
>>An unbelievable and unacceptable difference of 4.7%, which corresponds to an
>>elo difference of more than 30 elo points depending on which engine is loaded
>>at program start.
>>
>>--------------------------------------------------------------------------------
>>
>>Where to go from here?
>>
>>It's tempting to advise users to have Pro Deo loaded at program start all the
>>time (eng-eng and auto232) to ensure the best results, but somehow this is an
>>unsatisfactory thing to say. It's more constructive to start searching for the
>>reasons behind it and look for watertight solutions, hence I put this article
>>on the CCC forum for discussion. It would be interesting for me to hear the
>>experiences of fellow programmers and testers; maybe things are entirely Pro
>>Deo related after all.
>>
>>My conclusion so far is that I could not find any satisfactory explanation why
>>Hiarcs8, Junior8 and Fritz8 match scores fluctuate so much. There is a possible
>>reason for Shredder7: its settings are not correctly remembered. From time to
>>time Shredder7 uses "position learning" after all, despite the fact that it is
>>turned off. Other engines have this (settings) problem as well. For instance,
>>Chess Tiger 15 starts with the Gambit style as default setting; when you change
>>it to Normal, exit and restart the program, the Gambit style is active again.
>>
>>--------------------------------------------------------------------------------
>>
>>More ChessBase oddities
>>
>>Other ChessBase oddities that are NOT related to this topic (which engine is
>>loaded at program start) but are general hints for accurate testing. The
>>oddities listed below are easy to overcome.
>>This article is based on my experiences with the Fritz7 interface; the Fritz8
>>interface might be a different story. My preference for the Fritz7 interface is
>>mainly because Fritz8 doesn't save Pro Deo's current personality right, Fritz7
>>does.
>>
>>There sometimes is a problem that Fritz7 starts with the wrong Pro Deo
>>personality. While the WB2UCI.ENG adaptor clearly states to use engine_X, the
>>Fritz7 interface ignores this and starts another engine. The problem occurs
>>about 10% of the time. I have no idea if this problem still exists in the
>>Fritz8 interface. The cure is to exit and restart Fritz7. So always check the
>>param.txt file to see if a match is initialized well; see the Pro Deo FAQ for
>>details.

>You performed a much more detailed investigation than I did, but I have always
>surmised that engine vs engine matches under the Chessbase/Fritz GUI were
>somehow flawed when using the wb2uci adapter.

I have got an interesting reply from Mathias Feist of ChessBase on the issue; I
will ask his permission to post it here.

Ed
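
The arithmetic behind the quoted figures can be checked with a short Python sketch. It is only an illustration under stated assumptions: the standard logistic Elo model for converting a score percentage into a rating difference, and a per-game standard deviation of about 0.4 points with games treated as independent. Neither assumption comes from the article, and the function names `elo_from_score` and `score_margin` are made up for this example.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1).

    Assumes the standard logistic Elo model; not taken from the article.
    """
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_margin(games, per_game_sd=0.4, z=1.96):
    """Approximate 95% margin on a match score fraction.

    per_game_sd=0.4 is an assumed value (it depends on the draw rate),
    and games are assumed to be independent of each other.
    """
    return z * per_game_sd / math.sqrt(games)


if __name__ == "__main__":
    # Match scores as quoted in the article.
    results = {"Match-1": 0.381, "Match-2": 0.408, "Match-3": 0.428}
    for name, score in results.items():
        print(f"{name}: {score:.1%} -> {elo_from_score(score):+.0f} Elo")

    gap = elo_from_score(0.428) - elo_from_score(0.381)
    print(f"Match-3 vs Match-1 gap: {gap:.1f} Elo")
    print(f"95% margin over 800 games: +/-{score_margin(800):.1%}")
```

Under these assumptions the gap between Match-1 and Match-3 comes out at roughly 34 Elo, in line with the "more than 30" quoted above, and the margin for 800 games is somewhere near 2.8%. That would fit the article's own suspicion that 800 games may not be enough to guarantee a -1% to +1% error margin, although deterministic settings such as those described may behave differently from independent games.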
Last modified: Thu, 15 Apr 21 08:11:13 -0700