Computer Chess Club Archives


Subject: Re: Adventures with Fritz

Author: Ed Schröder

Date: 10:48:34 10/24/04

On October 24, 2004 at 12:53:10, Mike Byrne wrote:

>On October 24, 2004 at 07:28:56, Ed Schröder wrote:
>
>>I have put this article on my website for discussion and for sharing information.
>>
>>http://members.home.nl/matador/testing.htm
>>
>>Ed
>>
>>====================
>>
>>Adventures with Fritz
>>
>>This is an article about testing and some of the problems I encountered during
>>engine-engine matches using FRITZ as the base software. I believe this article
>>is a must-read for users who like to play engine-engine matches with Pro Deo.
>>It may also be important to my fellow chess programmers, because I don't know
>>whether these problems also occur when they test their own engines.
>>
>>I am posting this article on the CCC discussion board in the hope of creating
>>awareness, receiving useful comments, and asking other testers and chess
>>programmers to confirm or deny the problem described below.
>>
>>--------------------------------------------------------------------------------
>>
>>Methodology
>>
>>For 4-5 years I have used the eng-eng match technique as the final step in
>>testing the changes I make. During the first 3-4 years the eng-eng testing was
>>done under the REBEL DOS interface, but this testing was limited because the
>>engine could only play against itself. Once I had made my engine available to
>>run under other interfaces, I thought it would be an improvement to move to a
>>different eng-eng testing environment that allowed me to test against more
>>opponents.
>>
>>Of the alternatives I chose the FRITZ software, mainly because of its
>>user-friendly eng-eng match software. I created a set of 100 balanced opening
>>positions, selected 4 fixed sparring engines (Fritz8, Shredder7, Junior8 and
>>Hiarcs8) and let them play on 4 PCs at various levels, each PC producing 200
>>games, thus 4x200 = 800 games in total.
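
The 800-game layout can be sketched as a simple schedule generator. This is a minimal sketch, not the Fritz match software: it assumes each of the 100 opening positions is played twice against every sparring engine, once with each colour (which is one way 100 positions yield 200 games per opponent), and that each PC hosts one sparring engine. The names `Game` and `build_schedule` are hypothetical.

```python
from dataclasses import dataclass

# Sparring engines from the article; one per PC (assumption).
SPARRING_ENGINES = ["Fritz8", "Shredder7", "Junior8", "Hiarcs8"]
NUM_POSITIONS = 100  # balanced opening positions


@dataclass(frozen=True)
class Game:
    opponent: str          # sparring engine
    position: int          # index into the opening set
    tested_is_white: bool  # colour of the engine under test


def build_schedule(opponents, num_positions):
    """Return one Game per (opponent, position, colour) combination.

    Assumption: every opening is replayed with colours reversed,
    so each opponent gets 2 * num_positions games.
    """
    return [
        Game(opp, pos, white)
        for opp in opponents
        for pos in range(num_positions)
        for white in (True, False)
    ]


schedule = build_schedule(SPARRING_ENGINES, NUM_POSITIONS)
per_pc = len(schedule) // len(SPARRING_ENGINES)
print(per_pc, len(schedule))  # 200 800
```

Playing every opening with both colours is a common way to cancel out any bias in the opening set itself.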
>>
>>Testing is done with learning deactivated, no opening books, the same hash
>>table size and the same engine parameters; in other words, all randomness that
>>might influence the course of a game is excluded. Re-running the test should
>>therefore produce the same result or something very close to it.
>>
>>This procedure was repeated several times to ensure its reliability, and
>>without exception all of the replayed 800-game matches stayed within an
>>acceptable error margin of -1% to +1%. The system seemed to work: I had created
>>a reliable environment to test program changes by running the 800-game eng-eng
>>match and checking whether it produced a higher match score. So far so good.
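
To put the ±1% margin in context, the statistical noise of an 800-game match can be estimated with the usual standard error of a mean score. This is a rough sketch, not the procedure used in the article: it assumes independent games (win = 1, draw = 0.5, loss = 0), and `score_std_error` and `worst_case_std_error` are hypothetical helpers.

```python
import math


def score_std_error(wins, draws, losses):
    """Standard error of a match score from win/draw/loss counts.

    Assumes independent games; uses the observed per-game variance.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0 - mean) ** 2) / n
    return math.sqrt(var / n)


def worst_case_std_error(n_games):
    """Upper bound: score near 50% with no draws (largest variance)."""
    return math.sqrt(0.25 / n_games)


print(f"{worst_case_std_error(800):.4f}")  # 0.0177
```

Under these assumptions one standard error of an 800-game match is at most about 1.8 percentage points, the same order of magnitude as the ±1% margin used here. Draws reduce the spread, so `score_std_error` gives a tighter estimate when the win/draw/loss counts are known.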
>>
>>--------------------------------------------------------------------------------
>>
>>Problems
>>
>>Over time I noticed something odd: the match results against Shredder7 and
>>Junior8 went down considerably, while the match score against Hiarcs8 went up,
>>also considerably, and all of this as a consistent pattern. The pattern
>>remained so constant that it made me suspicious, so I ran the initial match
>>again, and there it was: it produced a -3% match result, meaning a loss of about
>>20 Elo points for no good reason. My test environment was no longer reliable.
>>Houston, there is a problem.
>>
>>I double-checked all the settings that could explain this sudden fluctuation in
>>score and found none; all the conditions were the same. Then I noticed that
>>there had been one seemingly unimportant change after all: at a certain moment
>>I had set the main engine (the one that is loaded at program start) on all 4
>>PCs to FRITZ8.
>>
>>I couldn't believe this change could make any difference at all; if it did, it
>>would mean that 1 or 2 of the engines were not loaded correctly, and that means
>>entering the world of bugs. I decided to find out nevertheless; after all, I
>>had no other clue.
>>
>>--------------------------------------------------------------------------------
>>
>>The experiment
>>
>>I took an older version (Rebel 12.00.01) and ran 3 identical 4x200 = 800 game
>>test matches (time control 40/5), differing only in the engine loaded at
>>program start:
>>
>>Match-1, FRITZ8 loaded at program start.
>>Match-2, own engine loaded at program start (Shredder loaded with
>>         Shredder, Junior with Junior, etc.)
>>Match-3, Pro Deo loaded at program start.
>>
>>These matches should produce scores within an error margin of -1% to +1%;
>>otherwise something serious is wrong with the testing technique itself, caused
>>either by bugs or by the fact that 800 games are still not enough to guarantee
>>a -1% to +1% error margin. The results are telling and leave no room for
>>speculation: there is something wrong with the testing environment.
>>
>>  Match-1, FRITZ8 loaded at program start            38.1%
>>  Match-2, own engine loaded at program start        40.8%
>>  Match-3, Pro Deo loaded at program start           42.8%
>>
>>An unbelievable and unacceptable difference of 4.7 percentage points (38.1%
>>versus 42.8%), which corresponds to an Elo difference of more than 30 points,
>>depending only on which engine is loaded at program start.
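
The conversion from match score to Elo can be checked numerically. The article does not say which formula was used; this sketch assumes the common logistic Elo model, D = -400 * log10(1/p - 1), where p is the score fraction. `score_to_elo` is a hypothetical helper, and the three scores are the results reported above.

```python
import math


def score_to_elo(p):
    """Elo difference implied by score fraction p under the logistic model."""
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


# Match scores reported in the experiment above.
results = {
    "Match-1, FRITZ8 loaded": 0.381,
    "Match-2, own engine loaded": 0.408,
    "Match-3, Pro Deo loaded": 0.428,
}

for name, score in results.items():
    print(f"{name:28s} {score:.1%}  {score_to_elo(score):+.1f} Elo")

# Gap between the best and worst result, in Elo.
spread = score_to_elo(max(results.values())) - score_to_elo(min(results.values()))
print(f"spread: {spread:.1f} Elo")
```

Under this model the gap between 38.1% and 42.8% comes out at roughly 34 Elo, consistent with the "more than 30" figure. Near scores of 40-50% the curve is almost linear, at roughly 7 Elo per percentage point, which is why the -3% result mentioned earlier translates to about 20 Elo.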
>>
>>--------------------------------------------------------------------------------
>>
>>Where to go from here?
>>
>>It's tempting to advise users to keep Pro Deo loaded at program start all the
>>time (eng-eng and auto232) to ensure the best results, but somehow that is an
>>unsatisfactory thing to say. It's more constructive to search for the reasons
>>behind it and look for watertight solutions, hence I am putting this article on
>>the CCC forum for discussion. It would be interesting for me to hear about the
>>experiences of fellow programmers and testers; maybe the problem is entirely
>>Pro Deo related after all.
>>
>>My conclusion so far is that I could not find a satisfactory explanation for
>>why the Hiarcs8, Junior8 and Fritz8 match scores fluctuate so much. There is a
>>possible reason for Shredder7: its settings are not remembered correctly, and
>>from time to time Shredder7 uses "position learning" even though it is turned
>>off. Other engines have this settings problem as well; for instance, Chess
>>Tiger 15 starts with the Gambit style as its default setting, and when you
>>change it to Normal, exit and restart the program, the Gambit style is active
>>again.
>>
>>--------------------------------------------------------------------------------
>>
>>More ChessBase oddities
>>
>>Below are other ChessBase oddities that are NOT related to this topic (which
>>engine is loaded at program start) but are general hints for accurate testing;
>>they are easy to overcome.
>>This article is based on my experiences with the Fritz7 interface; the Fritz8
>>interface might be a different story. I prefer the Fritz7 interface mainly
>>because Fritz8 doesn't save Pro Deo's current personality correctly, while
>>Fritz7 does.
>>
>>Sometimes Fritz7 starts with the wrong Pro Deo personality: while the
>>WB2UCI.ENG adapter clearly specifies engine_X, the Fritz7 interface ignores
>>this and starts another engine. The problem occurs about 10% of the time. I
>>have no idea whether this problem still exists in the Fritz8 interface. The
>>cure is to exit and restart Fritz7, so always check the param.txt file to see
>>whether a match has been initialized correctly; see the Pro Deo FAQ for details.

>You performed a much more detailed investigation than I did, but I have always
>surmised that engine vs engine matches under the ChessBase/Fritz GUI were
>somehow flawed when using the wb2uci adapter.

I have received an interesting reply from Mathias Feist of ChessBase on the
issue; I will ask his permission to post it here.

Ed




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.