Computer Chess Club Archives


Subject: Re: Adventures with Fritz

Author: Mike Byrne

Date: 09:53:10 10/24/04


On October 24, 2004 at 07:28:56, Ed Schröder wrote:

>I have put this article on my website for discussion and sharing information.
>
>http://members.home.nl/matador/testing.htm
>
>Ed
>
>====================
>
>Adventures with Fritz
>
>This is an article about testing and some of the problems I encountered during
>engine-engine matches using FRITZ as base software. It's my understanding this
>article is a must-read for users who like to play these engine-engine matches
>with Pro Deo. This article also can be important to my colleague chess
>programmers because I don't know if these problems may also occur when testing
>their own engine.
>
>This article will be put on the CCC discussion board in the hope of creating
>awareness, receiving useful comments, and asking other testers and chess
>programmers to either confirm or deny the problem listed below.
>
>--------------------------------------------------------------------------------
>
>Methodology
>
>For the past 4-5 years I have used the eng-eng match technique as the final step
>in testing the changes I make. During the first 3-4 years the eng-eng testing was
>done under the REBEL DOS interface, but this testing was limited because the
>engine could only play against itself. Once I had made my engine available to run
>under other interfaces, I thought it would be an improvement to move to a
>different eng-eng testing environment that allowed me to test against more
>opponents.
>
>Of the alternatives I chose the FRITZ software, mainly because of its
>user-friendly eng-eng match software. I created a set of 100 balanced opening
>positions and 4 fixed sparring engines (Fritz8, Shredder7, Junior8 and Hiarcs8)
>and let them play on 4 PCs at various levels, each producing 200 games, thus
>4x200 = 800 games in total.
>
>Testing is done without any learning activated, no opening books, same hash
>table size, same engine parameters. In short, all randomness that might
>influence the course of a game is excluded. Re-running the test should simply
>produce an equal result or something very close.
>
>This procedure was repeated several times to ensure its reliability, and without
>exception all of the replayed 800 game matches stayed within an acceptable error
>margin of -1% to +1%. It seemed the system was working and I had created a
>reliable testing environment: make a program change, then run the 800 game
>eng-eng match to see if it produces a higher match score. So far so good.
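
As a reference point for the -1% to +1% margin, here is a minimal sketch of the purely statistical error of an 800 game match score. It assumes every game is an independent random outcome (win = 1, draw = 0.5, loss = 0), which a fully deterministic replay is not, so it only shows what margin chance alone would give. The 40% score and the 30% draw rate are illustrative assumptions; the article gives no draw counts. The function name `score_standard_error` is made up for this example.

```python
import math

def score_standard_error(games: int, score: float, draw_rate: float) -> float:
    """Standard error of a match score, assuming independent games.

    score is the average result per game (win=1, draw=0.5, loss=0),
    draw_rate is the fraction of drawn games (an assumption here).
    """
    win_rate = score - draw_rate / 2.0
    loss_rate = 1.0 - score - draw_rate / 2.0
    if games <= 0 or win_rate < 0.0 or loss_rate < 0.0:
        raise ValueError("inconsistent games/score/draw_rate")
    # Per-game variance: E[X^2] - E[X]^2 = (win + draw/4) - score^2
    variance = score * (1.0 - score) - draw_rate / 4.0
    return math.sqrt(variance / games)

# Illustrative inputs only: 800 games, 40% score, 30% draws (assumed).
se_with_draws = score_standard_error(800, 0.40, 0.30)
# Upper bound for the same score: no draws at all.
se_no_draws = score_standard_error(800, 0.40, 0.0)

print(f"standard error with 30% draws: {se_with_draws * 100:.2f}%")
print(f"standard error with no draws:  {se_no_draws * 100:.2f}%")
```

Under these assumptions one standard error is already larger than 1%, so a -1% to +1% margin over 800 games is only plausible when the replays are close to deterministic, as the setup described above intends.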
>
>--------------------------------------------------------------------------------
>
>Problems
>
>Over time I noticed something odd: the match results against Shredder7 and
>Junior8 went down considerably while the match score against Hiarcs8 went up,
>also considerably, and this happened consistently. The pattern remained so
>constant it made me suspicious, so I ran the initial match again and there it
>was: it produced a -3% match result, meaning a loss of 20 elo points for no good
>reason. My test environment was not reliable anymore. Houston, there is a
>problem.
>
>I double-checked all the settings I was using that could explain this sudden
>fluctuation in score and found none. All the conditions were the same, until I
>noticed there had been one seemingly unimportant change after all: at a certain
>moment I had set the main engine (the one that is loaded at program start) on
>all 4 PCs to FRITZ8.
>
>I couldn't believe this change could make any difference at all, or else it
>would mean 1 or 2 of the engines are not loaded correctly, which means entering
>the world of bugs. I decided to find out nevertheless; after all, I had no other
>clue than this.
>
>--------------------------------------------------------------------------------
>
>The experiment
>
>I took an older version (Rebel 12.00.01) and ran 3 identical 4x200 = 800 game
>test matches (time control 40/5), differing only in the engine loaded at start:
>
>Match-1, FRITZ8 loaded at program start.
>Match-2, own engine loaded at program start (Shredder loaded with
>         Shredder, Junior with Junior, etc.)
>Match-3, Pro Deo loaded at program start.
>
>The 3 matches should produce scores within an error margin of -1% to +1% of each
>other; otherwise something is seriously wrong with the testing technique itself,
>either because of bugs or because 800 games is still not enough to ensure a -1%
>to +1% error margin. The results are telling and leave no room for speculation:
>there is something wrong with the testing environment.
>
>  Match-1, FRITZ8 loaded at program start            38.1%
>  Match-2, own engine loaded at program start        40.8%
>  Match-3, Pro Deo loaded at program start           42.8%
>
>An unbelievable and unacceptable difference of 4.7%, which corresponds to an elo
>difference of more than 30 elo points, depending only on which engine is loaded
>at program start.
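
The step from a 4.7% score difference to "more than 30 elo points" can be checked with a short sketch. It assumes the standard logistic Elo expectancy formula; the article does not say which conversion was used, and the function name `elo_from_score` is made up for this example. The three scores are the ones from the table above.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an average score (logistic model, assumed)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Match scores as reported in the article.
match_scores = {
    "Match-1 (FRITZ8 loaded at start)": 0.381,
    "Match-2 (own engine loaded at start)": 0.408,
    "Match-3 (Pro Deo loaded at start)": 0.428,
}

elo = {name: elo_from_score(s) for name, s in match_scores.items()}
for name, value in elo.items():
    print(f"{name}: {value:+.1f} Elo relative to the opponents")

# Gap between the best and the worst of the three matches.
spread = max(elo.values()) - min(elo.values())
print(f"spread between best and worst match: {spread:.1f} Elo")
```

With this formula the gap between Match-1 and Match-3 comes out at roughly 34 Elo, in line with the "more than 30" stated above.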
>
>--------------------------------------------------------------------------------
>
>Where to go from here?
>
>It's tempting to advise users to have Pro Deo loaded at program start all the
>time (eng-eng and auto232) to ensure the best results, but somehow this is an
>unsatisfactory thing to say. It's more constructive to start searching for the
>reasons behind it and look for watertight solutions, hence I put this article on
>the CCC forum for discussion. It would be interesting for me to hear the
>experiences of fellow programmers and testers; maybe things are entirely Pro Deo
>related after all.
>
>My conclusion so far is that I could not find any satisfactory explanation why
>the Hiarcs8, Junior8 and Fritz8 match scores fluctuate so much. There is a
>possible reason for Shredder7: its settings are not correctly remembered, and
>from time to time Shredder7 uses "position learning" after all, even though it
>is turned off. Other engines have this (settings) problem as well. For instance,
>Chess Tiger 15 starts with the Gambit style as its default setting; when you
>change it to Normal, exit and restart the program, the Gambit style is active
>again.
>
>--------------------------------------------------------------------------------
>
>More ChessBase oddities
>
>Below are other ChessBase oddities that are NOT related to this topic (which
>engine is loaded at program start) but serve as general hints for accurate
>testing. They are easy to overcome.
>This article is based on my experiences with the Fritz7 interface; the Fritz8
>interface might be a different story. I prefer the Fritz7 interface mainly
>because Fritz8 doesn't save Pro Deo's current personality correctly, while
>Fritz7 does.
>
>Sometimes Fritz7 starts with the wrong Pro Deo personality. While the WB2UCI.ENG
>adaptor clearly states to use engine_X, the Fritz7 interface ignores this and
>starts another engine. The problem occurs about 10% of the time. I have no idea
>if this problem still exists in the Fritz8 interface. The cure is to exit and
>restart Fritz7. So always check the param.txt file to see if a match is
>initialized well; see the Pro Deo FAQ for details.

You investigated this in much greater detail than I have, but I have always
surmised that engine vs engine matches under the ChessBase/Fritz GUI were somehow
flawed when using the wb2uci adapter.




Last modified: Thu, 15 Apr 21 08:11:13 -0700
