Computer Chess Club Archives



Subject: Re: Looking for clean test suites any ideas.

Author: Charles Roberson

Date: 07:28:47 03/04/06




 Since you haven't had a response, I'll respond.

 Probably the best answer is to do it yourself.
 Easily done or hard to do: take your pick.
   1) grab several suites and run several engines (32 should do) through them
      (a minimal sketch of this step follows after this list).
   2) the engines should be quite varied in strength (very weak to very strong)
   3) Now the easy or hard part:
       a) easy: compare the results to various rating lists.
                (WBEC, Gunther Simon's, George Lyapko's, Oliver Deville....)
             each has various time controls (so they did the hard part for you)
       b) hard: if you don't do (a), then do the work of (a) yourself by running
           lots of tournaments at various time controls with the programs
           you are testing, and more if you can.
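
   To make step 1 concrete, here is a minimal sketch of scoring one engine on
   one EPD suite. It assumes the python-chess library, a UCI engine binary on
   disk, and a suite whose lines carry "bm" (best move) opcodes; those
   specifics are illustrative assumptions, not anything from this post.

       import chess
       import chess.engine

       def run_suite(epd_path, engine_path, seconds_per_position=5.0):
           """Count how many suite positions an engine solves."""
           solved = 0
           total = 0
           engine = chess.engine.SimpleEngine.popen_uci(engine_path)
           try:
               with open(epd_path) as f:
                   for line in f:
                       line = line.strip()
                       if not line:
                           continue
                       board = chess.Board()
                       ops = board.set_epd(line)    # parses FEN fields plus opcodes
                       best_moves = ops.get("bm")   # acceptable best moves, if given
                       if not best_moves:
                           continue
                       total += 1
                       result = engine.play(
                           board, chess.engine.Limit(time=seconds_per_position))
                       if result.move in best_moves:
                           solved += 1
           finally:
               engine.quit()
           return solved, total

   Run that once per engine, per suite, per time control, and you have the raw
   solved-counts to correlate against the rating lists in step 3a.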

  If you go with steps 1, 2, and 3a, you should be able to do the work reasonably
quickly. Make sure you use a decent mathematical method for correlation testing of
your results. You could look for several correlations:
    1) Correlation between rankings: ratings list order versus order of suite
           performance.
    2) Rating Correlation: is there a solid correlation between exact rating
         and the number of problems solved?

      There are others, but remember that you should run the correlation tests
      with each different time control.
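
      As a sketch of both tests on made-up numbers: Spearman's rank correlation
      is the standard coefficient for comparing orderings (a substitute I am
      naming here; the post itself only mentions Pearson's R, below), while
      Pearson's r tests the exact-rating relationship. SciPy is assumed.

       from scipy.stats import pearsonr, spearmanr

       # Hypothetical data: one entry per engine, same engine order in both lists.
       ratings = [2750, 2602, 2480, 2333, 2190, 2051]  # from a rating list
       solved  = [278, 260, 255, 231, 196, 188]        # results at one time control

       # (1) Ranking correlation: does the suite order engines like the list does?
       rho, rho_p = spearmanr(ratings, solved)

       # (2) Rating correlation: does the solved count track the exact rating?
       r, r_p = pearsonr(ratings, solved)

       print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
       print(f"Pearson  r   = {r:.3f} (p = {r_p:.3f})")

      Repeat the calculation once per time control, as noted above.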

    This is good fundamental research that you will be doing. The likely answer
will be that most test suites are not good indicators. The interesting answer
would be that a test suite exists that does correlate with tournament performance.

  If you don't know how to do correlation tests, you can find the equations
   in an elementary statistics book. Pearson's R should be a good coefficient
   to use.
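
   If you would rather not pull in a statistics library, here is the textbook
   equation written out directly (a plain transcription, nothing engine-specific):

       import math

       def pearson_r(xs, ys):
           """Pearson's r: the covariance of x and y divided by the product
           of their standard deviations, in raw sum-over-samples form."""
           n = len(xs)
           mx = sum(xs) / n
           my = sum(ys) / n
           cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
           sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
           sy = math.sqrt(sum((y - my) ** 2 for y in ys))
           return cov / (sx * sy)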

    Maybe you'll find that only the weak engines correlate with the data, or
    only the middle engines, or only the top ones.

   This seems problematic: engines can have low ratings due to a bad search, a
   bad position evaluator, or any combination of the two. Thus, a positional suite
   versus a tactical suite could produce different results. The best suite
   should combine positional and tactical positions. So you could make up
   test suites: use them by themselves, and make up new suites from
   combinations of other suites (a small sketch of combining suites follows).
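
   A minimal sketch of merging suites, assuming plain EPD files in which the
   first four whitespace-separated fields identify the position (the file
   names are placeholders):

       def combine_suites(paths, out_path):
           """Merge several EPD suites into one, dropping duplicate positions."""
           seen = set()
           with open(out_path, "w") as out:
               for path in paths:
                   with open(path) as f:
                       for line in f:
                           line = line.strip()
                           if not line:
                               continue
                           # Placement, side to move, castling, en passant.
                           key = " ".join(line.split()[:4])
                           if key not in seen:
                               seen.add(key)
                               out.write(line + "\n")

       combine_suites(["positional.epd", "tactical.epd"], "combined.epd")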

   Theoretically, it should be possible to put a suite together that predicts
   performance. After all, most of us test performance by running lots of
matches, which is effectively a suite, just a very dynamic one.

   Effectively, this is remotely similar to the idea of position learning.
   Position learning stores a position in a database when the program produces
   an incorrect eval for that position in a real match. The next time the
   engine sees the position in a search, it uses the experience value from the
   database instead of the search value. If you create a suite of all the
   positions in the position-learning database and improve the program so that
   it does better on that suite of troublesome positions without degrading
   performance in other areas, then the program has improved.
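
   A toy sketch of that loop (the class name, the hashing scheme, and the
   centipawn units are illustrative assumptions, not taken from any engine):

       class PositionLearner:
           def __init__(self):
               # position key (e.g. a Zobrist hash) -> corrected eval (centipawns)
               self.experience = {}

           def record(self, key, corrected_eval):
               """Store a position whose search eval proved wrong in a real match."""
               self.experience[key] = corrected_eval

           def evaluate(self, key, search_eval):
               """During search, prefer the stored experience value when one exists."""
               return self.experience.get(key, search_eval)

           def export_suite(self):
               """All learned positions: the 'troublesome positions' suite."""
               return list(self.experience.items())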


