Computer Chess Club Archives

Subject: Re: Interpreting results from test suites. Advice needed.

Author: Miguel A. Ballicora
Date: 20:11:34 06/08/02
On June 08, 2002 at 12:54:55, José Carlos wrote:

>  I'm testing an aggressive pruning method in my program. I believe the best testing
>comes from real games, but I was curious how this change would affect test
>suite results. So I ran the test suite from the GPC site with the two versions, with
>and without aggressive pruning. I used my old PII 400 and 30 seconds per
>position.
>  I have no experience with these EPD tests and the results confuse me. Which
>version does better seems to depend, almost at random, on the time allowed. Here
>are the results (produced by EPD2WB):
>
>Normal version:           New version:
>
> Sec Solved Total         Sec Solved Total
>---- ------ ------       ---- ------ ------
>   1      6      6 +        1      2      2 -
>   2      1      7 +        2      2      4 -
>   3      0      7 =        3      3      7 =
>   4      7     14 -        4     11     18 +
>   5      6     20 -        5      5     23 +
>   6      8     28 -        6      6     29 +
>   7      3     31 -        7      5     34 +
>   8      4     35 =        8      1     35 =
>   9      2     37 +        9      1     36 -
>  10      1     38 -       10      4     40 +
>  11      2     40 =       11      0     40 =
>  12      3     43 =       12      3     43 =
>  13      2     45 =       13      2     45 =
>  14      0     45 -       14      1     46 +
>  15      1     46 -       15      2     48 +
>  16      6     52 +       16      1     49 -
>  17      1     53 +       17      2     51 -
>  18      3     56 +       18      2     53 -
>  19      1     57 +       19      0     53 -
>  20      1     58 +       20      3     56 -
>  21      1     59 +       21      2     58 -
>  22      0     59 =       22      1     59 =
>  23      1     60 +       23      0     59 -
>  24      1     61 -       24      3     62 +
>  25      1     62 -       25      1     63 +
>  26      3     65 +       26      1     64 -
>  27      3     68 +       27      2     66 -
>  28      1     69 +       28      0     66 -
>  29      1     70 +       29      0     66 -
>  30      1     70 +       30      1     67 -
>
>  I've put a (+) when a version has solved more positions in total by n seconds
>than the other, a (=) when both totals are equal, and a (-) when it has solved fewer.
>  So what's your experience in this matter? How would you interpret the results?
>
>  Thanks in advance,
>
>  José C.
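
The (+), (=) and (-) marks in the table can be reproduced mechanically: accumulate each version's per-second "Solved" counts into a running total and compare the two totals at every second. A minimal Python sketch, assuming the counts are typed in as plain lists; `compare_runs` is a hypothetical helper, not part of EPD2WB:

```python
from itertools import accumulate


def compare_runs(solved_a: list[int], solved_b: list[int]) -> list[tuple[int, int, int, str]]:
    """Return (second, total_a, total_b, mark) rows; mark is from A's point of view.

    solved_a[k] and solved_b[k] are the positions newly solved during second k + 1
    (the "Solved" column); the running sums are the "Total" column.
    """
    totals_a = list(accumulate(solved_a))
    totals_b = list(accumulate(solved_b))
    rows = []
    for sec, (ta, tb) in enumerate(zip(totals_a, totals_b), start=1):
        # Compare cumulative totals, not per-second counts.
        mark = "+" if ta > tb else "-" if ta < tb else "="
        rows.append((sec, ta, tb, mark))
    return rows
```

Fed the first five rows of the table (normal `[6, 1, 0, 7, 6]`, new `[2, 2, 3, 11, 5]`), it gives the same marks for the normal version as the table: `+ + = - -`.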

This is what I believe is the best way to compare the results of a test suite:
1) Let a certain number of positions run until they are ALL solved, and make
sure that the program won't change its mind.
2) For each position, compute the ratio of the solution times (the old version's
time divided by the new version's time).
3) Take the log2 of that ratio (the base-2 logarithm).
4) Multiply by 70.
That number is an "estimate" of how much stronger the new version is, on a scale
that resembles Elo points. Of course, it is inaccurate because it comes from a
single position. So...
5) Do the same for all positions and average the results.
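
Steps 2 to 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the inputs are the stable solution times from step 1 (the moment each version found the right move and kept it), the ratio is taken as old time / new time so that a positive number favours the new version, and the function names are hypothetical.

```python
import math
from statistics import mean

# Step 4: the base-2 log of the time ratio is multiplied by 70.
POINTS_PER_DOUBLING = 70.0


def position_estimate(time_old: float, time_new: float) -> float:
    """Steps 2-4 for one position.

    Assumes the ratio is old time / new time, so a positive result means the
    new version settled on the solution faster.
    """
    if time_old <= 0 or time_new <= 0:
        raise ValueError("solution times must be positive")
    return POINTS_PER_DOUBLING * math.log2(time_old / time_new)


def suite_estimate(times_old: list[float], times_new: list[float]) -> float:
    """Step 5: average the per-position estimates over the whole suite."""
    if not times_old or len(times_old) != len(times_new):
        raise ValueError("need two non-empty lists of equal length")
    return mean(position_estimate(o, n) for o, n in zip(times_old, times_new))
```

For example, `suite_estimate([20.0, 3.0], [10.0, 3.0])` returns `35.0`: one position is solved twice as fast (+70) and the other takes the same time (0).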

I hope that the rationale of this procedure is obvious; otherwise I will explain
it in more detail.
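
One plausible reading of that rationale (an assumption, not stated in the post): a common rule of thumb is that each doubling of search time is worth roughly 70 Elo, so a version that needs half the time to solve a position behaves like the old version running on a machine twice as fast. With stable solution times t_i^old and t_i^new for position i, and N positions:

```latex
\Delta_i = 70\,\log_2\!\left(\frac{t_i^{\mathrm{old}}}{t_i^{\mathrm{new}}}\right),
\qquad
\Delta = \frac{1}{N}\sum_{i=1}^{N}\Delta_i
       = 70\,\log_2\!\left(\prod_{i=1}^{N}\frac{t_i^{\mathrm{old}}}{t_i^{\mathrm{new}}}\right)^{1/N}
```

For instance, 24 s versus 6 s is a ratio of 4, i.e. two doublings, or about +140 for that position. Averaging the logarithms amounts to using the geometric mean of the ratios, so one position that happens to be solved 100 times faster does not dominate the suite as it would with an arithmetic mean of raw ratios.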

Regards,
Miguel
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.