Computer Chess Club Archives


Subject: Re: 50 Test Positions, 15 Engines - Results, Comparisons

Author: Uri Blass

Date: 00:17:43 12/13/02


On December 13, 2002 at 01:08:26, Christopher A. Morgan wrote:

>
>50 Test Positions, 15 Engines - Results, Comparisons
>
>Recently, Gian-Carlo Pascutto, I think it was, posted about 150 position
>problems, together with solutions, each with an identifying number of
>ECM.xxx(x). I took the first 51, discarded one, and ran time to solve tests on
>15 different chess engines for each of the 50 positions in the Fritz 7 GUI Tools
>–> Analysis –> Process Test Set  window.  Below are the results of the tests.
>The problems in FEN, as previously posted, follow the results.
>
>Some details: For each of the problems I confirmed the solution by letting a
>couple of engines run individually in infinite analysis mode for 5-10 minutes.
>The one problem discarded had three different solutions by four different
>engines.  For four of the problems, numbers 14, 28, 29 and 38 I listed the text
>move as the solution together with a variation.  In one case Nimzo 8 was the only
>engine of the four I tested that came up with the text solution.  The other
>three engines agreed on a different solution which became the variation.  For
>the other three positions, multiple engines agreed on a different solution which
>became the variation.
>
>My goal was to have the majority of the engines solve every problem, so it would
>be a test of how quickly a particular engine solved a problem compared to all
>other engines in average speed of finding solutions to all problems, rather than
>running for ten minutes (maximum allowed time per position) and not finding a
>solution.
>
>Hardware: Athlon 750, 384MB RAM, 144MB RAM hashtables, except for Chess Tiger
>14, and Gambit Tiger 2 which, apparently, only allow a maximum of a 96MB
>hashtable.  The times given should only be looked at in relative terms, that is
>relative to the other engines.  Faster processors will get much faster times,
>but I would expect that the relative percentage differences in average speed
>should remain constant among different processors running the same problems with
>the given solutions.
>
>Problems 1, 24, 31, 33, and 41 took the most time for most engines, and a few
>were not able to find a solution for some of these in the maximum ten minutes
>allowed.

I am surprised, because problem 1 is one of the easiest in the test suite and
Movei finds it in less than one second.

Problem 1 can also be solved for the wrong reasons: Movei likes the solution at
depths 2-4, only to change its mind at depth 5 and change its mind again at
depth 6, and I remember that one of the weakest engines showed Bg4 from the
first ply.
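
One way to guard against counting such an early, unstable hit as a solve is to take the first depth from which the engine keeps the solution move at every deeper iteration. A minimal sketch in Python, assuming a per-depth record of the engine's best move is available; the function name, the "other" placeholder move and the sample history are illustrative, not real engine output (the history only mimics the pattern described above, assuming the change at depth 6 goes back to Bg4):

```python
def first_stable_depth(best_by_depth, solution):
    """Return the shallowest depth from which `solution` stays the best
    move at every deeper iteration, or None if it is not held at the end."""
    stable = None
    for depth in sorted(best_by_depth):
        if best_by_depth[depth] == solution:
            if stable is None:
                stable = depth      # start of a run of solution moves
        else:
            stable = None           # engine changed its mind; reset
    return stable

# Hypothetical history: solution at depths 2-4, dropped at 5, back from 6 on.
history = {2: "Bg4", 3: "Bg4", 4: "Bg4", 5: "other", 6: "Bg4", 7: "Bg4"}
print(first_stable_depth(history, "Bg4"))
```

With this rule the early preference at depths 2-4 does not count, and the solve is credited only from depth 6.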

There is no doubt that Movei can see clear tactics at depth 7, where the score
is +3 after only 1.38 seconds on a p850.

The main problem with problem 1 is that d4 is another way to win, but I expect
all engines to see a better score for Bg4.

Problem 24 is not as easy as problem 1, but it is not hard to solve either.

103 seconds and depth 10 on a p850 are enough to see a fail high on Nxh7.

The hard problem is 28 and I believe that programs usually solve it for
positional reasons.

Some time ago 31 was really impossible for Movei to solve (that is no longer
true today, but I still consider it a hard problem).

33 is easy to solve if the program evaluates Black as better, because it is easy
to see that Ba3 forces a draw.

Seeing that White can win is a harder task.


>I used the default of looking forward one additional ply after a
>solution was found. Times to solve problems varied tremendously by problem, and
>by engine. The fastest engine overall, Nimzo 8, was easily beaten in some
>problems, for instance.  Some examples: Problem 31, Gambit Tiger 2 - 30 seconds
>11 ply, Nimzo 8 - 144 seconds 12 ply, and Hiarcs 8 - 353 seconds 11 ply; Problem
>41, Nimzo 8 - 22 secs. 9 ply, Ruffian 1.0.1 - 112 seconds 11 ply, and Chess
>Tiger 14 - 384 seconds 14 ply.

The question is whether Nimzo solves it for the right reasons.

A program may see a draw score for Nxe7 and still solve 41 if it evaluates the
other moves as worse.

Another program with the same tactical strength may fail to solve 41 because it
sees a positive score for Bxb5.
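
To illustrate the point, here is a small Python sketch with two hypothetical engines that both score Nxe7 as a draw but disagree about Bxb5. The scores and the function name are made up for the example and are not taken from any real engine:

```python
def preferred_move(scores):
    """Return the move with the highest score (in pawns, side to move)."""
    return max(scores, key=scores.get)

# Both engines see the same draw score for Nxe7 (same tactical depth).
engine_a = {"Nxe7": 0.0, "Bxb5": -0.5}   # thinks Bxb5 is worse than a draw
engine_b = {"Nxe7": 0.0, "Bxb5": 0.3}    # thinks Bxb5 is slightly better

solution = "Nxe7"
for name, scores in (("A", engine_a), ("B", engine_b)):
    choice = preferred_move(scores)
    print(name, choice, "solved" if choice == solution else "not solved")
```

Engine A gets credit for the solution and engine B does not, although neither has seen more than a draw after Nxe7.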

Uri



Last modified: Thu, 15 Apr 21 08:11:13 -0700
