Computer Chess Club Archives



Subject: Re: WM Test Position 1 - Good Position or Proving Weakness of the Test!

Author: Robert Hyatt

Date: 08:32:03 06/12/04



On June 11, 2004 at 13:15:25, Ed Schröder wrote:

>On June 11, 2004 at 07:29:20, Rolf Tueschen wrote:
>
>>On June 11, 2004 at 02:14:32, Ed Schröder wrote:
>>
>>>On June 09, 2004 at 10:13:30, Franz Hagra wrote:
>>>
>>>>Re3 is not winning at all; it's a draw (only in the original game does black
>>>>win).
>>>>
>>>>Re3 is the key to drawing the position, but it's not essential to play it as
>>>>the first move at all. So the test position is clear in a logical, human
>>>>sense, but not under test conditions, because the test only works correctly
>>>>when Re3, and only Re3, is found as the first move!
>>>>
>>>>Rad8 also leads to a drawn position, just like Re3, so the TEST POSITION is
>>>>not correct at all.
>>>
>>>[d]r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -
>>>
>>>1...Re3 is a sound positional attacking move, and according to my own
>>>brainchild there is a difference of 0.25 in score between 1...Re3 and
>>>1...Rad8. IMO the position is a fine one for testing the strategic insight of
>>>a chess program.
>>>
>>>My best,
>>>
>>>Ed
>>
>>Ed,
>>
>>you did NOT comment on the main finding Hagra published here!
>>http://www.talkchess.com/forums/1/message.html?369557
>>
>>I will translate it into English a second time:
>>
>>a) FRITZ 8 on an AMD 1400 gets a solution time of 1 sec, which means the
>>highest score for position no. 1 (which you kindly posted above)
>>
>>b) FRITZ 8 on an AMD 2800 gets a solution time of 480 sec!! So it gets far
>>fewer points for position no. 1!!
>>
>>Here is my explanation in words (all of it found by Hagra):
>>
>>A stronger [!] machine on better hardware (do you accept that, or do you claim
>>that an AMD 1400 is STRONGER than an AMD 2800?) is able to calculate deeper [!!]
>>and therefore finds the variation with Rad8 first, NOT as a final solution,
>>Ed, but as a variation, before it THEN comes back to Re3. Now, the point is
>>that such behaviour is by no means a sign of weaker strength but of _better_
>>strength. But alas, Ed, the so-called "WM-Test" of Dr. Mikhail Gurevich gives
>>more points to the weaker machine than to the stronger one.
>
>This is a common problem with positions whose nature is positional. Take the
>starting position, for example: the moves 1.e4, 1.d4, 1.c4 and 1.Nf3 produce
>scores that are very close to each other, so here there are 4 moves with almost
>identical scores. What you see is that chess engines frequently switch between
>1.e4 and 1.d4, and that the speed of the PC effectively introduces a random
>element, as in this case with Fritz.
>
>The problem is unavoidable; the only good way to deal with random effects is to
>increase the number of positional positions tested to at least 100.
>
>The problem of randomness does not exist in clear tactical positions: there is
>only one good move, and once the combination is found the key move stays and
>the chess engine never switches. But then what you are actually testing is
>SEARCH, not positional knowledge.


This shows that such tests are basically flawed. The test should state: "The
time to solution is the time at which the engine chooses the right move and then
sticks with it from that point forward, searching at least 30 minutes more..."

That stops this kind of nonsensical "faster = worse" problem, because as it
stands, the test is simply meaningless when changing nothing but the hardware
produces a poorer result...
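The rule above can be stated precisely in a few lines. What follows is a minimal Python sketch, not the WM-Test's actual scoring: it assumes a hypothetical engine log that records every change of best move with a timestamp, and it credits a solution only at the start of the final, uninterrupted run on the key move, provided the search went on for at least 30 more minutes after that point. The names `BestMoveChange` and `stable_solution_time`, and the log contents, are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: 30 extra minutes of search must confirm the key move.
CONFIRM_SECONDS = 30 * 60


@dataclass(frozen=True)
class BestMoveChange:
    """One line of a hypothetical engine log: at `seconds`, the best move became `move`."""
    seconds: float
    move: str


def stable_solution_time(
    log: Iterable[BestMoveChange],
    key_move: str,
    total_seconds: float,
    confirm_seconds: float = CONFIRM_SECONDS,
) -> Optional[float]:
    """Return the time the engine settled on `key_move` for good, or None.

    A hit only counts if the engine never leaves the key move afterwards
    and keeps searching for at least `confirm_seconds` beyond it.
    """
    settled_at: Optional[float] = None
    for change in sorted(log, key=lambda c: c.seconds):
        if change.move == key_move:
            if settled_at is None:
                settled_at = change.seconds  # start of the current run on the key move
        else:
            settled_at = None  # the engine switched away, so earlier hits are void
    if settled_at is None:
        return None  # the search ended on some other move
    if total_seconds - settled_at < confirm_seconds:
        return None  # not enough further search to confirm stability
    return settled_at


# Hypothetical logs invented for illustration; only the 480 s figure echoes the thread.
fast_log = [BestMoveChange(1, "Re3"), BestMoveChange(200, "Rad8"), BestMoveChange(480, "Re3")]
slow_log = [BestMoveChange(1, "Re3")]

print(stable_solution_time(fast_log, "Re3", total_seconds=480 + CONFIRM_SECONDS))
print(stable_solution_time(slow_log, "Re3", total_seconds=20 * 60))
```

Under this sketch an early hit that is later abandoned earns nothing, and a hit that the run never confirmed (the slower log stopped at 20 minutes) is not credited either, which is the "faster = worse" effect the rule is meant to remove.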




>
>
>
>>The question from Hagra and myself: is this a reasonable test design if a
>>stronger machine gets fewer points just because it looks deeper into the
>>position? As you know, each machine gets 20 minutes per position. And Gurevich
>>defines "stably holding the once-chosen move" [my paraphrase] as the best way
>>to test the *analytical ability* of a machine. Do you now understand the
>>contradiction in Dr. Mikhail Gurevich's test design, dear Ed? Higher ability
>>gets a worse result! Is that sound? Hopefully NOT.
>
>There is only one good way to test a chess engine: play games, and a lot of
>them. Test suites are fun but only a surrogate; they are popular because they
>are a quick way to estimate the strength of an engine.
>
>
>>I hope this clarifies the problem we faced with the German "WM-Test" in CSS.
>
>There are better ways. In the case of the WM-Test I suggest the following
>changes:
>
>1) Keep the current tactical positions, but with much lower thinking time, say
>5-10 minutes.
>
>2) Take 100-200 positional positions to deal somewhat with randomness, again
>with much lower thinking time, say 1-2 minutes, and use a different rating
>formula that drops the time element: the only criterion is whether the move is
>found or not.
>
>My best,
>
>Ed
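Ed's second suggestion scores positional positions only as found or not found, with no time element. A minimal Python sketch of that scoring, assuming each position is an independent found/not-found trial (a simplification, since positions in a suite are not truly independent), also shows why 100-200 positions help against randomness: the standard error of the hit rate shrinks roughly as 1/sqrt(n). The function name `hit_rate` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def hit_rate(found: Sequence[bool]) -> Tuple[float, float]:
    """Share of positional positions where the key move was found, plus its standard error.

    Assumes each position is an independent found/not-found trial, with no
    time element in the score.
    """
    n = len(found)
    if n == 0:
        raise ValueError("need at least one position")
    p = sum(found) / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p, standard_error


# Worst case for noise: an engine that lands on the key move half the time.
for n in (10, 100, 200):
    results = [i % 2 == 0 for i in range(n)]  # alternating hits and misses
    p, se = hit_rate(results)
    print(f"{n:>3} positions: hit rate {p:.2f} +/- {se:.3f}")
```

At a 50% hit rate the standard error is about 0.16 with 10 positions, 0.05 with 100 and about 0.035 with 200, so a single coin-flip position like the one discussed above moves the total far less in a large suite.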




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.