Computer Chess Club Archives



Subject: Re: WM Test Position 1 - Good Position or Proving Weakness of the Test!

Author: Ed Schröder

Date: 10:15:25 06/11/04



On June 11, 2004 at 07:29:20, Rolf Tueschen wrote:

>On June 11, 2004 at 02:14:32, Ed Schröder wrote:
>
>>On June 09, 2004 at 10:13:30, Franz Hagra wrote:
>>
>>>Re3 is not winning at all; it's a draw (only in the original game does Black win).
>>>
>>>Re3 is the key to drawing the position, but it is not essential to play it as the
>>>first move at all. So the test position is clear in a logical, human sense, but not
>>>under test conditions, because the test only works correctly when Re3 alone is
>>>found as the first move!
>>>
>>>Rad8 also leads to a drawn position, just like Re3, so the TEST POSITION is not
>>>correct at all.
>>
>>[d]r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -
>>
>>1...Re3 is a sound positional attacking move, and according to my own brainchild
>>there is a difference of 0.25 in score between 1...Re3 and 1...Rad8. IMO the
>>position is a fine one for testing the strategic insight of a chess program.
>>
>>My best,
>>
>>Ed
>
>Ed,
>
>you did NOT comment on the main finding Hagra published here:
>http://www.talkchess.com/forums/1/message.html?369557
>
>I'll translate it into English a second time:
>
>a) FRITZ 8 on an AMD 1400 gets a solution time of 1 sec, which means the
>highest points for position no. 1 (which you kindly gave above)
>
>b) FRITZ 8 on an AMD 2800 gets a solution time of 480 sec!! So it gets
>far fewer points for position no. 1!!
>
>Here is my verbal explanation (all found by Hagra):
>
>a stronger [!] machine on better hardware (do you accept that, or do you claim
>that an AMD 1400 is STRONGER than an AMD 2800?) is able to calculate deeper [!!]
>and therefore first finds the variation with Rad8, NOT as a final
>solution, Ed! But as a variation, before it THEN comes back to Re3. Now, the
>point is that such behaviour is by no means a sign of weaker strength but of
>_better_ strength. But alas, Ed, the so-called "WM-Test" of Dr. Mikhail Gurevich
>gives more points to the weaker machine than to the stronger one.

This is a common problem with positions of a positional nature. Take the starting
position, for example: the moves 1.e4, 1.d4, 1.c4 and 1.Nf3 produce scores that
are very close to each other, in this case 4 moves with almost identical scores.
What you see is that chess engines tend to switch between 1.e4 and 1.d4
frequently, and because a faster PC reaches a different depth in the same time,
the speed of the PC effectively introduces a random element, as in this case with
Fritz.

The problem is unavoidable; the only good way to deal with random effects is to
increase the number of positional test positions to at least 100.
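To put a rough number on why more positions help: if each positional position is treated, as a simplifying assumption, as an independent trial that an engine "solves" with some fixed probability p, the standard error of the fraction solved is sqrt(p(1-p)/n). A minimal Python sketch follows; the independence assumption, the value p = 0.5, and the function name are illustrative choices, not part of the WM-Test.

```python
import math


def score_standard_error(p: float, n: int) -> float:
    """Standard error of the fraction of positions solved, assuming each of
    n positions is an independent trial solved with probability p
    (a simplifying assumption, not a property of any real test suite)."""
    return math.sqrt(p * (1.0 - p) / n)


# p = 0.5 is the worst case; the noise shrinks in proportion to 1/sqrt(n).
for n in (10, 30, 100, 200):
    err = 100 * score_standard_error(0.5, n)
    print(f"{n:4d} positions: +/- {err:.1f} percentage points")
```

Under these assumptions one standard error is about 16 percentage points with 10 positions and 5 with 100, which is one way to read the "at least 100" suggestion.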

The problem of randomness does not exist in clear tactical positions: there is
only one good move, and once the combination is found the key move stays and the
chess engine never switches. But then what you are actually testing is SEARCH
and not positional knowledge.
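A test that rewards holding the once-chosen move in effect scores the time from which the engine's choice stays on the key move. The sketch below measures that, assuming a hypothetical log of (seconds, best move) pairs in time order. The two example logs are invented to mirror the Fritz solution times quoted above; they are not real engine output.

```python
from typing import List, Optional, Tuple


def stable_solution_time(log: List[Tuple[float, str]], key: str,
                         limit: float) -> Optional[float]:
    """Return the time from which the engine's best move stayed on `key`
    until `limit`, or None if `key` is not its choice at the end.

    `log` is a hypothetical list of (seconds, best_move) entries in time
    order, one per reported change of the engine's preferred move."""
    solved_at: Optional[float] = None
    for t, move in log:
        if t > limit:
            break
        if move == key:
            if solved_at is None:
                solved_at = t  # first time on the key move in this streak
        else:
            solved_at = None  # the engine left the key move; the clock restarts
    return solved_at


# Invented logs: the slower machine settles on Re3 at once, while the faster
# one briefly prefers Rad8 at a greater depth before returning to Re3.
slow = [(1.0, "Re3")]
fast = [(1.0, "Re3"), (60.0, "Rad8"), (480.0, "Re3")]
print(stable_solution_time(slow, "Re3", 1200.0))  # 1.0
print(stable_solution_time(fast, "Re3", 1200.0))  # 480.0
```

In a tactical position the key move, once found, is never abandoned, so this time is simply the moment it was found; in a positional position every switch between near-equal moves resets the clock.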



>A question from Hagra and also from me: is this a reasonable test design if a
>stronger machine gets fewer points just because it looks deeper into the
>position? As you know, the time for all machines is 20 minutes per position. And
>Gurevich defines "stably holding the once-chosen move" [my verbal
>interpretation] as the best way to test the *analytical ability* of a machine.
>Do you now understand the contradiction in Dr. Mikhail Gurevich's test design,
>dear Ed? Higher abilities get a worse result! Is that sound? Hopefully
>NOT.

There is only one good way to test a chess engine: play games, and a lot of them.
Test suites are fun but only a surrogate; they are popular because they are a
quick way to estimate the strength of an engine.


>Hope this clarifies the problem we faced with the German "WM-Test" in CSS.

There are better ways. In the case of the WM-Test I suggest the following
changes:

1) Keep the current tactical positions, but with a much lower thinking time, say
5-10 minutes.

2) Take 100-200 positional positions to deal somewhat with randomness, with a
much lower thinking time, say 1-2 minutes, and a different rating formula that
skips the time element: the only criterion is whether the move is found or not.

My best,

Ed





Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.