Computer Chess Club Archives



Subject: Re: WM Test Position 1 - Good Position or Proving Weakness of the Test!

Author: Rolf Tueschen

Date: 11:55:38 06/11/04


On June 11, 2004 at 13:15:25, Ed Schröder wrote:

>On June 11, 2004 at 07:29:20, Rolf Tueschen wrote:
>
>>On June 11, 2004 at 02:14:32, Ed Schröder wrote:
>>
>>>On June 09, 2004 at 10:13:30, Franz Hagra wrote:
>>>
>>>>Re3 is not winning at all - it's a draw (only in the original game does Black win).
>>>>
>>>>Re3 is the key to drawing the position, but it is not essential to play it as the first
>>>>move - so the test position is clear in a logical human sense, but not
>>>>under test conditions, because the test only works correctly when Re3 is
>>>>found as the first move!
>>>>
>>>>Rad8 also leads to a drawn position, like Re3 - so the TEST POSITION is not
>>>>correct at all.
>>>
>>>[d]r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -
>>>
>>>1..Re3 is a sound positional attacking move and according to my own brainchild
>>>there is a difference of 0.25 in score between 1..Re3 and 1..Rad8. The position
>>>IMO is a fine one to test the strategic insight of a chess program.
>>>
>>>My best,
>>>
>>>Ed
>>
>>Ed,
>>
>>you did NOT comment on the main finding Hagra has published here in
>>http://www.talkchess.com/forums/1/message.html?369557!
>>
>>I translate a second time into English:
>>
>>a) machine FRITZ 8 on AMD 1400 gets a solution time of 1 sec and that means
>>the highest points for position no. 1 (which you thankfully gave above)
>>
>>b) machine FRITZ 8 on AMD 2800 gets a solution time of 480 sec!! So it gets
>>far fewer points in position no. 1!!
>>
>>Here is my verbal explanation (all found by Hagra):
>>
>>a stronger [!] machine on better hardware (do you accept that or do you claim
>>that AMD 1400 is STRONGER than AMD 2800?) is able to make a deeper [!!]
>>calculation and therefore finds the variation starting with Rad8 - NOT as a final
>>solution, Ed! But as a variation, before it THEN comes back to Re3. Now, the
>>point is that such behaviour is by no means a sign of weaker strength but of
>>_better_ strength. But alas, Ed, the so-called "WM-Test" of Dr. Mikhail Gurevich
>>gives the weaker machine more points than the stronger machine.
>
>This is a common problem with positions whose nature is positional. Take the start
>position for example: the moves 1.e4, 1.d4, 1.c4 and 1.Nf3 will produce scores
>that are very close to each other, in this case 4 moves with almost identical
>scores. What you see is that chess engines tend to switch from 1.e4 to 1.d4
>frequently and that the speed of the PC actually introduces a random element, as
>in this case with Fritz.


This is still not what is happening in the case of Fritz on different hardware
and WM-Test position number 1.

a) at first Fritz 8 on AMD 2800 goes directly for Re3

b) then at a depth of 12 it changes to Rad8

c) later it comes back to Re3 (therefore it gets 480 sec as result)

However, the same engine on the weaker hardware, AMD 1400, never changes to Rad8
because it can't search as deep as 12. Therefore Fritz 8 gets the result 1 sec here,
which is the best possible. For this apparent contradiction you must find a
reasonable solution in your test suite. At the end below you make a good
proposal.
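
A minimal Python sketch of how a "stable holding" rule turns the two timelines above into solution times. The function name `solution_time`, the event-list format, and the 60 sec at which Rad8 appears are assumptions for illustration only; the 1 sec and 480 sec results and the 20-minute limit are the figures quoted in this thread.

```python
def solution_time(events, key_move, limit_sec=1200):
    """Return the time from which key_move is held without interruption
    until the time limit, or None if it is not the move at the limit.

    events: list of (time_sec, best_move) pairs, sorted by time, one per
    change of the engine's preferred move (hypothetical log format).
    """
    held_since = None
    for time_sec, move in events:
        if time_sec > limit_sec:
            break
        if move == key_move:
            # Start the clock only when the key move is newly chosen.
            if held_since is None:
                held_since = time_sec
        else:
            # Any switch away from the key move resets the clock.
            held_since = None
    return held_since


# Fritz 8 on AMD 1400: finds Re3 at once and never reaches the switch.
slow_machine = [(1, "Re3")]
# Fritz 8 on AMD 2800: Re3, then Rad8 at depth 12 (60 sec is assumed),
# then back to Re3 at 480 sec.
fast_machine = [(1, "Re3"), (60, "Rad8"), (480, "Re3")]

print(solution_time(slow_machine, "Re3"))  # 1
print(solution_time(fast_machine, "Re3"))  # 480
```

Under such a rule the deeper search is penalised only because it left Re3 once before returning to it.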


>
>The problem is unavoidable, the only good way to deal with random effects is to
>increase the number of positional positions to be tested, at least to 100.


It's not random, it's an effect of depth!




>
>The problem of randomness does not exist in clear tactical positions, there is
>only one good move, once the combination is found the key move will stay and the
>chess engine will never switch. But then what you are actually testing is SEARCH
>and not positional knowledge.


Thanks for the lesson. Ed, what do you think of the intention of M. Gurevich,
who wants to test the "analytical ability" with his "WM-Test"?



>
>
>
>>Question from Hagra and also from me: is this a reasonable test design if a
>>stronger machine gets fewer points just because it looks deeper into the
>>position? As you know the time for all machines per position is 20 minutes. And
>>Gurevich defines the "stable holding of the once chosen move" [my verbal
>>interpretation] as the best way to test the *analytical ability* of a machine.
>>Do you now understand the contradiction in the test design of Dr. Mikhail
>>Gurevich, dear Ed? Higher abilities get a worse result! Is that sound? Hopefully
>>NOT.
>
>There is only one good way to test a chess engine: play games, and a lot of them.
>Test suites are fun but only a surrogate; they are popular because they are a quick way
>to estimate the strength of an engine.


Goodness me!!! I always thought it but I never dared to write it. :)




>
>
>>Hope this clarifies the problem we faced with the German "WM-Test" in CSS.
>
>There are better ways, in the case of the WM-test I suggest the following
>change:
>
>1) Keep the current tactical positions, thinking time much lower, say 5-10
>minutes.
>
>2) Take 100-200 positional positions to deal a bit with randomness, thinking
>time much lower, say 1-2 minutes, and a different rating formula that skips the
>time element: the only criterion is whether the move is found or not.

That's an interesting proposal and I wish that your colleagues in Germany would
read it. I'm really happy that the debate ended with such a productive
solution.

Thanks, Ed.
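
For the record, a rough Python sketch of the rating formula in Ed's point 2, where time plays no role and a position scores only if the key move is the one on the board when the thinking time ends. The function name `suite_score`, the position labels, and every move except Re3 for position 1 are made up for illustration.

```python
def suite_score(final_moves, key_moves):
    """Count the positions whose final move equals the key move.

    final_moves: position id -> move the engine held when time ran out
    key_moves:   position id -> the test's key move
    Solution time is deliberately ignored.
    """
    return sum(
        1
        for position, key in key_moves.items()
        if final_moves.get(position) == key
    )


# Hypothetical three-position suite; only "pos1" / Re3 comes from the thread.
key_moves = {"pos1": "Re3", "pos2": "Nf5", "pos3": "a4"}
final_moves = {"pos1": "Re3", "pos2": "Qh5", "pos3": "a4"}

print(suite_score(final_moves, key_moves))  # 2
```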


>
>My best,
>
>Ed



Last modified: Thu, 15 Apr 21 08:11:13 -0700
