Computer Chess Club Archives

Subject: Re: WM Test Position 1 - Good Position or Proving Weakness of the Test!

Author: Ed Schröder
Date: 15:23:57 06/11/04
On June 11, 2004 at 14:55:38, Rolf Tueschen wrote:

>On June 11, 2004 at 13:15:25, Ed Schröder wrote:
>
>>On June 11, 2004 at 07:29:20, Rolf Tueschen wrote:
>>
>>>On June 11, 2004 at 02:14:32, Ed Schröder wrote:
>>>
>>>>On June 09, 2004 at 10:13:30, Franz Hagra wrote:
>>>>
>>>>>Re3 is not winning at all; it's a draw (only in the original game does Black
>>>>>win).
>>>>>
>>>>>Re3 is the key to drawing the position, but it is not essential to play it as
>>>>>the first move at all. So the test position is clear in a logical, human sense,
>>>>>but not under test conditions, because the test only works correctly when Re3
>>>>>alone is found as the first move!
>>>>>
>>>>>Rad8 also leads to a drawn position, like Re3, so the TEST POSITION is not
>>>>>correct at all.
>>>>
>>>>[d]r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -
>>>>
>>>>1...Re3 is a sound positional attacking move, and according to my own
>>>>brainchild (my engine) there is a score difference of 0.25 between 1...Re3 and
>>>>1...Rad8. IMO the position is a fine one for testing the strategic insight of a
>>>>chess program.
>>>>
>>>>My best,
>>>>
>>>>Ed
>>>
>>>Ed,
>>>
>>>you did NOT comment on the main finding Hagra published here:
>>>http://www.talkchess.com/forums/1/message.html?369557
>>>
>>>I'll translate it into English a second time:
>>>
>>>a) FRITZ 8 on an AMD 1400 gets a solution time of 1 sec, which means the highest
>>>score for position no. 1 (which you kindly gave above)
>>>
>>>b) FRITZ 8 on an AMD 2800 gets a solution time of 480 sec!! So it gets far fewer
>>>points for position no. 1!!
>>>
>>>Here is my explanation in words (all of it found by Hagra):
>>>
>>>a stronger [!] machine on better hardware (do you accept that, or do you claim
>>>that the AMD 1400 is STRONGER than the AMD 2800?) is able to calculate deeper
>>>[!!] and therefore finds the variation with Rad8 first, NOT as a final
>>>solution, Ed! But as a variation, before it THEN comes back to Re3. Now, the
>>>point is that such behaviour is by no means a sign of weaker strength but of
>>>_better_ strength. But alas, Ed, the so-called "WM-Test" of Dr. Mikhail Gurevich
>>>gives the weaker machine more points than the stronger machine.
>>
>>This is a common problem with positions of a positional nature. Take the
>>starting position, for example: the moves 1.e4, 1.d4, 1.c4 and 1.Nf3 produce
>>scores that are very close to each other, in this case 4 moves with almost
>>identical scores. What you see is that chess engines tend to switch between
>>1.e4 and 1.d4 frequently, and that the speed of the PC actually introduces a
>>random element, as in this case with Fritz.
>
>
>This is still not what is happening with Fritz on different hardware in
>WM-Test position number 1.
>
>a) At first, Fritz 8 on the AMD 2800 goes directly for Re3
>
>b) Then, at depth 12, it changes to Rad8
>
>c) Later it comes back to Re3 (hence its result of 480 sec)
>
>However, the same engine on the weaker AMD 1400 never changes to Rad8, because
>it cannot search as deep as 12. Therefore Fritz 8 gets a result of 1 sec here,
>which is the best possible. A test suite must offer a reasonable solution to
>this apparent contradiction. At the end, below, you make a good proposal.

That's what I said: the (faster) hardware adds a random factor.
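The a) to c) sequence can be reproduced with a small sketch of how a solution time is read off an engine's best-move history. It assumes (this is an assumption, not the WM-Test's published procedure) that "solution time" means the moment from which the engine keeps the key move until the 20-minute limit, so any switch away resets the clock. The logs are illustrative: the 1 sec and 480 sec figures come from Hagra's results, while the 60 sec time of the depth-12 switch to Rad8 is made up.

```python
from typing import List, Optional, Tuple

# One entry per change of the engine's best move: (elapsed seconds, move).
MoveLog = List[Tuple[float, str]]


def solution_time(log: MoveLog, key_move: str, time_limit: float) -> Optional[float]:
    """Return the time from which key_move stays the engine's choice up to time_limit.

    Returns None if the engine is not showing key_move when time runs out.
    """
    since: Optional[float] = None
    for elapsed, move in log:
        if elapsed > time_limit:
            break  # later changes fall outside the test's thinking time
        if move == key_move:
            if since is None:
                since = elapsed  # engine (re)settles on the key move
        else:
            since = None  # any switch away resets the clock
    return since


LIMIT = 20 * 60  # 20 minutes per position, as stated in the thread

# Illustrative logs: 1 s and 480 s are the reported results; the 60 s
# switch to Rad8 at depth 12 is a hypothetical time.
slow_pc: MoveLog = [(1.0, "Re3")]
fast_pc: MoveLog = [(1.0, "Re3"), (60.0, "Rad8"), (480.0, "Re3")]

print(solution_time(slow_pc, "Re3", LIMIT))  # 1.0
print(solution_time(fast_pc, "Re3", LIMIT))  # 480.0
```

Under this rule the deeper search is penalised for the temporary Rad8 detour, even though both runs end on Re3.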



>>The problem is unavoidable; the only good way to deal with random effects is to
>>increase the number of positional test positions to at least 100.


>It's not random, it's an effect of depth!

Which is due to the faster hardware.

To deal with randomness one resorts to volume, as polls do.
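The poll analogy can be made concrete with the standard error of a proportion. This is a minimal sketch, assuming each positional position behaves like an independent pass/fail trial (a binomial model, which real test positions only approximate); it shows how the uncertainty in the fraction of positions solved shrinks with the square root of the number of positions.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a solved fraction p measured over n positions.

    Assumes each position is an independent pass/fail trial (binomial model).
    """
    return math.sqrt(p * (1.0 - p) / n)


# Worst case p = 0.5: the error shrinks with the square root of n.
for n in (10, 100, 200):
    print(n, round(standard_error(0.5, n), 3))
# 10 0.158
# 100 0.05
# 200 0.035
```

With 10 positions a single lucky or unlucky switch moves the result by 10 percentage points; with 100 to 200 positions the noise is a few points, which is the reason for asking for volume.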




>>The problem of randomness does not exist in clear tactical positions: there is
>>only one good move, and once the combination is found the key move stays and
>>the chess engine will never switch. But then what you are actually testing is
>>SEARCH, not positional knowledge.


>Thanks for the lesson. Ed, what do you think of M. Gurevich's intention to test
>"analytical ability" with his "WM-Test"?

I don't think it is possible to create a testsuite that measures a true Elo
rating for a chess engine. If it were possible, it would already exist. If
someone manages to create one, he or she should sell it; I am ready to pay good
money for it.

My best,

Ed



>>>The question from Hagra and myself: is this a reasonable test design if a
>>>stronger machine gets fewer points just because it looks deeper into the
>>>position? As you know, the time per position is 20 minutes for all machines.
>>>And Gurevich defines "stably holding on to the move once chosen" [my
>>>paraphrase] as the best way to test the *analytical ability* of a machine.
>>>Do you now see the contradiction in Dr. Mikhail Gurevich's test design, dear
>>>Ed? Higher ability gets a worse result! Is that sound? Hopefully NOT.
>>
>>There is only one good way to test a chess engine: play games, and a lot of
>>them. Testsuites are fun but a surrogate; they are popular because they are a
>>quick way to estimate the strength of an engine.
>
>
>Goodness me!!! I always thought it but I never dared to write it. :)
>
>
>
>
>>
>>
>>>Hope this clarifies the problem we faced with the German "WM-Test" in CSS.
>>
>>There are better ways. In the case of the WM-Test I suggest the following
>>changes:
>>
>>1) Keep the current tactical positions, but with much lower thinking time, say
>>5-10 minutes.
>>
>>2) Take 100-200 positional positions to cope somewhat with randomness, with
>>much lower thinking time, say 1-2 minutes, and a different rating formula that
>>drops the time element: the only criterion is whether the move is found or not.
>
>That's an interesting proposal, and I wish your colleagues in Germany would
>read it. I'm really happy that the debate ended with such a productive
>solution.
>
>Thanks, Ed.
>
>
>>
>>My best,
>>
>>Ed
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.