Author: Ed Schröder
Date: 15:23:57 06/11/04
On June 11, 2004 at 14:55:38, Rolf Tueschen wrote:

>On June 11, 2004 at 13:15:25, Ed Schröder wrote:
>
>>On June 11, 2004 at 07:29:20, Rolf Tueschen wrote:
>>
>>>On June 11, 2004 at 02:14:32, Ed Schröder wrote:
>>>
>>>>On June 09, 2004 at 10:13:30, Franz Hagra wrote:
>>>>
>>>>>Re3 is not winning at all - it's a draw (only in the original game does black win).
>>>>>
>>>>>Re3 is the key to drawing the position, but it's not essential to play it as the
>>>>>first move at all - so the test position is clear in a logical, human sense, but not
>>>>>under test conditions, because the test only works correctly when Re3 is the only
>>>>>possible first move!
>>>>>
>>>>>Rad8 also leads to a drawn position, like Re3 - so the TEST POSITION is not
>>>>>correct at all.
>>>>
>>>>[d]r3r1k1/1pq2pp1/2p2n2/1PNn4/2QN2b1/6P1/3RPP2/2R3KB b - -
>>>>
>>>>1...Re3 is a sound positional attacking move, and according to my own brainchild
>>>>there is a difference of 0.25 in score between 1...Re3 and 1...Rad8. IMO the
>>>>position is a fine one for testing the strategic insight of a chess program.
>>>>
>>>>My best,
>>>>
>>>>Ed
>>>
>>>Ed,
>>>
>>>you did NOT comment on the main finding Hagra has published here:
>>>http://www.talkchess.com/forums/1/message.html?369557
>>>
>>>I'll translate it into English a second time:
>>>
>>>a) FRITZ 8 on an AMD 1400 gets a solution time of 1 sec, which means the
>>>highest score for position no. 1 (which you kindly gave above)
>>>
>>>b) FRITZ 8 on an AMD 2800 gets a solution time of 480 sec!! So it gets a
>>>far worse score for position no. 1!!
>>>
>>>Here is my verbal explanation (all found by Hagra):
>>>
>>>a stronger [!] machine on better hardware (do you accept that, or do you claim
>>>that the AMD 1400 is STRONGER than the AMD 2800?) is able to calculate deeper [!!]
>>>and therefore finds the variation with Rad8 first - NOT as a final solution,
>>>Ed! But as a variation, before it THEN comes back to Re3. Now, the point is
>>>that such behaviour is by no means a sign of weaker strength but of _better_
>>>strength. But alas, Ed, the so-called "WM-Test" of Dr. Mikhail Gurevich gives
>>>the weaker machine more points than the stronger machine.
>>
>>This is a common problem with positions whose nature is positional. Take the
>>starting position, for example: the moves 1.e4, 1.d4, 1.c4 and 1.Nf3 produce
>>scores that are very close to each other, in this case 4 moves with almost
>>identical scores. What you see is that chess engines tend to switch from 1.e4
>>to 1.d4 frequently, and that the speed of the PC actually introduces a random
>>element, as in this case with Fritz.
>
>This is still not what is happening in the case of Fritz on different hardware
>and WM-Test position number 1.
>
>a) at first, Fritz 8 on the AMD 2800 goes directly for Re3
>
>b) then, at depth 12, it changes to Rad8
>
>c) later it comes back to Re3 (which is why it gets 480 sec as its result)
>
>However, the same engine on the weaker AMD 1400 never changes to Rad8,
>because it can't reach depth 12. Therefore Fritz 8 gets a result of 1 sec
>there, which is the best possible. You must find a reasonable solution for this
>apparent contradiction in your test suite. At the end below, you make a good
>proposal.

That's what I said: the (faster) hardware adds a random factor.

>>The problem is unavoidable; the only good way to deal with random effects is to
>>increase the number of positional positions tested to at least 100.
>
>It's not random, it's an effect of depth!

Hence the faster hardware. To deal with randomness, one turns to volume, as polls do.

>>The problem of randomness does not exist in clear tactical positions: there is
>>only one good move, and once the combination is found, the key move stays and the
>>chess engine never switches. But then what you are actually testing is SEARCH
>>and not positional knowledge.
>
>Thanks for the lesson. Ed, what do you think about the intention of M. Gurevich,
>who wants to test "analytical ability" with his "WM-Test"?

I don't think it is possible to create a testsuite that measures a true Elo rating for a chess engine. If it were possible, it would already exist. If someone manages to create one, he or she should sell it; I am ready to pay good money for it.

My best,

Ed

>>>Question from Hagra and myself: is this a reasonable test design if a
>>>stronger machine gets fewer points just because it looks deeper into the
>>>position? As you know, the time for all machines is 20 minutes per position. And
>>>Gurevich defines the "stable holding of the once-chosen move" [my verbal
>>>interpretation] as the best way to test the *analytical ability* of a machine.
>>>Do you now understand the contradiction in Dr. Mikhail Gurevich's test design,
>>>dear Ed? Higher abilities get a worse result! Is that sound? Hopefully
>>>NOT.
>>
>>There is only one good way to test a chess engine: play games, and a lot of them.
>>Testsuites are fun but a surrogate; they are popular because they are a quick
>>way to estimate the strength of an engine.
>
>Goodness me!!! I always thought so, but I never dared to write it. :)
>
>>>Hope this clarifies the problem we faced with the German "WM-Test" in CSS.
>>
>>There are better ways. In the case of the WM-Test I suggest the following
>>changes:
>>
>>1) Keep the current tactical positions, with much lower thinking time, say 5-10
>>minutes.
>>
>>2) Take 100-200 positional positions to deal a bit with randomness, with much
>>lower thinking time, say 1-2 minutes, and a different rating formula that drops
>>the time element: the only criterion is whether the move is found or not.
>
>That's an interesting proposal, and I wish your colleagues in Germany would
>read you. I'm really happy that the debate ended with such a productive
>solution.
>
>Thanks, Ed.
>
>>My best,
>>
>>Ed
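The disagreement above comes down to two ways of scoring the same search: a solution time (the moment from which the key move is found and then held until the end) versus Ed's proposed found/not-found criterion for positional positions. Below is a minimal Python sketch under stated assumptions; it is not the WM-Test's actual scoring formula, which the thread does not give. The `Iteration` log format and all per-depth times are hypothetical; only the 1 sec and 480 sec results and the switch to Rad8 at depth 12 come from the thread. The `standard_error` helper is the ordinary binomial standard error, included to illustrate the "volume, like polls" argument for using 100+ positions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Iteration:
    """One best-move report from an engine's search (hypothetical log format)."""
    depth: int
    time_s: float   # seconds elapsed when this best move was reported
    best_move: str


def solution_time(log: List[Iteration], key_move: str) -> Optional[float]:
    """Time from which the key move is held without interruption until the end.

    Returns None if the engine does not finish on the key move. Any switch
    away from the key move resets the clock, as with Fritz 8 on the AMD 2800.
    """
    if not log or log[-1].best_move != key_move:
        return None
    held_since = log[-1].time_s
    for it in reversed(log):          # walk back while the key move stays best
        if it.best_move != key_move:
            break
        held_since = it.time_s
    return held_since


def found(log: List[Iteration], key_move: str) -> bool:
    """Ed's proposed criterion: only whether the key move is found, no time element."""
    return bool(log) and log[-1].best_move == key_move


def standard_error(hits: int, n: int) -> float:
    """Binomial standard error of a solved fraction; shrinks roughly like 1/sqrt(n)."""
    p = hits / n
    return math.sqrt(p * (1 - p) / n)


# Hypothetical logs for position no. 1 (times invented except 1 s and 480 s).
slow = [Iteration(9, 1.0, "Re3"), Iteration(10, 40.0, "Re3"),
        Iteration(11, 600.0, "Re3")]                      # never reaches depth 12
fast = [Iteration(9, 1.0, "Re3"), Iteration(10, 12.0, "Re3"),
        Iteration(11, 60.0, "Re3"), Iteration(12, 200.0, "Rad8"),
        Iteration(13, 480.0, "Re3")]

print(solution_time(slow, "Re3"), solution_time(fast, "Re3"))  # 1.0 480.0
print(found(slow, "Re3"), found(fast, "Re3"))                  # True True
```

With the time-based measure the faster machine is penalised for its detour via Rad8; with the binary criterion both machines score the same, which is the intent of Ed's proposal 2). The standard error shows why that binary criterion needs many positions: the same 70% solve rate is far less noisy over 100 positions than over 20.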
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.