Computer Chess Club Archives


Subject: Re: WM Test Position 1 - Good Position or Proving Weakness of the Test!

Author: Robert Hyatt

Date: 12:31:11 06/12/04


On June 12, 2004 at 11:57:55, Rolf Tueschen wrote:

>On June 12, 2004 at 11:32:03, Robert Hyatt wrote:
>
>>This shows that such tests are basically flawed.  The test should state "The
>>time to solution is the time where the engine chooses the right move, and then
>>sticks with it from that point forward, searching at least 30 minutes more..."
>>
>>That stops this kind of nonsensical "faster = worse" problem.  Because as is,
>>the test simply is meaningless when changing nothing but the hardware results in
>>a poorer result...
>
>
>Would you be so kind as to take a quick look at my own analysis of the
>position with FRITZ 8?
>
>It's at http://www.talkchess.com/forums/1/message.html?370049
>
>My questions:
>
>a) would you say that a test author can find a workaround for such "changes" in
>the first choice during the actual pondering? In other words, did Gurevich miss a
>test technique you know of?

The answer is to run each position for a _long_ time.  Then look at the end of
the run, and if the final answer is correct, back up through the output until
the last wrong answer is found.  The time right after that, where the right
move is found and then kept, is the "found the solution" time.

While that isn't perfect, it does prevent pure luck from getting the right move
at the right time for the wrong reason, where another 30 seconds would result in
a wrong answer...
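
The back-up procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the log format (a list of elapsed-seconds and best-move pairs) and the function names are hypothetical, and the times in the usage note are invented, not measurements from any engine.

```python
from typing import List, Optional, Tuple

# Hypothetical log format: one (elapsed_seconds, best_move) entry each time
# the engine reports its current best move, in increasing time order.
SearchLog = List[Tuple[float, str]]


def first_hit_time(log: SearchLog, correct_move: str) -> Optional[float]:
    """Naive scoring: the first time the correct move shows up at all."""
    for elapsed, move in log:
        if move == correct_move:
            return elapsed
    return None


def stable_solution_time(log: SearchLog, correct_move: str) -> Optional[float]:
    """Time at which the correct move is found and then kept to the end.

    Returns None if the run does not end on the correct move.
    """
    if not log or log[-1][1] != correct_move:
        return None
    solved_at = log[-1][0]
    # Walk backwards from the end of the run until a wrong answer appears.
    for elapsed, move in reversed(log):
        if move != correct_move:
            break
        solved_at = elapsed
    return solved_at
```

With an invented log such as `[(1, "Re3"), (5, "Ne3"), (120, "Re3"), (600, "Re3")]`, the naive score is 1 second while the stable score is 120 seconds, so an early hit that is later abandoned no longer counts as the solution.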


>
>b) now the main topic: is it sound, from the point of view of a test, that an
>AMD 1400 finds the "solution" 1-Re3 in a hurry and therefore gets maximal points
>because it didn't change its first choice -- while my P4 2600 drifts with 1-Ne3
>for a long time, but in the end - at a much greater depth than the AMD 1400 -
>comes back to the right solution, yet gets far fewer points, as if it were
>weaker than the same program on the AMD 1400??


It's simply a flaw in the test/testing.  I.e., if you were to use a totally
random evaluation, ignoring material and everything else, it is likely that if
you stop the test at the right time, you will get the right answer...  And that
would be meaningless, of course.
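
A rough way to see the "stop at the right time" point is to model the extreme case as a picker that shows a uniformly random move at each checkpoint. This is a simplified stand-in for a random evaluation, not a real search, and the function names and parameters are assumptions made for illustration. The chance that the correct move is on display at one or more of k checkpoints, with n legal moves, is 1 - (1 - 1/n)^k.

```python
import random


def chance_of_lucky_hit(num_moves: int, num_checkpoints: int) -> float:
    """Probability that a uniformly random picker shows the correct move
    at one or more of the checkpoints."""
    return 1.0 - (1.0 - 1.0 / num_moves) ** num_checkpoints


def simulate_lucky_hits(num_moves: int, num_checkpoints: int,
                        trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (seeded for repeatability)."""
    rng = random.Random(seed)
    correct = 0  # index of the "solution" move; any fixed index works
    hits = 0
    for _ in range(trials):
        picks = [rng.randrange(num_moves) for _ in range(num_checkpoints)]
        if correct in picks:
            hits += 1
    return hits / trials
```

Under these assumptions, with 30 legal moves and 100 checkpoints, the random picker shows the "right" move at some point in well over 90% of runs, so a test that accepts any moment of agreement would credit it with a solution almost every time.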





>
>c) do you see specific flaws in such a test construction with "positional"
>positions, as Ed explained?
>


Tactical positions are easy to test.  You need to get the right move, with the
right score or PV, and then not "lose" it with deeper searches.  Positional
tests are, by definition, more vague, and to say a program sees the "theme" and
gets the right move for the right reason, you have to let it search for a long
time to be sure it doesn't lose the right move.  If it does, you have to then
analyze carefully to be sure that the program is wrong, because some
"positional" test positions have serious flaws.






>d) evil question, typically Rolf: for you as the experienced computerchess
>knowie is it really a revelation to see a WM-Test with 100 positions from human
>Wchamps? What has that to do with anything in computerchess and the testing of
>computerchess programs?? Isn't it - this has been my assumption right from the
>beginning, when I heard of that test - that the author wanted to increase his own
>status with all the Wchamps? ;)




Personally, I think the idea of using a test set to estimate ratings is just
nonsense.  It's never worked for humans; it will never work for humans.  If you
want a wild guess of "is it 2200 or 2600?" then you might get something that
comes close to answering that.  But is A 2550 or 2600?  Forget it...







Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.