Computer Chess Club Archives



Subject: Re: ONE Position out of 100 can't prove anything.

Author: Mike S.

Date: 14:23:52 06/12/04



On June 12, 2004 at 11:32:03, Robert Hyatt wrote:

>(...)

>This shows that such tests are basically flawed.  The test should state "The
>time to solution is the time where the engine chooses the right move, and then
>sticks with it from that point forward, searching at least 30 minutes more..."

Why "should"? This *is* the condition for a correct solution in the WM Test
and always has been, except that the maximum time is 20 minutes per position. A
solution is counted from the moment an engine has found the solution move *and kept*
it until the full testing time of 20 minutes has elapsed.

Rolf fails to inform you about that, or he doesn't know it himself. Does that
surprise you?

(You can always claim that the test time is too short, but even if you run
every position for a whole day, you will still find engines that switch to a
wrong move after 26 hours. So you have to draw the line somewhere, and 20
minutes per position is already "intensive analysis"; in a normal game an engine
will almost never spend more than 10 minutes on a single position, and usually
not more than 3 minutes per position on average...)

http://www.computerschach.de/test/WM-Test.zip
(English version included, and results of 4 Crafties.)

I hope you didn't assume that the WM-Test authors, and everyone in the audience
who uses it, are idiots who would count a "pseudo solution" found e.g. after 12
seconds when the engine then switches to a wrong move from 42 seconds to 7 minutes,
etc.? Of course not. A high percentage of CSS readers are experienced, advanced
computer chess users (at least). CSS itself has built, informed and developed
that expert audience (I guess the US has nothing comparable, unfortunately).
Advice has also been given to set the "extra plies" parameter of automatic
test-suite functions to 99, to ensure that the complete testing time is used for
each position. But in general, we have recommended testing manually and watching
the engine's thinking process, to get a feel for it, so to speak.
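Why 99 helps can be illustrated with a minimal sketch of a stop rule. It assumes, hypothetically, that an automatic test-suite function ends a position early once the solution move has survived `extra_plies` further plies beyond the depth at which it was found; real GUIs may define the parameter differently, and the function and parameter names here are illustrative only.

```python
from typing import Optional


def should_stop(
    elapsed: float,
    max_time: float,
    found_depth: Optional[int],
    current_depth: int,
    extra_plies: int,
) -> bool:
    """Hypothetical stop rule for an automatic test-suite run on one position."""
    if elapsed >= max_time:
        return True  # the full testing time has been used
    if found_depth is None:
        return False  # the solution move is not currently the engine's choice
    # Stop early once the solution has been kept for `extra_plies` more plies.
    return current_depth - found_depth >= extra_plies


# With a small value the run ends long before 20 minutes ...
print(should_stop(60, 1200, found_depth=10, current_depth=12, extra_plies=2))   # True
# ... with 99 it practically never ends early, so a later switch is still seen.
print(should_stop(60, 1200, found_depth=10, current_depth=12, extra_plies=99))  # False
```

Under this assumption, a depth of `found_depth + 99` is far beyond anything an engine reaches in 20 minutes, so the time limit is what ends the run.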

I'm a bit disappointed by your statement that "...such tests are basically
flawed. The test should," when in fact it *does* just that.

>That stops this kind of nonsensical "faster = worse" problem.  Because as is,
>the test simply is meaningless when changing nothing but the hardware results in
>a poorer result...

Are you aware that only a few of the positions are affected by that
problem? The WM-Test has 100 positions. Some engines show that behaviour in some
of them (different engines in different positions). Some engines ultimately fail
to solve a position because of it; others solve it but would switch to a wrong
move after 20:00, etc.

Can you guarantee that every single test position you use (and please don't tell
me you use none :-)) is free of that problem? Who can guarantee that?
Engines are sometimes creative in finding ways to choose the correct move for the
wrong reason... You are aware that it is very difficult to avoid this
100%, especially when compiling a large test suite?

So please be fair.

Regards,
Mike Scheidl




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.