Computer Chess Club Archives


Subject: Re: CSS WM TEST - a technical view

Author: Geert van der Wulp

Date: 00:29:05 06/18/04


On June 16, 2004 at 18:56:41, Sune Fischer wrote:

>On June 16, 2004 at 16:49:28, Steve Glanzfeld wrote:
>
>>On June 15, 2004 at 17:28:38, Vincent Diepeveen wrote:
>>
>>>On June 15, 2004 at 16:26:09, Steve Glanzfeld wrote:
>>>
>>>>No normal program will choose an unusual move (i.e. a queen sac) "out of the
>>>>blue" in a normal position. Except, the program is completely broken.
>>>>
>>>>You guys are arguing as if it would be DOWNRIGHT BAD when a chess program finds
>>>>good moves (quickly)... I wonder what a chess program looks like, when it is
>>>>based on that philosophy :)) Does it try to avoid the good moves? So, if there's
>>>>a lack of success, the chances are good that we have found a major reason here
>>>>:)
>>
>>>"I created a version that was tactically brilliant. It solved *everything* in the
>>>testsuites. Then I started playing with it and it was hundreds of points weaker
>>>in games." Stefan Meyer Kahlen a few months ago.
>>
>>No engine can solve everything in every testsuite. There are not only tactical
>>tests, for example (big surprise eh? :)))
>>
>>>
>>>So the answer to your question is: The version that scores hundreds of points
>>>higher on testsuites is NOT the version to play with at tournaments, because in
>>>testsuites all those patzer moves work, as we know, and in tournaments they do not.
>>
>>Again, don't you understand that those moves HAVE WORKED in games? :) These are
>>World Champion's winning moves! What are you talking about "do not work in
>>tournaments"...???
>>
>>Which program, in several versions, do you think ranks #2, #5 and #7 in the WM
>>test results? Shredder! :)) Note, that the version ranking #2 has the same
>>number of solutions as the leader. Ranks #1/3/6/8/9/10 are Fritz versions. Next
>>best are CM versions, Hiarcs 9, and Deep Juniors. At the bottom of the list we
>>find oldies and weaker freeware.
>>
>>So, at the top of that test's ranking list (from a total of 230 results in
>>the currently available download), we find the same engines that we also
>>find in many ranking lists based on games.
>>
>>I wonder why some people here have so much trouble understanding or accepting
>>this. Strange.
>>
>>Steve
>
>I think it has been explained to you already, but I'll give it another try.
>
>The problem is that the implication
>"higher testscores" => "stronger engine" is often false.
>
>There are several reasons for that, I think, some of them already mentioned.
>
>One of the biggest problems is that test positions are not really representative
>of a real game.
>It seems impossible to weigh in the different types of positions, i.e. say you
>have 10 king sac, 10 endgame and 10 midgame with subtle moves.

Excuse me, a KING sac??

>
>Now you take two engines and get resp. 4, 8, 2 and 6, 5, 3 solutions.
>
>The right kind of king sac can of course decide the game, but these positions may

Yes, right, A KING sac ALWAYS decides the game. But are we testing the engines
for helpmates or something?

>occur rarely in games so it might not be hugely important for practical rating.

It is true. In my own games I have never played a King sac myself, and I have
never been astonished by one that my opponent played.

>Of course the engine must also be able to get those kinds of positions on the
>board in the first place.

Yes, true. I think most will not allow such a King sac.

>
>The subtle midgame moves occur extremely frequently of course, but a few 0.1
>moves won't be enough to win a game.
>
>Being excellent in the endgame won't help much if you can never survive the
>midgame. Etc..
>
>So even though the two engines may score the same, it says absolutely nothing
>about which will be the better player.
>
>Of course you can try and create a set of positions you think will be
>representative and make a guess as to some proper weighting.
>But that's all it's going to be, basically guessing out of the blue.
>
>Even if one engine scores higher on all suites, there is still a chance it is
>worse if e.g. it is much too "trigger happy" and generally overestimates its
>chances in, say, passed pawn endgames.
>A good example of that is wac2 I think, high passed pawn values will usually
>help the engine find the right sac although the same "knowledge" can backfire in
>other positions.
>
>-S.
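
The weighting argument in the quoted message can be checked with a few lines of Python. This is only a sketch: the solved counts (4, 8, 2 and 6, 5, 3) come from the quoted example, while the three weightings and the names ENGINE_A, ENGINE_B, weighted_score and compare are made up for illustration and do not come from any real test suite.

```python
# Solved positions per category, taken from the example in the quoted post.
# Category order: sacrifice, endgame, subtle midgame (10 positions each).
ENGINE_A = (4, 8, 2)
ENGINE_B = (6, 5, 3)

# Hypothetical weightings; nobody knows which one (if any) reflects real games.
WEIGHTINGS = {
    "equal": (1, 1, 1),
    "endgame-heavy": (1, 3, 1),
    "midgame-heavy": (1, 1, 3),
}


def weighted_score(solved, weights):
    """Sum of solved counts, each scaled by the weight of its category."""
    return sum(s * w for s, w in zip(solved, weights))


def compare(weights):
    """Return 'A', 'B' or 'tie' depending on which engine scores higher."""
    a = weighted_score(ENGINE_A, weights)
    b = weighted_score(ENGINE_B, weights)
    if a == b:
        return "tie"
    return "A" if a > b else "B"


if __name__ == "__main__":
    for name, weights in WEIGHTINGS.items():
        print(f"{name}: {compare(weights)}")
```

With equal weights both engines total 14 and tie. Tripling the endgame weight puts the first engine ahead (30 to 24), while tripling the midgame weight puts the second one ahead (20 to 18). The ranking is decided by the guessed weights, not by the solution counts alone.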

Geert



Last modified: Thu, 15 Apr 21 08:11:13 -0700