Computer Chess Club Archives


Subject: Re: CSS WM TEST - truth is NOT hypnosis

Author: Sandro Necchi

Date: 23:38:14 06/18/04



On June 18, 2004 at 14:31:12, Steve Glanzfeld wrote:

>On June 18, 2004 at 13:39:29, Rolf Tueschen wrote:
>
>>On June 18, 2004 at 12:59:43, Steve Glanzfeld wrote:
>>
>>>On June 18, 2004 at 09:47:55, Rolf Tueschen wrote:
>>>
>>>[...]
>>>:-))) I can imagine how a blackout must have suddenly hit you. Has someone
>>>turned the lights off while you were writing? Why in the world is it "HYPNOSIS"
>>>??? when people believe the truth to be true?
>>>
>>>Steve
>>
>>
>>As I told you - with your insults you can't expect to get answers. You showed
>>very well that you have a reading difficulty because above I didn't write that
>>_I_ believed that the ranking lists were "similar". That was a quote from
>>Gurevich. Understood?
>
>But in fact he's right, they ARE similar!! Understood? Compare any rankings you
>like...
>
>>To all the other problems I am certain that to your reading difficulty you have
>>even worse handicaps because you don't seem to be fit to get what is being
>>discussed here. This test can't bring effective news, this is the main point.
>
>Why is this "the main point" suddenly?? You find new "main points" every day.
>Don't you know that new engine versions are released every week? Testing them
>DOES bring news, because there is no other estimation of their strength, yet. A
>good test like the WM test can tell if it's a patzer or a potential top engine,
>or what's different from the previous version of that engine...
>
>>_All_ the programmers I could read say more or less frankly that they can't work
>>with _that_ test (100 positions). Because, surprise, to know a ranking place in
>>that test or in other position tests, has no importance for their programming.
>
>Surprise: Computerchess testsuites aren't intended only for the use by chess
>programmers. Actually they are intended mainly to be used by fans, chess
>players, common program users, to be able to investigate the strength profile
>(strengths and weaknesses) of chess programs, find estimated rankings when they
>want to...

Hi Steve,

do not get me wrong; I am not against you at all.

I will try to help people understand why programmers are not interested in
these test suites.

>
>Since some chess programmers have said that they aren't interested much in such
>tests, this seems to be your main argument against it. But this argument is not
>valid, because tests are made for thousands of users and fans (who do not use
>tests to develop, but to TEST), and not as a developing tool for programmers.

OK, you may have made the best test set, and I think chess fans will find it
quite interesting to see whether their latest chess program performs well in
this test set.

This is a very nice tool and we must all thank you for it, but unfortunately it
is not reliable for estimating a program's strength.

We have seen quite often, nearly all the time, that modifying a chess engine to
score better in those test sets has a drawback. I mean that most of the time one
version of program X is stronger in games than another version of the same
program that performs better in the test set.
This means that other things are more important for making a program stronger.

In reality this is explained if you consider the following:

1. Finding the best move, which allows you to win in 30 moves instead of 60
moves, does not gain you any Elo rating at all.
2. Being able to play some !! moves but also many ? moves makes the program
weaker: with two ? moves it will quite probably lose the game, while a few
!! moves may not be enough to win it.

This means that a chess program should be made stronger overall, rather than
merely able to solve some specific positions.
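
Point 2 above can be illustrated with a toy model. The sketch below is only an
illustration under invented assumptions: two or more ? moves lose the game,
otherwise at least one !! move wins it, otherwise it is a draw. The function
names, the per-move probabilities and the 40-move game length are all
hypothetical and are not measured from any real engine.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def expected_score(n_moves: int, p_weak: float, p_brilliant: float) -> float:
    """Expected game score (win = 1, draw = 0.5, loss = 0) in the toy model.

    Assumed rules, not taken from any real engine:
      * two or more "?" moves lose the game;
      * otherwise at least one "!!" move wins it;
      * otherwise the game is drawn.
    "?" and "!!" counts are treated as independent binomials over n_moves.
    """
    p_loss = prob_at_least(2, n_moves, p_weak)
    p_win = (1.0 - p_loss) * prob_at_least(1, n_moves, p_brilliant)
    p_draw = 1.0 - p_loss - p_win
    return p_win + 0.5 * p_draw


# Hypothetical engines: the numbers are invented for illustration only.
tactical = expected_score(n_moves=40, p_weak=0.04, p_brilliant=0.05)
solid = expected_score(n_moves=40, p_weak=0.01, p_brilliant=0.01)

print(f"tactical engine: {tactical:.3f}")
print(f"solid engine:    {solid:.3f}")
```

With these made-up numbers the solid engine gets the higher expected score,
even though the tactical engine is five times as likely to find a !! move on
any given move. A test set that rewards only the !! moves would rank the two
the other way round.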

So, to summarize: a program that performs better in the test set could be
stronger, but not necessarily; most of the time it is not.

This is why the chess programmers do not rely on these test sets.

I am not saying that it is impossible to make a test set that achieves what
you are looking for, but it would probably have to be quite different, with a
huge number of positions covering other issues as well.

>
>I'm sure it gives you BIG TROUBLE that the usually top-listed engines from
>gamebased rankings (Shredder, Fritz...) are also top in the WM test's results,
>while engines which are playing weak compared to these, are also ranking bad
>there :-))) It just works! Do you have sleepless nights now?
>
>Steve

Sandro




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.