Computer Chess Club Archives



Subject: Re: CSS WM TEST - a technical view

Author: martin fierz

Date: 07:42:34 06/15/04



On June 15, 2004 at 10:16:20, Rolf Tueschen wrote:

>On June 15, 2004 at 10:07:26, Geert van der Wulp wrote:
>
>>On June 15, 2004 at 09:39:20, Rolf Tueschen wrote:
>>
>>>On June 15, 2004 at 08:13:33, Geert van der Wulp wrote:
>>>
>>>>On June 15, 2004 at 07:58:42, Franz Hagra wrote:
>>>>
>>>>>>What is YOUR opinion about this? Should programs which solve 19 or 54, get the
>>>>>>same rating? The test formula calculates for these program's performances
>>>>>>
>>>>>>19 sol. --> 2.553
>>>>>>55 sol. --> 2.649
>>>>>>
>>>>>>But Hagra attaches 2.600 to both. I wonder who else accepts this as serious :))
>>>>>>Send in the clowns...
>>>>>>
>>>>>>Steve
>>>>>
>>>>>Hagra attaches the range of 2550-2649 for both - using 2600 an sf=2 simplyfing
>>>>>this as usual for measurement data in common.
>>>>>
>>>>>Hagra
>>>>
>>>>The fact that something is "common" to do does not mean that it is a good thing
>>>>to do. Why do you believe that the ratings are accurate for the relative
>>>>strengths of the programs up to 100 points? Why not 50, 25 or maybe 200?
>>>>
>>>>Geert
>>>
>>>
>>>The answer is easy. Hagra does not follow daydreaming and wishes but a clear
>>>mathematical urge. From that math formula above you cannot extract what you
>>>seemingly want to have. This is the easy answer to that question. Please ask
>>>further questions if you dont understand.
>>
>>If you read my question, then maybe it was not clear that it was meant as a
>>rhetorical question. My point is that obviously Hagra has NO clue what the
>>uncertainty in the quoted rating numbers is. But this he already confessed in
>>another post.
>
>He's saying that the WMTest formula does only allow to make statements with the
>first two digits.

that's what he said at first, but he has already admitted that there are better
ways of doing things, namely quoting the exact result and adding that the margin
of error is 100 points. besides, i would say that claiming the uncertainty is 100
rating points is just as much a wild guess as anything in this debate :-)
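
to make the two ways of reporting concrete, here is a minimal python sketch. it assumes the quoted "2.553" and "2.649" are the ratings 2553 and 2649 written with a dot as thousands separator, and that "sf=2" means rounding to two significant figures. the helper name round_sig is made up for this illustration, and the +/- 100 margin is only the figure claimed in the thread, not a measured uncertainty.

```python
import math


def round_sig(x, sig=2):
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0
    # Position of the last digit to keep, relative to the decimal point.
    digits = sig - int(math.floor(math.log10(abs(x)))) - 1
    return round(x, digits)


# Ratings quoted in the thread, read as 2553 and 2649 (assumption).
ratings = {"19 solutions": 2553, "55 solutions": 2649}

for label, rating in ratings.items():
    rounded = round_sig(rating, 2)
    # The +/- 100 margin is the claimed figure from the debate, not a measurement.
    print(f"{label}: sf=2 gives {rounded}, exact with margin gives {rating} +/- 100")
```

both inputs collapse to the same value under two-significant-figure rounding, while the "exact result plus margin" form keeps the 96-point difference visible and leaves the reader to judge whether the margin is believable.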


>big fun to test. Our sole topic was if these tests could have any meaning for
>programmers and the answer is no.

way off. i am a chess programmer, and i use test sets with my engine. therefore,
such tests have some meaning for me. i certainly wouldn't use them to attach a
rating of XY to my engine, but they are not meaningless!

the only thing you're not allowed to do as a programmer is try to optimize for
such a test. now *that* will make test set results meaningless indeed...

cheers
  martin




Last modified: Thu, 15 Apr 21 08:11:13 -0700
