Computer Chess Club Archives


Subject: Re: CSS WM TEST - a technical view

Author: Geert van der Wulp

Date: 07:32:32 06/15/04


On June 15, 2004 at 10:27:28, Rolf Tueschen wrote:

>On June 15, 2004 at 10:14:46, Geert van der Wulp wrote:
>
>>On June 15, 2004 at 10:04:21, Rolf Tueschen wrote:
>>
>>>On June 15, 2004 at 07:50:51, Steve Glanzfeld wrote:
>>>
>>>>On June 15, 2004 at 07:34:16, Rolf Tueschen wrote:
>>>>
>>>>>On June 15, 2004 at 06:48:49, Steve Glanzfeld wrote:
>>>>>
>>>>>>On June 15, 2004 at 05:55:21, Franz Hagra wrote:
>>>>
>>>>[...]
>>>>
>>>>>>>1. 2700 former ranked 1-94 engines (here you find nearly all newer engines)
>>>>>>>2. 2600 former ranked 95-229 engines (amateur and older pro's)
>>>>>>>3. 2500 Queen 2.28 (UCI)
>>>>>>
>>>>>>?! This is clearly bogus. I have studied that data. In the first section you
>>>>>>mention, ranks 1-94, the programs have solved from
>>>>>>
>>>>>>54/100 to 73/100 positions!
>>>>>>
>>>>>>You give the SAME rating to programs which solve 54, 60, 65, 70 pos.?
>>>>>>
>>>>>>The first value I always look at is, how many solutions a program has achieved.
>>>>>>If one has 70 and the other has 60, my very simple conclusion is that the first
>>>>>>one has performed better :)
>>>>>>
>>>>>>You give both 2700? Are you joking? :)
>>>>>
>>>>>
>>>>>Steve, you play a game here with Hagra. Please don't copy and paste only parts of
>>>>>what Hagra wrote just to ridicule him. You argue here, with many smileys of
>>>>>course, as if he, Hagra, were the author of the test and its formula.
>>>>
>>>>No. Are you dreaming?
>>>>
>>>>But this
>>>>>is a real bogus because Hagra does only discuss .....
>>>>
>>>>Yadda yadda yadda... :) It seems to me you get even less than me from all this.
>>>>
>>>>It's so simple: That formula gives of course DIFFERENT RATINGS to programs which
>>>>have achieved a DIFFERENT NUMBER OF SOLUTIONS. Hagra recommends that programs
>>>>which solved from 54 to 73, or even from 19 to 54, should have the same test
>>>>ratings! Crazy!
>>>>
>>>>What is YOUR opinion about this? Should programs which solve 19 or 54 get the
>>>>same rating? The test formula calculates for these programs' performances
>>>>
>>>>19 sol. --> 2.553
>>>>55 sol. --> 2.649
>>>>
>>>>But Hagra attaches 2.600 to both. I wonder who else accepts this as serious :))
>>>>Send in the clowns...
>>>>
>>>>Steve
>>>
>>>Steve,
>>>
>>>it's also very hot in Germany right now but still I'm not entering party mode.
>>>The answer to all that you wrote is easy. Hagra wrote only what the significance
>>>would allow one to say. Now I don't know what you know about such things. Do you
>>>think that Hagra or any other sound statistician is a fool if he respects such
>>>laws? From the given formula, which you snipped, Hagra concluded the most that
>>>can be said about the results. Believe it or not. It's as simple as that. He did
>>>also explain, obviously without getting big applause, that calculation could
>>>never substantiate significance. I know that this is almost too difficult to
>>>understand for laymen. But this is the reality. Either you go into the details or
>>>you won't be able to follow the debate. There is no third way out. There is no
>>>creativity to substantiate better results...
>>>
>>>Hope this helps... :)
>>
>>No, no, it does not. Hagra never explained WHY he believes that the error margin
>>would be 100 points for all engines. And your posts do not make it clear either.
>>Hagra has already written that he does NOT know what the uncertainty is in the
>>rating numbers, but YOU seem to know still.
>>
>>Waiting for an explanation, and a very good football match to you this eve,
>>
>>Geert
>
>
>You are reading too fast. Hagra said it in his first message. Ok, I also
>read another explanation weeks ago in CSS. The whole thing comes from the
>formula. You simply can't get significance out of calculations. Hagra just
>gave the "correct" numbers for the examples above to make clear how funny this
>looked. In CSS he gave another explanation.
>
>I don't know your education. But if you have ever studied physics, you start with
>measuring a certain length. You have a tool to measure. Ok, now this tool allows
>you to measure, say, two digits - example 4,78 - not more digits, say 4,785532.
>So, following Hagra it would be idiotic to give the latter as a result. This is
>not yet the end, Geert.
>
>Now we continue and make some calculations. We multiply and divide, ok?
>
>Then we get results.
>
>What do you expect from these results?
>
>Suddenly you get 2722 as Elo value. Is this a correct result? Or is it just
>possible, see above, to give 2700 as result?
>
>Now this is exactly what Hagra is saying. He says the two digits at the right
>are always 00. Because you can't "measure" correctly 27 22. These 22 are just
>results from calculations.
>
>Tonight? Germany vs Holland: draw. :))

Hi Rolf,

I studied Econometrics, so my statistical understanding is alright. Can you
explain WHY our (well, ...) "tool" in this case is believed to be accurate
only up to two digits? Because the number of positions that we test is low, or
because the time that we have is limited, ... ?

I agree on a draw beforehand. It will only raise the tension for the next
matches.

Geert
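
To put a rough number on the "number of positions is low" possibility, here is a minimal sketch. It assumes each of the 100 test positions is an independent solved/unsolved trial with the same probability, which is my own simplification and not something the test's authors state. The function name solve_rate_interval and the normal (Wald) approximation are illustrative choices; nothing here uses the actual CSS WM rating formula.

```python
import math

def solve_rate_interval(solved, total, z=1.96):
    """Approximate 95% interval for the number of solved positions.

    Assumption (not from the test itself): every position is an
    independent trial with the same solve probability, so the count
    is binomial and the normal approximation applies.
    """
    p = solved / total
    # Standard error of the solve proportion under the binomial model.
    se = math.sqrt(p * (1 - p) / total)
    half_width = z * se * total
    return solved - half_width, solved + half_width

# Solution counts quoted earlier in the thread for ranks 1-94.
for solved in (54, 60, 65, 70, 73):
    low, high = solve_rate_interval(solved, 100)
    print(f"{solved}/100 solved: roughly {low:.1f} to {high:.1f}")
```

Under that assumption the intervals for 60 and 70 solutions overlap, so sample size alone could blur such a gap. It says nothing about how solution counts translate into rating points, which is the part still unexplained.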
