Computer Chess Club Archives

Subject: Re: I just don't get this ...

Author: Bob Durrett
Date: 06:31:42 01/04/04
On January 04, 2004 at 07:09:12, Bo Persson wrote:

>On January 04, 2004 at 05:57:37, George Tsavdaris wrote:
>
>>On January 04, 2004 at 00:42:02, Christophe Theron wrote:
>>
>>>On January 03, 2004 at 20:53:39, Rick Rice wrote:
>>>
>>>>Person A posts a message saying Ruffian 2.0 is very disappointing, with the
>>>>results to back it up. This is followed by a second post which basically says
>>>>that Ruffian 2.0 rocks with some results to back it up. Are these programs
>>>>really so time and hardware sensitive, so as to show varying results on
>>>>different CPUs/time controls?
>>>>
>>>>The ideal solution would be for SSDF to have one massive board with one CPU and
>>>>memory for each program (equal CPU and mem for all the progs on its list) and
>>>>some way to automate the play of these programs against each other..... on
>>>>different time controls such as regular, blitz etc. Just wishful thinking for
>>>>the future, but it would eliminate the multiple and varying results.
>>>>
>>>>Cheers,
>>>>Rick
>>>
>>>
>>>
>>>Statistics are extremely important in chess, and in computer chess.
>>>
>>>Unfortunately, even after years of talks about the subject, almost nobody on
>>>this message forum understands that you really need A LOT OF GAMES to start to
>>>have an impression of a probability about which program is stronger.
>>>
>>>The variations you have noticed do not come from different setups.
>>>
>>>These variations are statistical variations. That means that most of the match
>>>results posted here are statistically MEANINGLESS.
>>
>>It would be better, if you first define when something is statistically
>>meaningless.
>>
>>>
>>>People love to proudly post the result of the 20 games match they have run
>>>overnight. They don't even care to know if that result has any meaning. Well in
>>>most of the cases the result means nothing (just a waste of electric power) and
>>>you should not care about it at all.
>>
>> The result always means something. If someone plays a match with parameters AA
>>between engines X and Y over Z games, then we are able to conclude some
>>things.
>> For example, that X is stronger than Y with a probability of k% (0<k<100)
>>when these two play with parameters AA.
>>
>> You say "most of the cases the result means nothing", so you believe
>>that there are some cases (parameters AA, games Z) in which the result means
>>something.
>
>I think Christophe means that if k% is not big enough, we don't really know
>the meaning of the result.
>
>
>> And that for all other parameters AA and games Z the results are meaningless.
>>Why? Who can define the right parameters AA and number of games Z? Perhaps God?
>
>No, but a statistician can tell you how many samples are needed to reach a
>conclusion with a specific certainty.
>
>The samples required are MUCH more than a quick test will give you, especially
>if you test engines that are really close. When you get a result of say 16-14
>with an error interval of 10, you really can't say anything for sure.
>
>One engine is better than the other, unless they are equal.  :-)
>
>
>Bo Persson

Perhaps it would be better to discuss the amount of information that a
tournament provides.  A small tournament DOES provide some information.  Whether
or not that information is interpreted and used properly is another matter.

Notice that 100 20-round tournaments might provide a lot of information when
the results are combined.  If the total information is non-zero then at least
one of the tournaments must have provided some information [there is no negative
information].

Bob D.
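
The points about sample size and about combining small tournaments can be illustrated with a rough calculation. What follows is only a sketch under stated assumptions, not a method anyone in the thread proposed: games are treated as independent, every game is counted as a win or a loss (draws are ignored, which overstates the variance), and the normal approximation to the binomial is used. The function names and the pooled score of 1067 out of 2000 are hypothetical, chosen to match the 16-14 score fraction.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def score_interval(points, games, z=Z_95):
    """Approximate confidence interval for the true score fraction.

    Assumption: independent games, each a win or a loss (no draws),
    normal approximation to the binomial.
    """
    p = points / games
    se = math.sqrt(p * (1 - p) / games)  # standard error of the fraction
    return p - z * se, p + z * se


def games_needed(p, z=Z_95):
    """Games required before the interval around p no longer contains 0.5."""
    if p == 0.5:
        raise ValueError("equal engines can never be separated")
    return math.ceil(z * z * p * (1 - p) / (p - 0.5) ** 2)


# One 30-game match ending 16-14.
low, high = score_interval(16, 30)

# Hypothetical pooled result of 100 twenty-game matches at about the same rate.
pooled_low, pooled_high = score_interval(1067, 2000)

print(f"30 games:   {low:.3f} .. {high:.3f}")
print(f"2000 games: {pooled_low:.3f} .. {pooled_high:.3f}")
print("games needed at a 16/30 score rate:", games_needed(16 / 30))
```

Under these assumptions the 30-game interval still contains 0.5, so the single match cannot separate the engines, while the pooled 2000-game interval lies entirely above 0.5. Each small match contributes a little information, and it only becomes conclusive once many are combined.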