Computer Chess Club Archives


Subject: Re: Comet A.96 - Wcrafty15.20 20 games blitz match

Author: Robert Hyatt

Date: 05:07:53 10/21/98



On October 21, 1998 at 04:37:38, Nouveau wrote:

>
>On October 20, 1998 at 12:13:16, Dann Corbit wrote:
>
>>On October 20, 1998 at 10:37:36, Nouveau wrote:
>>>On October 20, 1998 at 01:36:22, Jouni Uski wrote:
>>>
>>>>Here's result for 20 games match with 60/5 time limit (under Winboard):
>>>>
>>>>Comet    0.5 0 1 0 0 0 1 1 0 1 0 0 0.5 0 0.5 1 0.5 0 1 0   = 8
>>>>Wcrafty  0.5 1 0 1 1 1 0 0 1 0 1 1 0.5 1 0.5 0 0.5 1 0 1   = 12
>>>>
>>>>So they are very close to each other in playing strength.
>>>>
>>>>Jouni
>>>
>>>12-8 is very close ??????????
>>>
>>>When can we say : Crafty is better than Comet ? 18-2 ?
>>>
>>>I don't understand this statistical stuff : I can't imagine a 12-8 result in a
>>>match between 2 GMs with a conclusion like "They are very close in playing
>>>strength".
>>>
>>>Why do we need hundreds, maybe thousands of games between computers to evaluate
>>>relative strength, when a few dozen are more than enough for human GMs ?
>>Any strong conclusion from a single match is faulty.  It could be that Comet is
>>500 points above Crafty, or 500 points below (although both of these are
>>statistically very unlikely, really, very little has been demonstrated at this
>>point from a single set of games).
>
>Just imagine : the match between Kasparov and Chirov takes place and the result
>is : Kasparov-Chirov = 12-8.
>Maybe Kasparov is 500 points above Chirov or 500 points below...Show me any
>chess magazine that would print such an affirmation.
>I know, those chess journalists don't have a clue on science and stats ;o)
>
>> The international chess bodies like FIDE
>>have definitely got it right in the way that they perform evaluations using the
>>ELO method.  Also, in requiring a long period of excellent results to become a
>>GM.
>
>Can someone do the math for this : a player has a 2600 level but no rating,
>how many games against a 2500 opposition does he need to reach 2600 ?
>


Easy here:  one game.  His rating would be 2700 after that one game, since
the first N games use the usual "TPR"-type calculation.



>>  I think, in general, statistics is not a strong point of chess programmers.
>> Surely there are some who are experts, but I see a lot of very strange
>>statements.
>>
>>In any scientific community, an experiment [read "match"] must be repeatable
>>before any sort of conclusion can be reached. (Does anyone remember the name
>>'Pons'?)
>


"Repeatability" is not really a requirement imposed by statistics... that
is what the "normal" curve (and the central limit theorem) is all about:  the
fact that repeated tests can and will produce different results.
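
To put a rough number on how much a 20-game result can move, here is a minimal sketch that turns the 12-8 score quoted above into an Elo difference with an approximate 95% interval. It assumes the games are independent, uses the normal approximation to the mean score, and uses the standard logistic Elo expectancy curve; the function names are made up for this example, and the win/draw/loss counts (10/4/6 for Wcrafty) are read off the score line in the original post.

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (logistic Elo model)."""
    return 400.0 * math.log10(p / (1.0 - p))

def match_summary(wins, draws, losses, z=1.96):
    """Return (score, elo, elo_low, elo_high) for one side of a match.

    Assumes independent games and a normal approximation for the mean
    score; z=1.96 gives roughly a 95% interval. The interval is only
    meaningful while both ends stay strictly between 0 and 1.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * (0.0 - p) ** 2) / n
    se = math.sqrt(var / n)          # standard error of the mean score
    low, high = p - z * se, p + z * se
    return p, elo_from_score(p), elo_from_score(low), elo_from_score(high)

# Wcrafty's side of the match: 10 wins, 4 draws, 6 losses = 12 points of 20.
score, elo, elo_low, elo_high = match_summary(10, 4, 6)
print(f"score {score:.2f}, Elo diff {elo:+.0f} "
      f"(approx. 95% interval {elo_low:+.0f} to {elo_high:+.0f})")
```

Under these assumptions the point estimate is about +70 Elo for Wcrafty, but the interval runs from somewhat below zero to well over +200, so a 20-game sample cannot separate "slightly weaker" from "much stronger".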



>That's true if we consider that chess is science...has the "community" a strong
>agreement on this ?
>
>Jeff




Last modified: Thu, 15 Apr 21 08:11:13 -0700
