Computer Chess Club Archives


Subject: Re: Are we ignoring basic math & statistics

Author: José Carlos

Date: 13:10:11 10/25/01


On October 25, 2001 at 15:20:10, Christophe Theron wrote:

>On October 25, 2001 at 13:28:19, José Carlos wrote:
>
>>On October 25, 2001 at 11:30:04, Christophe Theron wrote:
>>
>>>On October 25, 2001 at 07:39:46, José Carlos wrote:
>>>
>>>>On October 25, 2001 at 06:57:29, Mike Hood wrote:
>>>>
>>>>>On October 24, 2001 at 22:03:49, Stephen A. Boak wrote:
>>>>>
>>>>>>On October 24, 2001 at 11:43:18, Joshua Lee wrote:
>>>>>>
>>>>>>>For Starters If Deep Fritz were that Magical 2700+ number Like the SSDF Claims
>>>>>>>Then Huebner wouldn't have Drawn Every Game of their 6 game Match
>>>>>>>Secondly With All Due Respect No Commercial Program Has Played As Many Humans As
>>>>>>>The Deep Thought/Blue Programs and Also The Number of Games Vs. Rating Average
>>>>>>>Is Unequal (Not as many games as Deep Thought) If you Suggest that programs are
>>>>>>>So Strong why Then Hasn't One of the Top Commercial's Put up so much Money as to
>>>>>>>Play Against a Top 10 Opponent and Not a Couple of Unknowns?
>>>>>>>
>>>>>>>Tiger Didn't Beat All GM's and I don't think they were very Strong GM's someone
>>>>>>>even mentioned that Tiger was Lost in One Position. That may not say Much but I
>>>>>>>would Consider Rebel's Achievement or Deep Junior's Much More Impressive.
>>>>>>>Rebel because of So many Games against Strong and well Known GM's Like Rhode and
>>>>>>>Scherbakov and Deep Junior for Beating GM Leko and Huebner, Drawing Everyone
>>>>>>>else Besides Kramnik and Lautier.
>>>>>>>
>>>>>>>8 Games are not really enough and 1 Tournament By no means makes a Computer a GM
>>>>>>>, They Can't Get The Title anyway, I would Like for this to be a possibility
>>>>>>>Then maybe someone would Try for their program to get it and we could Look to
>>>>>>>FIDE instead of SSDF. I hate that the list should be lowered by up to 200
>>>>>>>points, even by their own estimate; the link is on their page.
>>>>>>>
>>>>>>>Another thing Tiger's Rating On an 866 Compared to the Speed Difference of the
>>>>>>>SSDF would Still Point to the SSDF's Given Rating for Tiger to be Wrong.
>>>>>>>
>>>>>>>Tiger is 2703 on a 1200
>>>>>>>While 2788 against an average 2497FIDE On a Slower 866 Hmm Somebody is wrong
>>>>>>>Either all those players were lying about their rating or Could it be that the
>>>>>>>SSDF Is Off ...
>>>>>>
>>>>>>Curiosity leads me to pose some questions to thoughtful posters:
>>>>>>
>>>>>>Ever hear of natural variation?  Do you think that a 2497 player plays at 2497
>>>>>>strength (whatever that means) on each move, and across each game, no matter the
>>>>>>day or time or opponent or how well he is feeling?
>>>>>>
>>>>>>Ever hear of the uncertainty of measurement?  What is the level of confidence
>>>>>>that a 2497 player is *actually* (whatever that means) a 2497 strength player?
>>>>>>
>>>>>>Can you accept random chance (natural variation) as a reason for occasional
>>>>>>exceptional results for programs or humans?
>>>>>>
>>>>>>Can you accept that measurements are all subject to some level of uncertainty,
>>>>>>some level of confidence less than certainty?
>>>>>>
>>>>>>If so, the above statement (prior poster) makes little sense.
>>>>>>
>>>>>>If not, I understand the dilemma and recommend a good introductory book on
>>>>>>statistics.
>>>>>>
>>>>>>Opinions are welcome, I have no problem with them.  But do posters investigate
>>>>>>and try to learn about the subject they comment on, or are they curious to
>>>>>>discover what they may be missing in their view of things?
>>>>>>
>>>>>>Math is not a solution to everything.  It is an often useful tool.  It both has
>>>>>>its uses and its limitations.  But to ignore it completely seems silly.  Do
>>>>>>posters know they ignore some basic uses of math (often statistics) when they
>>>>>>post?  Do they care?
>>>>>>
>>>>>>Just curious.
>>>>>>
>>>>>>--Steve
>>>>>
>>>>>Thanks, Steve. I often have thoughts like yours when I read posts with titles
>>>>>like "Beowulf is better than Deep Fritz on a 1.6 Ghz PC".
>>>>>
>>>>>What is the statistical background of the ELO rating system?
>>>>
>>>>  As I've asked some times: is there a good mathematical way to measure
>>>>'strength'? What is 'strength' actually? Can anyone give a precise definition of
>>>>'strength'? Without such a precise definition we can't draw any conclusion at
>>>>all about players' strength. And if we want to draw mathematical conclusions, we
>>>>need a mathematical definition.
>>>>  IMO, measuring ELO rating (which is defined by a mathematical formula) is very
>>>>different from measuring 'strength'.
>>>
>>>
>>>
>>>Strength is very clearly defined by the elo system. At least "relative strength
>>>inside a pool of given players".
>>>
>>>If you have a better definition (as you do not seem to be convinced by the Elo
>>>definition), feel free to submit yours...
>>
>>>    Christophe
>>
>>  I wish I had one, but I don't. As you said, Elo provides a good [enough]
>>definition of "relative strength inside a pool of given players". That shows
>>exactly the point I try to make: we can measure _relative_strength_ and
>>_inside_a_pool_. And even that is debatable, because we are saying
>>strength = results.
>>  Ok, it's a fine way to do it.
>>  But it seems that some people talk about "absolute strength", and for that, we
>>don't have a definition, AFAIK.
>
>
>
>We could define absolute strength as the performance on a given set of
>positions, but that would be arbitrary anyway.
>
>The simplest definition is "the strongest wins more games". This defines
>strength in terms of an "offset" between two or N players.

  I suppose you mean "points" instead of "games".
  You're right. You provide a good definition, which is valid for me. But your
definition has two weak points:

  - A large proportion of people (I'd say most) don't use it. People try to
compare Capablanca to Karpov, SSDF ratings with FIDE ratings, etc.
  - Although it is good, it's not perfect. You know some players get good
results against a particular opponent for no apparent reason. Take two players
rated 2500, where one beats the other almost every time. This happens very
often. So saying "the one who wins more points" is not enough unless the number
of games follows some rules. For example: "if every player in a given pool plays
every other player exactly 10 games, the strongest is the one who scores the
most points". This is a more complete definition, but very difficult to apply.
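
  The round-robin rule in the second point can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the players A, B
and C and their scores are invented, and `pool_standings` is a hypothetical
helper name, not part of any rating tool.

```python
def pool_standings(pair_scores, games_per_pair=10):
    """Rank a pool by total points when every pair plays the same number of games.

    pair_scores maps (a, b) to the points a scored against b
    (win = 1, draw = 0.5); b gets the remaining points of that mini-match.
    """
    totals = {}
    for (a, b), a_points in pair_scores.items():
        if not 0 <= a_points <= games_per_pair:
            raise ValueError(f"impossible score for {a} vs {b}: {a_points}")
        totals[a] = totals.get(a, 0.0) + a_points
        totals[b] = totals.get(b, 0.0) + (games_per_pair - a_points)
    # Highest total first: "the strongest" under the fixed-schedule rule.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented results: A crushes B, B beats C, yet C edges A head to head.
pair_scores = {("A", "B"): 8.0, ("B", "C"): 7.0, ("A", "C"): 4.5}
standings = pool_standings(pair_scores)
for name, points in standings:
    print(name, points)
```

  With these made-up numbers A tops the pool with 12.5 points even though C has
a plus score against A head to head, which is the kind of pairing effect
described in the second point.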

  The main "problem" (not everyone will consider this a problem, of course) is
that not everybody is willing to follow strict rules to calculate strength. Not
everybody likes scientific methods. And so we keep hearing comparisons that
don't fit any good definition.
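
  For reference, one example of a strict rule is the logistic curve commonly
used with Elo ratings, which ties a rating difference to an expected score and
back. The sketch below assumes that standard 400-point logistic form; the
function names are hypothetical, and it says nothing about how the SSDF or FIDE
lists are actually computed.

```python
import math


def expected_score(rating_diff):
    """Expected score (0..1) for a player rated rating_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_offset(score_fraction):
    """Rating difference implied by a score fraction; undefined at 0% and 100%."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


print(round(expected_score(200), 3))
print(round(rating_offset(0.75), 1))
```

  A 200-point edge corresponds to an expected score of about 76%, and a 75%
score to an offset of roughly 191 points. The offset only has meaning inside the
pool of games it was computed from, which is the limitation discussed in this
thread.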

>    Christophe

  José C.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.