Computer Chess Club Archives


Subject: Re: Comments of latest SSDF list - Nine basic questions

Author: Rolf Tueschen

Date: 14:44:36 06/02/02


On June 02, 2002 at 17:28:11, Rolf Tueschen wrote:

>On June 02, 2002 at 13:05:38, Andrew Dados wrote:
>
>>On June 01, 2002 at 16:14:50, Rolf Tueschen wrote:
>>
>>>On June 01, 2002 at 13:14:58, Andrew Dados wrote:
>>>
>>>>On June 01, 2002 at 01:32:55, Uri Blass wrote:
>>>>
>>>>>On May 31, 2002 at 21:00:44, Rolf Tueschen wrote:
>>>>>
>>>>>>On May 31, 2002 at 20:35:38, Dann Corbit wrote:
>>>>>>
>>>>>>>On May 31, 2002 at 20:24:35, Rolf Tueschen wrote:
>>>>>>>
>>>>>>>>On May 31, 2002 at 20:02:37, Dann Corbit wrote:
>>>>>>>>
>>>>>>>>>On May 31, 2002 at 19:22:27, Rolf Tueschen wrote:
>>>>>>>>>
>>>>>>>>>>On May 31, 2002 at 19:01:53, Dann Corbit wrote:
>>>>>>>>>>
>>>>>>>>>>>Since people are so often confused about it, it seems a good idea to write a
>>>>>>>>>>>FAQ.
>>>>>>>>>>>Rolf's questions could be added, and a search through the CCC archives could
>>>>>>>>>>>find some more.
>>>>>>>>>>>
>>>>>>>>>>>Certainly the games against the old opponents are always a puzzle to newcomers
>>>>>>>>>>>who do not understand why calibration against an opponent of precisely known
>>>>>>>>>>>strength is of great value.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>No pun intended, but excuse me, you can't mean it this way! Are we caught in a
>>>>>>>>>>new circle? How can the older program be precisely known in its strength?
>>>>>>>>>>Of course it isn't! Because it had the same status the new ones have today...
>>>>>>>>>>
>>>>>>>>>>And all the answers from Bertil follow that same fallacious line. It's a
>>>>>>>>>>pity!
>>>>>>>>>>
>>>>>>>>>>Also, what is calibration in SSDF? Comparing the new unknown with the old
>>>>>>>>>>unknown? No pun intended.
>>>>>>>>>>
>>>>>>>>>>Before making such a FAQ let's please find some practical solutions for SSDF.
>>>>>>>>>
>>>>>>>>>The older programs have been carefully calibrated by playing many hundreds of
>>>>>>>>>games.  Hence, their strength in relation to each other and to the other members
>>>>>>>>>of the pool is very precisely known.
>>>>>>>>>
>>>>>>>>>The best possible test you can make is to play an unknown program against the
>>>>>>>>>best known programs.  This will arrive at an accurate ELO score faster than any
>>>>>>>>>other way.  Programs that are evenly matched are not as good as programs that
>>>>>>>>>are somewhat mismatched.  Programs that are terribly mismatched are not as good
>>>>>>>>>as programs that are somewhat mismatched.
>>>>>>>>>
>>>>>>>>>If I have two programs of exactly equal ability, it will take a huge number of
>>>>>>>>>games to get a good reading on their strength in relation to one another.  On
>>>>>>>>>the other hand, if one program is 1000 ELO better than another, then one or two
>>>>>>>>>fluke wins will drastically skew the score.  An ELO difference of 100 to 150 is
>>>>>>>>>probably just about ideal.
>>>>>>>>
>>>>>>>>I don't follow that at all. Perhaps it's too difficult, but I fear that you are
>>>>>>>>mixing things up. You're arguing as if you _knew_ already that the one program
>>>>>>>>is 1000 points better. Therefore 2 games are ok for you. But how could you know
>>>>>>>>this in SSDF? And also, why do you test at all, if it's that simple?
>>>>>>>
>>>>>>>No.  You have a group of programs of very well known strength.  The ones that
>>>>>>>have played the most games are the ones where the strength is precisely known.
>>>>>>
>>>>>>I can't accept that.
>>>>>>
>>>>>>>
>>>>>>>Here is a little table:
>>>>>>>
>>>>>>>Win expectancy for a difference of 0 points is 0.5
>>>>>>>Win expectancy for a difference of 100 points is 0.359935
>>>>>>>Win expectancy for a difference of 200 points is 0.240253
>>>>>>>Win expectancy for a difference of 300 points is 0.15098
>>>>>>>Win expectancy for a difference of 400 points is 0.0909091
>>>>>>>Win expectancy for a difference of 500 points is 0.0532402
>>>>>>>Win expectancy for a difference of 600 points is 0.0306534
>>>>>>>Win expectancy for a difference of 700 points is 0.0174721
>>>>>>>Win expectancy for a difference of 800 points is 0.00990099
>>>>>>>Win expectancy for a difference of 900 points is 0.00559197
>>>>>>>Win expectancy for a difference of 1000 points is 0.00315231
>>>>>>>
>>>>>>>Notice that for a 1000 ELO difference the win expectancy is only .3%.
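As a side note, every number in that table can be reproduced with the usual
logistic expectancy curve E = 1 / (1 + 10^(D/400)) for the side rated D points
lower. The short Python sketch below assumes that curve; the post itself does
not say which formula was used.

  # Win expectancy of the side rated 'diff' points *below* its opponent,
  # using the common logistic Elo curve with a 400-point scale constant
  # (assumed here; the post does not state the formula it used).
  def win_expectancy(diff):
      return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

  for diff in range(0, 1001, 100):
      print("Win expectancy for a difference of %d points is %g"
            % (diff, win_expectancy(diff)))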
>>>>>>
>>>>>>I see. So, that is the Elo calculation for human chess, right? What gives
>>>>>>you the confidence that it works the same way for computers?
>>>>>
>>>>>What gives you the confidence that it works for humans?
>>>>>
>>>>>These numbers were not calculated based on statistics of human games, and I
>>>>>believe that they are not correct for humans either.
>>>>>
>>>>>Uri
>>>>
>>>>Hello Uri.
>>>>
>>>>I keep noticing there is a huge misconception about what ELO numbers are.
>>>>So I will try to explain how a rating system is defined/built.
>>>>
>>>>A rating system is based on ONE single assumption: that the distribution of
>>>>ratings over a big pool of players obeys a normal distribution.
>>>>
>>>>Then we need to build a scale.
>>>>That means we need to define the '0' point on the scale and also the unit of
>>>>measure (what '1 point' means).
>>>>
>>>>Let's say we define '0' to equal 1740 ELO points. The meaning of this number is:
>>>>the average rating of all players in the pool is 1740 on our scale. It is chosen
>>>>arbitrarily and can be _any_ number.
>>>>
>>>>Then we define a unit, say 200 points, in such a way that a 200-point difference
>>>>translates to a probability of winning equal to 0.75. This is another arbitrary
>>>>number defining our scale. Discussing its validity is about as sensible as
>>>>discussing whether 1 meter on Earth equals 1 meter on the Moon.
>>>>
>>>>So by definition all those numbers from Dann's post are valid; that is the basis
>>>>for calculating players' ratings.
>>>>
>>>>-Andrew-
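A small aside on the "unit" step: once you decide what expected score a given
rating gap should correspond to, the scale constant of the expectancy curve is
fixed. The Python sketch below assumes a logistic curve of the form
1/(1 + 10^(-D/s)); the curve shape is itself only a convention and is not
prescribed by the post.

  import math

  # Pin the unit: require that a 200-point gap maps to an expected score
  # of 0.75 for the stronger side (the example numbers from the post).
  gap, score = 200.0, 0.75

  # Solve 1 / (1 + 10**(-gap/s)) = score for the scale constant s.
  s = gap / -math.log10(1.0 / score - 1.0)
  print("scale constant s = %.1f" % s)                      # about 419.2

  # The conventional s = 400 predicts a slightly higher score for +200:
  print("score at +200 with s = 400: %.4f"
        % (1.0 / (1.0 + 10.0 ** (-200.0 / 400.0))))         # about 0.7597

Either choice of unit gives a consistent scale; it only changes what "one
point" means, which is the arbitrariness being described above.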
>>>
>>>For human chess, Andrew!
>>>
>>>Rolf Tueschen
>>
>>?? For all rating lists the scale is defined in the SAME way. Whether it's a
>>human-only pool or a computer-only pool of players, all the above is valid.
>>
>>However I think I know what you are saying (or do I.. :)
>>
>>Imagine you have a big number of lions and zebras.
>>When you measure the average height of mixed lions and zebras, it will be
>>different from the average height of lions ONLY.
>>
>>But in each case the statistics are still valid, as long as you believe in a
>>normal distribution. Whether the numbers correspond to each other is a
>>different question.
>>
>>-Andrew-
>
>Two points. You must show a normal distribution. Chess strength in humans who
>play chess, or height - ok.
>
>But what about strength in computers?
>
>Then the other point. The mixing. On the contrary, I would be happy if computers
>played in human tournaments. That would answer many of my questions. But
>with comp vs comp you can't find results about strength, because the comps in
>each year's version are equally strong.

See the actual SSDF list! :)


>But on which level is totally open. We
>simply don't know. We can't know it, at least with SSDF.
>BTW, for the mixing I would advise changing the traditional concept of
>computer chess, because the current one is unfair for human vs comp.
>
>Rolf Tueschen
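One way to make the "on which level" point concrete: games inside a closed pool
can only determine rating differences, because adding the same constant to every
rating leaves every predicted result unchanged. The Python sketch below
illustrates that invariance with made-up ratings, again assuming a logistic
expectancy curve.

  # Shifting every rating in a closed pool by the same offset changes no
  # pairwise expectancy, so comp-vs-comp games alone cannot anchor the
  # pool's absolute level.  (Ratings below are made up for illustration.)
  def expectancy(r_a, r_b):
      return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

  pool = {"A": 2500, "B": 2400, "C": 2250}
  shifted = {name: r + 150 for name, r in pool.items()}

  for x in pool:
      for y in pool:
          if x != y:
              assert abs(expectancy(pool[x], pool[y])
                         - expectancy(shifted[x], shifted[y])) < 1e-12

  print("all pairwise expectancies are identical after a +150 shift")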
