Computer Chess Club Archives


Subject: Re: Comments of latest SSDF list - Nine basic questions

Author: Uri Blass

Date: 13:04:10 06/01/02



On June 01, 2002 at 13:14:58, Andrew Dados wrote:

>On June 01, 2002 at 01:32:55, Uri Blass wrote:
>
>>On May 31, 2002 at 21:00:44, Rolf Tueschen wrote:
>>
>>>On May 31, 2002 at 20:35:38, Dann Corbit wrote:
>>>
>>>>On May 31, 2002 at 20:24:35, Rolf Tueschen wrote:
>>>>
>>>>>On May 31, 2002 at 20:02:37, Dann Corbit wrote:
>>>>>
>>>>>>On May 31, 2002 at 19:22:27, Rolf Tueschen wrote:
>>>>>>
>>>>>>>On May 31, 2002 at 19:01:53, Dann Corbit wrote:
>>>>>>>
>>>>>>>>Since people are so often confused about it, it seems a good idea to write a
>>>>>>>>FAQ.
>>>>>>>>Rolf's questions could be added, and a search through the CCC archives could
>>>>>>>>find some more.
>>>>>>>>
>>>>>>>>Certainly the games against the old opponents are always a puzzle to newcomers
>>>>>>>>who do not understand why calibration against an opponent of precisely known
>>>>>>>>strength is of great value.
>>>>>>>
>>>>>>>
>>>>>>>No pun intended, but excuse me, you can't mean it this way! Are we caught in a
>>>>>>>new circle? How can the older program be precisely known in its strength?
>>>>>>>Of course it isn't! Because it had the same status the new ones have today...
>>>>>>>
>>>>>>>And all the answers from Bertil follow that same fallacious line. It's a
>>>>>>>pity!
>>>>>>>
>>>>>>>Also, what is calibration in SSDF? Comparing the new unknown with the old
>>>>>>>unknown? No pun intended.
>>>>>>>
>>>>>>>Before making such a FAQ let's please find some practical solutions for SSDF.
>>>>>>
>>>>>>The older programs have been carefully calibrated by playing many hundreds of
>>>>>>games.  Hence, their strength in relation to each other and to the other members
>>>>>>of the pool is very precisely known.
>>>>>>
>>>>>>The best possible test you can make is to play an unknown program against the
>>>>>>best known programs.  This will arrive at an accurate ELO score faster than any
>>>>>>other way.  Programs that are evenly matched are not as good as programs that
>>>>>>are somewhat mismatched.  Programs that are terribly mismatched are not as good
>>>>>>as programs that are somewhat mismatched.
>>>>>>
>>>>>>If I have two programs of exactly equal ability, it will take a huge number of
>>>>>>games to get a good reading on their strength in relation to one another.  On
>>>>>>the other hand, if one program is 1000 ELO better than another, then one or two
>>>>>>fluke wins will drastically skew the score.  An ELO difference of 100 to 150 is
>>>>>>probably just about ideal.
>>>>>
>>>>>I don't follow that at all. Perhaps it's too difficult, but I fear that you are
>>>>>mixing things up. You're arguing as if you _knew_ already that the one program
>>>>>is 1000 points better. Therefore 2 games are ok for you. But how could you know
>>>>>this in SSDF? And also, why do you test at all, if it's that simple?
>>>>
>>>>No.  You have a group of programs of very well known strength.  The ones that
>>>>have played the most games are the ones where the strength is precisely known.
>>>
>>>I can't accept that.
>>>
>>>>
>>>>Here is a little table:
>>>>
>>>>Win expectancy for a difference of 0 points is 0.5
>>>>Win expectancy for a difference of 100 points is 0.359935
>>>>Win expectancy for a difference of 200 points is 0.240253
>>>>Win expectancy for a difference of 300 points is 0.15098
>>>>Win expectancy for a difference of 400 points is 0.0909091
>>>>Win expectancy for a difference of 500 points is 0.0532402
>>>>Win expectancy for a difference of 600 points is 0.0306534
>>>>Win expectancy for a difference of 700 points is 0.0174721
>>>>Win expectancy for a difference of 800 points is 0.00990099
>>>>Win expectancy for a difference of 900 points is 0.00559197
>>>>Win expectancy for a difference of 1000 points is 0.00315231
>>>>
>>>>Notice that for a 1000 ELO difference the win expectancy is only 0.3%.
>>>
>>>I see. So, that is Elo's calculation for human chess, right? What gives you
>>>the confidence that it works for computers the same way?
>>
>>What gives you the confidence that it works for humans?
>>
>>These numbers were not calculated based on statistics of human games, and I
>>believe that they are not correct for humans either.
>>
>>Uri
>
>Hello Uri.
>
>I keep noticing there is a huge misconception about what ELO numbers are.
>So I will try to explain how a rating system is defined/built.
>
>A rating system is based on ONE single assumption: that the distribution of
>ratings over a big pool of players obeys a normal distribution.
>
>Then we need to build a scale.
>That means we need to define the '0' point on the scale and also the unit of
>measurement (what '1 point' means).
>
>Let's say we define '0' as equal to 1740 ELO points. The meaning of this number
>is: the average rating of all players in the pool is 1740 on our scale. It is
>chosen arbitrarily and can be _any_ number.
>
>Then we define a unit, say 200 points, in such a way that a 200 pts difference
>translates to a probability of winning equal to 0.75. This is another arbitrary
>number defining our scale. Discussing its validity is about as sensible as
>discussing whether 1 meter on Earth equals 1 meter on the Moon.

Some comments:

1) I know how the Elo system was built, but my point is the following:

The following is claimed:
Win expectancy for a difference of 200 points is 0.240253
Win expectancy for a difference of 400 points is 0.0909091

It is possible that these assumptions are not consistent.

Suppose that A gets an average of 0.240253 against B while
B gets an average of 0.240253 against C.

The Elo assumption says that I can expect A to get an average of 0.0909091
against C.

This is not based on data from human-human games, and I know of no
investigation that tried to predict the expected result between A and C based
on the expected result between A and B and the expected result between B and C.

These numbers are not based on data from human-human games but on the normal
distribution, and I have no reason to believe that this assumption is correct.
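
For reference, the quoted win expectancy figures (0.240253 at 200 points,
0.0909091 at 400 points) can be reproduced from the formula
E = 1/(1 + 10^(D/400)). Here is a minimal C sketch of the A-B-C chain that the
model assumes to be consistent; the function name and the example ratings are
only illustrative:

#include <stdio.h>
#include <math.h>

/* Win expectancy of the lower rated side under the Elo model,
   for a rating difference "diff" in points. */
double win_expectancy(double diff)
{
    return 1.0 / (1.0 + pow(10.0, diff / 400.0));
}

int main(void)
{
    /* Hypothetical ratings chosen only for illustration. */
    double a = 2400.0, b = 2600.0, c = 2800.0;

    printf("A vs B: %f\n", win_expectancy(b - a)); /* ~0.240253 */
    printf("B vs C: %f\n", win_expectancy(c - b)); /* ~0.240253 */

    /* The model then forces A vs C (400 points) to be ~0.0909091;
       whether real A-C results agree is exactly the open question. */
    printf("A vs C: %f\n", win_expectancy(c - a));
    return 0;
}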

2) If the target is to find the best rating system, then it is better to hold a
contest in defining the ratings of humans, and it is better to give a big prize
to the winner of that contest in order to encourage people to investigate it.


The winner of that contest should be the programmer who can write a program
that calculates ratings for players and gives the best prediction for games
that were not played.

Every participant in the contest of defining the rating system may send a
program that simply takes the data of all the games that were played in the
last years and calculates a rating for every player based on that data.

The program should also define the expected result of games based only on the
difference between the ratings of the players.
The program should calculate new ratings every day.

The program that wins the tournament is the program that gives the best
predictions.

The best prediction is the one that gives the smallest error, where the error
is the sum of the squares of the differences between the prediction and what
really happens.

Example:
If I predict an expected result of 0.75 for player A against B and B beats A,
then 0.75*0.75 is added to the sum of errors, and if the real result is a draw
then 0.25*0.25 is added to the sum of errors.
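
A small C sketch of that scoring rule, with the function name and the two
example games purely illustrative (a real contest program would read the
actual game data):

#include <stdio.h>

/* Squared prediction error for one game: "predicted" is the expected
   score of the first player, "actual" is 1.0 for a win, 0.5 for a
   draw and 0.0 for a loss. */
double squared_error(double predicted, double actual)
{
    double d = predicted - actual;
    return d * d;
}

int main(void)
{
    double sum = 0.0;

    /* Prediction 0.75 for A against B, but B wins: adds 0.75*0.75. */
    sum += squared_error(0.75, 0.0);

    /* Same prediction, but the game is a draw: adds 0.25*0.25. */
    sum += squared_error(0.75, 0.5);

    printf("sum of squared errors = %f\n", sum); /* 0.5625 + 0.0625 */
    return 0;
}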


3) Note that the problem with the rating system of today is not only the fact
that the normal distribution assumption is probably wrong.

The rating system simply ignores important information.

Suppose that I lose against a player with a low rating, and that player
continues to win again and again in the following days.

My rating is going to go down after my loss, but I am not going to get back
part of that loss thanks to the new knowledge that I actually lost against a
strong player.
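
This is a direct consequence of the standard Elo update, which uses only the
opponent's rating as it stood when the game was played; a minimal C sketch
(the K-factor and the ratings are illustrative):

#include <stdio.h>
#include <math.h>

/* One-game Elo update: the new rating depends only on the opponent's
   rating at the time of the game, never on the opponent's later results. */
double elo_update(double my_rating, double opp_rating, double score, double k)
{
    double expected =
        1.0 / (1.0 + pow(10.0, (opp_rating - my_rating) / 400.0));
    return my_rating + k * (score - expected);
}

int main(void)
{
    /* Illustrative numbers: I lose (score 0.0) to an opponent rated 2200. */
    double new_rating = elo_update(2500.0, 2200.0, 0.0, 10.0);

    /* Even if that opponent later proves to be much stronger than 2200,
       nothing in this formula ever gives the lost points back. */
    printf("rating after the loss: %f\n", new_rating);
    return 0;
}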

I am sure that people could take this into account in a good rating system,
but I believe that nobody is really interested in constructing a good rating
system.

Uri


