Computer Chess Club Archives


Subject: Re: Comments of latest SSDF list - Nine basic questions

Author: Rolf Tueschen

Date: 13:03:01 06/01/02


On June 01, 2002 at 01:32:55, Uri Blass wrote:

>On May 31, 2002 at 21:00:44, Rolf Tueschen wrote:
>
>>On May 31, 2002 at 20:35:38, Dann Corbit wrote:
>>
>>>On May 31, 2002 at 20:24:35, Rolf Tueschen wrote:
>>>
>>>>On May 31, 2002 at 20:02:37, Dann Corbit wrote:
>>>>
>>>>>On May 31, 2002 at 19:22:27, Rolf Tueschen wrote:
>>>>>
>>>>>>On May 31, 2002 at 19:01:53, Dann Corbit wrote:
>>>>>>
>>>>>>>Since people are so often confused about it, it seems a good idea to write a
>>>>>>>FAQ.
>>>>>>>Rolf's questions could be added, and a search through the CCC archives could
>>>>>>>find some more.
>>>>>>>
>>>>>>>Certainly the games against the old opponents are always a puzzle to newcomers
>>>>>>>who do not understand why calibration against an opponent of precisely known
>>>>>>>strength is of great value.
>>>>>>
>>>>>>
>>>>>>No pun intended, but excuse me, you can't mean it this way! Are we caught in a
>>>>>>new circle? How can the older program be precisely known in its strength?
>>>>>>Of course it isn't! Because it had the same status the new ones have today...
>>>>>>
>>>>>>And all the answers from Bertil follow that same fallacious line. It's a
>>>>>>pity!
>>>>>>
>>>>>>Also, what is calibration in SSDF? Comparing the new unknown with the old
>>>>>>unknown? No pun intended.
>>>>>>
>>>>>>Before making such a FAQ let's please find some practical solutions for SSDF.
>>>>>
>>>>>The older programs have been carefully calibrated by playing many hundreds of
>>>>>games.  Hence, their strength in relation to each other and to the other members
>>>>>of the pool is very precisely known.
>>>>>
>>>>>The best possible test you can make is to play an unknown program against the
>>>>>best known programs.  This will arrive at an accurate ELO score faster than any
>>>>>other way.  Programs that are evenly matched are not as good as programs that
>>>>>are somewhat mismatched.  Programs that are terribly mismatched are not as good
>>>>>as programs that are somewhat mismatched.
>>>>>
>>>>>If I have two programs of exactly equal ability, it will take a huge number of
>>>>>games to get a good reading on their strength in relation to one another.  On
>>>>>the other hand, if one program is 1000 ELO better than another, then one or two
>>>>>fluke wins will drastically skew the score.  An ELO difference of 100 to 150 is
>>>>>probably just about ideal.
>>>>
>>>>I don't follow that at all. Perhaps it's too difficult, but I fear that you are
>>>>mixing things up. You're arguing as if you _knew_ already that the one program
>>>>is 1000 points better. Therefore 2 games are ok for you. But how could you know
>>>>this in SSDF? And also, why do you test at all, if it's that simple?
>>>
>>>No.  You have a group of programs of very well known strength.  The ones that
>>>have played the most games are the ones where the strength is precisely known.
>>
>>I can't accept that.
>>
>>>
>>>Here is a little table:
>>>
>>>Win expectancy for a difference of 0 points is 0.5
>>>Win expectancy for a difference of 100 points is 0.359935
>>>Win expectancy for a difference of 200 points is 0.240253
>>>Win expectancy for a difference of 300 points is 0.15098
>>>Win expectancy for a difference of 400 points is 0.0909091
>>>Win expectancy for a difference of 500 points is 0.0532402
>>>Win expectancy for a difference of 600 points is 0.0306534
>>>Win expectancy for a difference of 700 points is 0.0174721
>>>Win expectancy for a difference of 800 points is 0.00990099
>>>Win expectancy for a difference of 900 points is 0.00559197
>>>Win expectancy for a difference of 1000 points is 0.00315231
>>>
>>>Notice that for 1000 ELO difference the win expectancy is only 0.3%.
>>
>>I see. So, that is Elo's calculation for human chess, right? What
>>gives you the confidence that it works for computers the same way?
>
>What gives you the confidence that it works for humans?

It works by definition (which could of course be changed), and since then
it has simply been updated.

>
>These numbers were not calculated based on statistics of human games and I
>believe that they are not correct for humans either.

Of course it is! It's based on the history of chess.

Rolf Tueschen

>
>Uri
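
For reference, the win-expectancy figures in the table quoted above are consistent with the usual logistic Elo formula, E = 1 / (1 + 10^(d/400)), where d is the rating deficit of the weaker side. The thread itself does not state which formula was used, so treat the following as a minimal sketch under that assumption; the function name expected_score is illustrative only and does not come from the thread.

```python
def expected_score(deficit: float) -> float:
    """Expected score of the side rated `deficit` points lower.

    Assumes the logistic Elo curve with the conventional 400-point scale;
    this is an assumption, not something stated in the thread.
    """
    return 1.0 / (1.0 + 10.0 ** (deficit / 400.0))


# Rebuild a table in the same layout as the quoted one,
# rounded to six significant digits.
for deficit in range(0, 1001, 100):
    value = expected_score(deficit)
    print(f"Win expectancy for a difference of {deficit} points is {value:.6g}")
```

Under this assumption the numbers come from a fixed formula rather than from game statistics, which is the point being argued in the exchange above: whether that curve actually fits results, for humans or for computers, is a separate empirical question.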



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.