Author: Rolf Tueschen
Date: 13:03:01 06/01/02
On June 01, 2002 at 01:32:55, Uri Blass wrote:

>On May 31, 2002 at 21:00:44, Rolf Tueschen wrote:
>
>>On May 31, 2002 at 20:35:38, Dann Corbit wrote:
>>
>>>On May 31, 2002 at 20:24:35, Rolf Tueschen wrote:
>>>
>>>>On May 31, 2002 at 20:02:37, Dann Corbit wrote:
>>>>
>>>>>On May 31, 2002 at 19:22:27, Rolf Tueschen wrote:
>>>>>
>>>>>>On May 31, 2002 at 19:01:53, Dann Corbit wrote:
>>>>>>
>>>>>>>Since people are so often confused about it, it seems a good idea to write a
>>>>>>>FAQ.
>>>>>>>Rolf's questions could be added, and a search through the CCC archives could
>>>>>>>find some more.
>>>>>>>
>>>>>>>Certainly the games against the old opponents are always a puzzle to newcomers
>>>>>>>who do not understand why calibration against an opponent of precisely known
>>>>>>>strength is of great value.
>>>>>>
>>>>>>
>>>>>>No pun intended, but excuse me, you can't mean it this way! Are we caught in a
>>>>>>new circle? How can the older program be precisely known in its strength?
>>>>>>Of course it isn't! Because it had the same status the new ones have today...
>>>>>>
>>>>>>And all the answers from Bertil follow that same fallacious line. It's a
>>>>>>pity!
>>>>>>
>>>>>>Also, what is calibration in SSDF? Comparing the new unknown with the old
>>>>>>unknown? No pun intended.
>>>>>>
>>>>>>Before making such a FAQ, let's please find some practical solutions for SSDF.
>>>>>
>>>>>The older programs have been carefully calibrated by playing many hundreds of
>>>>>games. Hence, their strength in relation to each other and to the other members
>>>>>of the pool is very precisely known.
>>>>>
>>>>>The best possible test you can make is to play an unknown program against the
>>>>>best known programs. This will arrive at an accurate ELO score faster than any
>>>>>other way. Programs that are evenly matched are not as good as programs that
>>>>>are somewhat mismatched. Programs that are terribly mismatched are not as good
>>>>>as programs that are somewhat mismatched.
>>>>>
>>>>>If I have two programs of exactly equal ability, it will take a huge number of
>>>>>games to get a good reading on their strength in relation to one another. On
>>>>>the other hand, if one program is 1000 ELO better than another, then one or two
>>>>>fluke wins will drastically skew the score. An ELO difference of 100 to 150 is
>>>>>probably just about ideal.
>>>>
>>>>I don't follow that at all. Perhaps it's too difficult, but I fear that you are
>>>>mixing things up. You're arguing as if you _knew_ already that the one program
>>>>is 1000 points better. Therefore 2 games are OK for you. But how could you know
>>>>this in SSDF? And also, why do you test at all, if it's that simple?
>>>
>>>No. You have a group of programs of very well known strength. The ones that
>>>have played the most games are the ones where the strength is precisely known.
>>
>>I can't accept that.
>>
>>>
>>>Here is a little table:
>>>
>>>Win expectancy for a difference of 0 points is 0.5
>>>Win expectancy for a difference of 100 points is 0.359935
>>>Win expectancy for a difference of 200 points is 0.240253
>>>Win expectancy for a difference of 300 points is 0.15098
>>>Win expectancy for a difference of 400 points is 0.0909091
>>>Win expectancy for a difference of 500 points is 0.0532402
>>>Win expectancy for a difference of 600 points is 0.0306534
>>>Win expectancy for a difference of 700 points is 0.0174721
>>>Win expectancy for a difference of 800 points is 0.00990099
>>>Win expectancy for a difference of 900 points is 0.00559197
>>>Win expectancy for a difference of 1000 points is 0.00315231
>>>
>>>Notice that for a 1000 ELO difference the win expectancy is only 0.3%.
>>
>>I see. So, that is the Elo calculation for human chess, right? What is
>>giving you the confidence that it works for computers the same way?
>
>What gives you the confidence that it works for humans?

It works by definition (which could of course be changed), and since then it has
simply been a matter of updating.
>
>These numbers were not calculated based on statistics of human games, and I
>believe that they are not correct for humans either.

Of course they are! They are based on the history of chess.

Rolf Tueschen

>
>Uri
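The values in Dann's table are consistent with the standard logistic Elo curve, E = 1 / (1 + 10^(D/400)), where D is how many points weaker a player is rated. In the Elo system E is an expected score (a draw counts as half a point), so "win expectancy" here means expected score rather than a pure probability of winning. A minimal Python sketch that regenerates the table, assuming that formula; the function name `expected_score` is hypothetical and not taken from any SSDF or rating-list code:

```python
def expected_score(diff: float) -> float:
    """Expected score of the lower-rated side when it is `diff` Elo points weaker.

    Assumes the standard logistic Elo curve with a 400-point scale:
        E = 1 / (1 + 10 ** (diff / 400))
    A draw counts as half a point, so this is an expected score,
    not strictly a probability of winning.
    """
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))


if __name__ == "__main__":
    # Reproduce the table from the thread in 100-point steps, 0..1000.
    for diff in range(0, 1001, 100):
        print(f"Win expectancy for a difference of {diff} points "
              f"is {expected_score(diff):g}")
```

For example, `expected_score(400)` is exactly 1/11, about 0.0909, which matches the 400-point row. The `:g` format is used only to print roughly six significant digits, as in the quoted table.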
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.