Author: Rolf Tueschen
Date: 15:51:49 06/01/02
On June 01, 2002 at 16:34:53, Uri Blass wrote:

>On June 01, 2002 at 16:03:01, Rolf Tueschen wrote:
>
>>On June 01, 2002 at 01:32:55, Uri Blass wrote:
>>
>>>On May 31, 2002 at 21:00:44, Rolf Tueschen wrote:
>>>
>>>>On May 31, 2002 at 20:35:38, Dann Corbit wrote:
>>>>
>>>>>On May 31, 2002 at 20:24:35, Rolf Tueschen wrote:
>>>>>
>>>>>>On May 31, 2002 at 20:02:37, Dann Corbit wrote:
>>>>>>
>>>>>>>On May 31, 2002 at 19:22:27, Rolf Tueschen wrote:
>>>>>>>
>>>>>>>>On May 31, 2002 at 19:01:53, Dann Corbit wrote:
>>>>>>>>
>>>>>>>>>Since people are so often confused about it, it seems a good idea to write a FAQ. Rolf's questions could be added, and a search through the CCC archives could find some more.
>>>>>>>>>
>>>>>>>>>Certainly the games against the old opponents are always a puzzle to newcomers who do not understand why calibration against an opponent of precisely known strength is of great value.
>>>>>>>>
>>>>>>>>No pun intended, but excuse me, you can't mean it this way! Are we caught in a new circle? How can the older program's strength be precisely known? Of course it isn't! Because it had the same status the new ones have today...
>>>>>>>>
>>>>>>>>And all the answers from Bertil follow that same fallacious line. It's a pity!
>>>>>>>>
>>>>>>>>Also, what is calibration in SSDF? Comparing the new unknown with the old unknown? No pun intended.
>>>>>>>>
>>>>>>>>Before making such a FAQ, let's please find some practical solutions for SSDF.
>>>>>>>
>>>>>>>The older programs have been carefully calibrated by playing many hundreds of games. Hence, their strength in relation to each other and to the other members of the pool is very precisely known.
>>>>>>>
>>>>>>>The best possible test you can make is to play an unknown program against the best-known programs. This will arrive at an accurate ELO score faster than any other way. Programs that are evenly matched are not as good as programs that are somewhat mismatched. Programs that are terribly mismatched are not as good as programs that are somewhat mismatched.
>>>>>>>
>>>>>>>If I have two programs of exactly equal ability, it will take a huge number of games to get a good reading on their strength in relation to one another. On the other hand, if one program is 1000 ELO better than another, then one or two fluke wins will drastically skew the score. An ELO difference of 100 to 150 is probably just about ideal.
>>>>>>
>>>>>>I don't follow that at all. Perhaps it's too difficult, but I fear that you are mixing things up. You're arguing as if you _knew_ already that the one program is 1000 points better. Therefore two games are OK for you. But how could you know this in SSDF? And also, why do you test at all, if it's that simple?
>>>>>
>>>>>No. You have a group of programs of very well known strength. The ones that have played the most games are the ones whose strength is precisely known.
>>>>
>>>>I can't accept that.
>>>>
>>>>>
>>>>>Here is a little table:
>>>>>
>>>>>Win expectancy for a difference of 0 points is 0.5
>>>>>Win expectancy for a difference of 100 points is 0.359935
>>>>>Win expectancy for a difference of 200 points is 0.240253
>>>>>Win expectancy for a difference of 300 points is 0.15098
>>>>>Win expectancy for a difference of 400 points is 0.0909091
>>>>>Win expectancy for a difference of 500 points is 0.0532402
>>>>>Win expectancy for a difference of 600 points is 0.0306534
>>>>>Win expectancy for a difference of 700 points is 0.0174721
>>>>>Win expectancy for a difference of 800 points is 0.00990099
>>>>>Win expectancy for a difference of 900 points is 0.00559197
>>>>>Win expectancy for a difference of 1000 points is 0.00315231
>>>>>
>>>>>Notice that for a 1000 ELO difference the win expectancy is only 0.3%.
>>>>
>>>>I see. So that is the Elo calculation for human chess, right? What gives you the confidence that it works the same way for computers?
>>>
>>>What gives you the confidence that it works for humans?
>>
>>It works by definition (which could of course be changed), and since then it has just been a matter of updating.
>>
>>>
>>>These numbers were not calculated based on statistics of human games, and I believe that they are not correct for humans either.
>>
>>Of course they were! They're based on the history of chess.
>
>No, they are based on the normal distribution.
>
>There was no real investigation of the history of chess in order to find the win expectancy based on the difference between ratings.
>
>We can assume that the win expectancy for a difference of 100 rating points is 0.359935, but in that case it is possible that the number for 200 Elo is not consistent.
>
>It is also possible that there is no good constant number for the win expectancy when the difference in Elo is 200, and that the expected number is different for strong players and for weak players.
>
>Uri

I'm sorry, I misunderstood you. Of course the Elo formula could be made more perfect. Elo took the best players and watched their performances. Or did Elo start with the maths of his formula? Anyway, I think no formula would be absolutely perfect. Still, I can't see the reason why SSDF right now should have Elo numbers with any meaning at all.

Rolf Tueschen
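As a quick check on where the quoted table comes from: its figures match the logistic win-expectancy curve E = 1 / (1 + 10^(D/400)) to all printed digits, the closed-form curve commonly used in rating implementations in place of the Gaussian model Elo originally proposed. Here is a minimal Python sketch (added for illustration, not part of the original thread) that regenerates the table:

    # Minimal sketch: regenerate the win-expectancy table quoted above from
    # the logistic rating curve E(D) = 1 / (1 + 10**(D / 400)), which gives
    # the expected score of the side rated D points lower. Every figure in
    # the quoted table reproduces to all printed digits.

    def win_expectancy(diff):
        """Expected score of the player rated `diff` points below the opponent."""
        return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

    for diff in range(0, 1001, 100):
        print(f"Win expectancy for a difference of {diff} points is "
              f"{win_expectancy(diff):.6g}")

That the table falls out of a closed-form curve rather than out of game statistics is essentially Uri's point in the thread: the expectancies are a modelling assumption, not an empirical finding.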