Author: Rolf Tueschen
Date: 13:14:50 06/01/02
On June 01, 2002 at 13:14:58, Andrew Dados wrote:

>On June 01, 2002 at 01:32:55, Uri Blass wrote:
>
>>On May 31, 2002 at 21:00:44, Rolf Tueschen wrote:
>>
>>>On May 31, 2002 at 20:35:38, Dann Corbit wrote:
>>>
>>>>On May 31, 2002 at 20:24:35, Rolf Tueschen wrote:
>>>>
>>>>>On May 31, 2002 at 20:02:37, Dann Corbit wrote:
>>>>>
>>>>>>On May 31, 2002 at 19:22:27, Rolf Tueschen wrote:
>>>>>>
>>>>>>>On May 31, 2002 at 19:01:53, Dann Corbit wrote:
>>>>>>>
>>>>>>>>Since people are so often confused about it, it seems a good idea to write a FAQ. Rolf's questions could be added, and a search through the CCC archives could find some more.
>>>>>>>>
>>>>>>>>Certainly the games against the old opponents are always a puzzle to newcomers who do not understand why calibration against an opponent of precisely known strength is of great value.
>>>>>>>
>>>>>>>No pun intended, but excuse me, you can't mean it this way! Are we caught in a new circle? How can the older program's strength be precisely known? Of course it isn't! Because it had the same status the new ones have today...
>>>>>>>
>>>>>>>And all the answers from Bertil follow that same fallacious line. It's a pity!
>>>>>>>
>>>>>>>Also, what is calibration in the SSDF? Comparing the new unknown with the old unknown? No pun intended.
>>>>>>>
>>>>>>>Before making such a FAQ, let's please find some practical solutions for the SSDF.
>>>>>>
>>>>>>The older programs have been carefully calibrated by playing many hundreds of games. Hence, their strength in relation to each other and to the other members of the pool is very precisely known.
>>>>>>
>>>>>>The best possible test you can make is to play an unknown program against the best-known programs. This will arrive at an accurate ELO score faster than any other way. Programs that are evenly matched are not as good as programs that are somewhat mismatched, and programs that are terribly mismatched are not as good as programs that are somewhat mismatched either.
>>>>>>
>>>>>>If I have two programs of exactly equal ability, it will take a huge number of games to get a good reading on their strength in relation to one another. On the other hand, if one program is 1000 ELO better than another, then one or two fluke wins will drastically skew the score. An ELO difference of 100 to 150 is probably just about ideal.
>>>>>
>>>>>I don't follow that at all. Perhaps it's too difficult, but I fear that you are mixing things up. You're arguing as if you _knew_ already that the one program is 1000 points better. Therefore two games are OK for you. But how could you know this in the SSDF? And also, why do you test at all, if it's that simple?
>>>>
>>>>No. You have a group of programs of very well-known strength. The ones that have played the most games are the ones whose strength is most precisely known.
>>>
>>>I can't accept that.
>>>
>>>>Here is a little table:
>>>>
>>>>Win expectancy for a difference of    0 points is 0.5
>>>>Win expectancy for a difference of  100 points is 0.359935
>>>>Win expectancy for a difference of  200 points is 0.240253
>>>>Win expectancy for a difference of  300 points is 0.15098
>>>>Win expectancy for a difference of  400 points is 0.0909091
>>>>Win expectancy for a difference of  500 points is 0.0532402
>>>>Win expectancy for a difference of  600 points is 0.0306534
>>>>Win expectancy for a difference of  700 points is 0.0174721
>>>>Win expectancy for a difference of  800 points is 0.00990099
>>>>Win expectancy for a difference of  900 points is 0.00559197
>>>>Win expectancy for a difference of 1000 points is 0.00315231
>>>>
>>>>Notice that for a 1000 ELO difference the win expectancy is only 0.3%.
>>>
>>>I see. So that is the Elo calculation for human chess, right? What gives you the confidence that it works the same way for computers?
>>
>>What gives you the confidence that it works for humans?
>>
>>These numbers were not calculated from statistics of human games, and I believe they are not correct for humans either.
>>
>>Uri
>
>Hello Uri.
>
>I keep noticing there is a huge misconception about what ELO numbers are, so I will try to explain how a rating system is defined and built.
>
>A rating system rests on ONE single assumption: that the distribution of ratings over a big pool of players obeys a normal distribution.
>
>Then we need to build a scale. That means we need to define the '0' point of the scale and also the unit of measurement (what '1 point' means).
>
>Let's say we define '0' to equal 1740 ELO points. The meaning of this number is that the average rating of all players in the pool is 1740 on our scale. It is chosen arbitrarily and can be _any_ number.
>
>Then we define a unit, say 200 points, in such a way that a 200-point difference translates to a winning probability of 0.75. This is another arbitrary number defining our scale. Discussing its validity is about as sensible as discussing whether 1 meter on Earth equals 1 meter on the Moon.
>
>So by definition all those numbers from Dann's post are valid; that is the basis for calculating players' ratings.
>
>-Andrew-

For human chess, Andrew!

Rolf Tueschen
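
Dann's table matches the logistic form of the Elo win expectancy: for a rating difference of D points, the weaker player's expected score is E = 1 / (1 + 10^(D/400)). A minimal Python sketch (names are illustrative, not from the post) that reproduces the table above:

    # Expected score of a player rated `diff` points below the opponent,
    # under the logistic Elo model: E = 1 / (1 + 10^(diff/400)).
    def win_expectancy(diff: float) -> float:
        return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

    # Reproduce Dann's table for gaps of 0 to 1000 points.
    for d in range(0, 1001, 100):
        print(f"Win expectancy for a difference of {d:4d} points "
              f"is {win_expectancy(d):.6g}")

Running this prints 0.5 for a gap of 0, 0.359935 for 100, down to 0.00315231 for 1000, i.e. the 0.3% figure Dann points to.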
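A side note on Andrew's choice of unit: the logistic curve behind Dann's table can be inverted to find the gap that yields a given expectancy, D = 400 * log10(p / (1 - p)). A quick check (again an illustrative helper, not anything from the thread):

    from math import log10

    # Gap D at which the stronger player's expected score equals p,
    # inverting the logistic Elo expectancy.
    def gap_for_expectancy(p: float) -> float:
        return 400.0 * log10(p / (1.0 - p))

    print(gap_for_expectancy(0.75))  # ~190.85 points

So on the logistic scale used in Dann's table, an expectancy of exactly 0.75 corresponds to a gap of roughly 191 points rather than 200; the two conventions are close but not identical, which is part of what the posters are arguing about.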