Author: Rolf Tueschen
Date: 14:44:36 06/02/02
On June 02, 2002 at 17:28:11, Rolf Tueschen wrote:

>On June 02, 2002 at 13:05:38, Andrew Dados wrote:
>
>>On June 01, 2002 at 16:14:50, Rolf Tueschen wrote:
>>
>>>On June 01, 2002 at 13:14:58, Andrew Dados wrote:
>>>
>>>>On June 01, 2002 at 01:32:55, Uri Blass wrote:
>>>>
>>>>>On May 31, 2002 at 21:00:44, Rolf Tueschen wrote:
>>>>>
>>>>>>On May 31, 2002 at 20:35:38, Dann Corbit wrote:
>>>>>>
>>>>>>>On May 31, 2002 at 20:24:35, Rolf Tueschen wrote:
>>>>>>>
>>>>>>>>On May 31, 2002 at 20:02:37, Dann Corbit wrote:
>>>>>>>>
>>>>>>>>>On May 31, 2002 at 19:22:27, Rolf Tueschen wrote:
>>>>>>>>>
>>>>>>>>>>On May 31, 2002 at 19:01:53, Dann Corbit wrote:
>>>>>>>>>>
>>>>>>>>>>>Since people are so often confused about it, it seems a good idea to write a
>>>>>>>>>>>FAQ.
>>>>>>>>>>>Rolf's questions could be added, and a search through the CCC archives could
>>>>>>>>>>>find some more.
>>>>>>>>>>>
>>>>>>>>>>>Certainly the games against the old opponents are always a puzzle to newcomers
>>>>>>>>>>>who do not understand why calibration against an opponent of precisely known
>>>>>>>>>>>strength is of great value.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>No pun intended, but excuse me, you can't mean it this way! Are we caught in a
>>>>>>>>>>new circle? How can the older program be precisely known in its strength?
>>>>>>>>>>Of course it isn't! Because it had the same status the new ones have today...
>>>>>>>>>>
>>>>>>>>>>And all the answers from Bertil follow that same fallacious line. It's a
>>>>>>>>>>pity!
>>>>>>>>>>
>>>>>>>>>>Also, what is calibration in SSDF? Comparing the new unknown with the old
>>>>>>>>>>unknown? No pun intended.
>>>>>>>>>>
>>>>>>>>>>Before making such a FAQ let's please find some practical solutions for SSDF.
>>>>>>>>>
>>>>>>>>>The older programs have been carefully calibrated by playing many hundreds of
>>>>>>>>>games. Hence, their strength in relation to each other and to the other members
>>>>>>>>>of the pool is very precisely known.
>>>>>>>>>
>>>>>>>>>The best possible test you can make is to play an unknown program against the
>>>>>>>>>best known programs. This will arrive at an accurate ELO score faster than any
>>>>>>>>>other way. Programs that are evenly matched are not as good as programs that
>>>>>>>>>are somewhat mismatched. Programs that are terribly mismatched are not as good
>>>>>>>>>as programs that are somewhat mismatched.
>>>>>>>>>
>>>>>>>>>If I have two programs of exactly equal ability, it will take a huge number of
>>>>>>>>>games to get a good reading on their strength in relation to one another. On
>>>>>>>>>the other hand, if one program is 1000 ELO better than another, then one or two
>>>>>>>>>fluke wins will drastically skew the score. An ELO difference of 100 to 150 is
>>>>>>>>>probably just about ideal.
>>>>>>>>
>>>>>>>>I don't follow that at all. Perhaps it's too difficult, but I fear that you are
>>>>>>>>mixing things up. You're arguing as if you _knew_ already that the one program
>>>>>>>>is 1000 points better. Therefore 2 games are ok for you. But how could you know
>>>>>>>>this in SSDF? And also, why do you test at all, if it's that simple?
>>>>>>>
>>>>>>>No. You have a group of programs of very well known strength. The ones that
>>>>>>>have played the most games are the ones where the strength is precisely known.
>>>>>>
>>>>>>I can't accept that.
>>>>>>
>>>>>>>
>>>>>>>Here is a little table:
>>>>>>>
>>>>>>>Win expectancy for a difference of 0 points is 0.5
>>>>>>>Win expectancy for a difference of 100 points is 0.359935
>>>>>>>Win expectancy for a difference of 200 points is 0.240253
>>>>>>>Win expectancy for a difference of 300 points is 0.15098
>>>>>>>Win expectancy for a difference of 400 points is 0.0909091
>>>>>>>Win expectancy for a difference of 500 points is 0.0532402
>>>>>>>Win expectancy for a difference of 600 points is 0.0306534
>>>>>>>Win expectancy for a difference of 700 points is 0.0174721
>>>>>>>Win expectancy for a difference of 800 points is 0.00990099
>>>>>>>Win expectancy for a difference of 900 points is 0.00559197
>>>>>>>Win expectancy for a difference of 1000 points is 0.00315231
>>>>>>>
>>>>>>>Notice that for a 1000 ELO difference the win expectancy is only 0.3%.
>>>>>>
>>>>>>I see. So that is the Elo calculation for human chess, right? What is
>>>>>>giving you the confidence that it works for computers the same way?
>>>>>
>>>>>What gives you the confidence that it works for humans?
>>>>>
>>>>>These numbers were not calculated based on statistics of human games, and I
>>>>>believe that they are not correct for humans either.
>>>>>
>>>>>Uri
>>>>
>>>>Hello Uri.
>>>>
>>>>I keep noticing there is a huge misconception about what ELO numbers are.
>>>>So I will try to explain how a rating system is defined/built.
>>>>
>>>>A rating system is based on ONE single assumption: that the distribution of ratings
>>>>over a big pool of players obeys a normal distribution.
>>>>
>>>>Then we need to build a scale.
>>>>That means we need to define the '0' point on the scale and also the unit of measurement
>>>>(what '1 point' means).
>>>>
>>>>Let's say we define '0' to equal 1740 ELO points. The meaning of this number is:
>>>>the average rating of all players in the pool is 1740 on our scale. It is chosen
>>>>arbitrarily and can be _any_ number.
>>>>
>>>>Then we define a unit, say 200 points, in such a way that a 200-point difference
>>>>translates to a probability of winning equal to 0.75. This is another arbitrary
>>>>number defining our scale. Discussing its validity is about as sensible as
>>>>discussing whether 1 meter on Earth equals 1 meter on the Moon.
>>>>
>>>>So by definition all those numbers from Dann's post are valid; that is the basis for
>>>>calculating players' ratings.
>>>>
>>>>-Andrew-
>>>
>>>For human chess, Andrew!
>>>
>>>Rolf Tueschen
>>
>>?? For all rating lists the scale is defined in the SAME way. Whether it's a human-only
>>pool or a computer-only pool of players, all of the above is valid.
>>
>>However, I think I know what you are saying (or am I.. :)
>>
>>Imagine you have a big number of lions and zebras.
>>When you measure the average height of mixed lions and zebras, it will be
>>different from the average height of lions ONLY.
>>
>>But in each case the statistics are still valid, as long as you believe in the normal
>>distribution. Whether the numbers correspond to each other is a different question.
>>
>>-Andrew-
>
>Two points. You must show a normal distribution. Chess strength in humans who
>play chess, or height - ok.
>
>But what about strength in computers?
>
>Then the other point. The mixing. On the contrary, I would be happy if computers
>would play in human tournaments. That would answer many of my questions. But
>with comp vs comp you can't find results about strength, because the comps in
>each year's version are equally strong. See the actual SSDF list! :)
>But on which level that is remains totally open. We
>simply don't know. We can't know it, at least with SSDF.
>
>BTW, for the mixing I would advise changing the traditional concept of
>computer chess, because the actual one is unfair for human vs comp.
>
>Rolf Tueschen
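
For reference, the win-expectancy figures in Dann's quoted table all follow from the
standard logistic Elo formula E = 1 / (1 + 10^(D/400)), where D is how many points the
weaker side trails by. A minimal sketch in Python (an illustration, not part of the
quoted posts) that reproduces the table:

    # Expected score of the side that is `diff` Elo points weaker than its
    # opponent, using the standard logistic Elo formula.
    def win_expectancy(diff):
        return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

    for diff in range(0, 1001, 100):
        print("Win expectancy for a difference of %d points is %g"
              % (diff, win_expectancy(diff)))

For a 1000-point difference this gives 0.00315, i.e. roughly the 0.3% Dann mentions.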
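
Andrew's point that the scale is a convention can also be checked numerically. His round
figure of "200 points means 0.75" is an approximation: under the logistic formula used in
the table above, an expected score of 0.75 corresponds to about 191 points. A small
sketch, assuming that same logistic model (not taken from the original posts):

    import math

    # Rating difference at which the stronger side's expected score is p,
    # under the logistic Elo model (the inverse of the formula above).
    def diff_for_expectancy(p):
        return 400.0 * math.log10(p / (1.0 - p))

    print(diff_for_expectancy(0.75))   # roughly 190.85 points

The choice of the zero point (Andrew's 1740) shifts every rating by the same constant and
changes none of these differences; only rating differences carry meaning.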
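
Dann's argument about calibration can be made concrete: if the opponents' ratings are
taken as known, the newcomer's rating is whatever value makes its expected total score
equal the score it actually achieved. A hedged sketch with made-up numbers (hypothetical,
not SSDF data), solving for that value by bisection:

    # Expected total score against opponents of known ratings, logistic model.
    def expected_score(rating, opponents):
        return sum(1.0 / (1.0 + 10.0 ** ((opp - rating) / 400.0))
                   for opp in opponents)

    # Bisect for the rating whose expected score matches the score achieved.
    def performance_rating(opponents, score, lo=0.0, hi=4000.0):
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if expected_score(mid, opponents) < score:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Hypothetical example: 20 games each against three well-calibrated programs.
    opponents = [2450] * 20 + [2550] * 20 + [2650] * 20
    print(performance_rating(opponents, score=33.5))

How precisely this pins the newcomer down still depends on the number of games and on how
well the opponents' own ratings are established, which is exactly the point Rolf disputes.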