Computer Chess Club Archives


Subject: Re: Rating

Author: M Hurd

Date: 02:47:52 01/22/06


On January 20, 2006 at 07:55:25, Eelco de Groot wrote:

>On January 19, 2006 at 14:31:33, M Hurd wrote:
>
>>On January 19, 2006 at 09:39:00, Eelco de Groot wrote:
>>
>>>On January 19, 2006 at 09:05:03, M Hurd wrote:
>>>
>>>>On January 19, 2006 at 08:52:00, Ricardo Gibert wrote:
>>>>
>>>>>On January 19, 2006 at 08:36:03, M Hurd wrote:
>>>>>
>>>>>>On January 19, 2006 at 08:30:55, Ricardo Gibert wrote:
>>>>>>
>>>>>>>On January 19, 2006 at 08:11:54, M Hurd wrote:
>>>>>>>
>>>>>>>>If you play an engine match of 1000 games against 1 engine and play another
>>>>>>>>match of 1 game each against 1000 engines, would you get the same rating?
>>>>>>>>
>>>>>>>>Is it more important to play as many different engines as possible, or does
>>>>>>>>just the number of games played matter?
>>>>>>>
>>>>>>>Depends on what you are trying to measure: relative strength against one
>>>>>>>particular engine, or general strength against engines in general.
>>>>>>>
>>>>>>>>
>>>>>>>>Presumably there will be an optimum number for games and number of engines
>>>>>>>>played.
>>>>>>>
>>>>>>>Theoretically, the optimal number approaches infinity in both cases. Naturally,
>>>>>>>this has virtually no practical value. You will need to be more specific to get
>>>>>>>a more useable response.
>>>>>>>
>>>>>>>>
>>>>>>>>Regards
>>>>>>>>
>>>>>>>>Mike
>>>>>>
>>>>>>
>>>>>>Hi Ricardo
>>>>>>
>>>>>>I was simply wondering what the Elo difference between the 2 matches I
>>>>>>outlined would likely be, and which match would be the more accurate.
>>>>>
>>>>>Accurate in what sense? The 2 matches answer 2 different questions. What
>>>>>precisely are you trying to measure? My guess is you want to measure general
>>>>>playing strength rather than the relative strength between 2 particular engines.
>>>>>If that is the case, given those choices, this isn't a close call. One game
>>>>>against each of 1000 different engines is the way to go.
>>>>>
>>>>>Frankly, this ought to be obvious.
>>>>>
>>>>>>
>>>>>>Regards
>>>>>>
>>>>>>Mike
>>>>
>>>>
>>>>Frankly this is not obvious to me.
>>>>
>>>>If you play 1 game with 1 engine versus another you will get a result; however,
>>>>this could be a win, loss or draw and tells you nothing. 1000 x nothing = nothing,
>>>>whereas 1000 games against 1 engine should give a more confident rating.
>>>>
>>>>Regards
>>>>
>>>>Mike
>>>
>>>Hello Mike,
>>>
>>>That makes no difference: any game tells you just as much, no matter which
>>>opponent it is. For the rating (the TPR, or tournament performance rating, in
>>>this case) you simply compute the average result against the average rating of
>>>all the opponents.
>>>
>>>You get a better idea of the strength against all the different opponents if you
>>>play some games (or just one) against many of them, not just against one.
>>>That is because a rating is not a perfect predictor: some players will just have
>>>bad results against some of the possible opponents, their Angstgegners if you
>>>like. Also, the average opponent rating is a more dependable number than the
>>>rating of just one member of the group (there is less uncertainty involved
>>>because more games were played to compute the average).
>>>
>>>The situation is a bit more complex if the ratings of your opponents (programs)
>>>are not very well known, or even unknown. Playing one or more games then does
>>>not tell you anything about the rating, only about the difference in rating
>>>between the two. Therefore it becomes necessary to add to your tournament at
>>>least one, but preferably more, opponents with a known rating, and to let the
>>>unrated players play against each other but also against the players with known
>>>ratings. Then you can calculate all of the ratings with a successive
>>>approximation process.
>>>
>>>hope it makes some sense..
>>>
>>> Eelco
>>
>>
>>Thanks for the explanation.
>>
>>Hypothetically speaking, Fritz plays Rybka 1000 times and a rating for Fritz is
>>calculated based on the results of the games, assuming Rybka's rating is known.
>>
>>Fritz then plays 1 game against each of 1000 engines with known ratings and a
>>rating is calculated. Which rating would be nearer to Fritz's likely rating, or
>>would they be the same, hypothetically speaking?
>>
>>Regards
>>
>>Mike
>
>Hello Mike,
>
>The answer is still the same: if you want a rating that helps predict the
>outcome against many different opponents, your method number two is much better.
>Playing 1000 games against Rybka will tell you very well the strength of Fritz
>versus Rybka, but that is not what you want to know. If Rybka plays exactly as
>well against Fritz as against all other engines, then your two answers will be
>almost the same, apart from the chance deviations. The "Rock beats Scissors
>beats Paper beats Rock" effect is usually small, but especially in the
>computer chess world it happens that one program has good results against one
>program but worse results against another, which you would not expect from
>rating alone.
>
>A secondary effect in practice concerns the uncertainty in the opponents'
>ratings. Suppose you know Rybka's rating within a small +/- range (in practice
>maybe not the best example, since Rybka is still in beta). Say all 1000 engines
>you could use, including Rybka, have each played a thousand games in the SSDF
>Elo list, and you use programs from the list as opponents. Their +/- ranges will
>hardly change anymore, but the average rating of the thousand engines in the
>SSDF list is still "steadier" than the rating of any single engine, because
>there is almost no statistical deviation in that average. If the SSDF list
>consists of just these thousand engines, there is zero statistical deviation in
>the average. But remember that this number, the average SSDF rating, is only
>relative: if you want it to reflect strength against humans, for instance, you
>have no option other than to play against humans too, to calibrate the SSDF
>list against human ratings.
>
>I still have the feeling I could express and understand this better, but maybe
>someone else can do that! Thanks for the question!
>Eelco


Thanks for taking the time to answer.


Regards

Mike
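
For anyone who wants to try the numbers, here is a minimal sketch of the performance-rating idea quoted above (average result against average opponent rating). It assumes the standard logistic Elo curve; the thread does not say which formula is meant, and rating lists such as the SSDF may use different tables or software. All function names are hypothetical. The last function also assumes that each opponent's rating error is independent and of the same size, which is a simplification.

```python
import math


def expected_score(rating_diff):
    """Expected score (0..1) for a player rated `rating_diff` points above
    the opponent, using the logistic Elo curve (an assumption here)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def performance_rating(opponent_ratings, score):
    """Performance rating from the average opponent rating and the total
    score (win = 1, draw = 0.5, loss = 0), one game per listed opponent."""
    n = len(opponent_ratings)
    avg_opp = sum(opponent_ratings) / n
    p = score / n
    if p <= 0.0 or p >= 1.0:
        # The logistic formula diverges for 0% and 100% scores.
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    return avg_opp + 400.0 * math.log10(p / (1.0 - p))


def average_rating_error(per_engine_error, n_engines):
    """Uncertainty of the AVERAGE rating of n engines, assuming each rating
    has the same independent error (simplifying assumption)."""
    return per_engine_error / math.sqrt(n_engines)


if __name__ == "__main__":
    # Match 1: 1000 games against one engine rated 2800, scoring 50%.
    print(performance_rating([2800.0] * 1000, 500.0))
    # Match 2: one game each against engines rated 2700 and 2900, scoring 75%.
    print(performance_rating([2700.0, 2900.0], 1.5))
    # A +/-20 error on one engine versus the error on the mean of 1000 engines.
    print(average_rating_error(20.0, 1000))
```

The formula only sees the average opponent rating and the score percentage, so it cannot by itself show the "Rock beats Scissors" effect; that only appears in real results against particular opponents.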



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.