Author: M Hurd
Date: 02:47:52 01/22/06
On January 20, 2006 at 07:55:25, Eelco de Groot wrote:

>On January 19, 2006 at 14:31:33, M Hurd wrote:
>
>>On January 19, 2006 at 09:39:00, Eelco de Groot wrote:
>>
>>>On January 19, 2006 at 09:05:03, M Hurd wrote:
>>>
>>>>On January 19, 2006 at 08:52:00, Ricardo Gibert wrote:
>>>>
>>>>>On January 19, 2006 at 08:36:03, M Hurd wrote:
>>>>>
>>>>>>On January 19, 2006 at 08:30:55, Ricardo Gibert wrote:
>>>>>>
>>>>>>>On January 19, 2006 at 08:11:54, M Hurd wrote:
>>>>>>>
>>>>>>>>If you play an engine match of 1000 games against 1 engine and play another
>>>>>>>>match of 1 game each against 1000 engines, would you get the same rating?
>>>>>>>>
>>>>>>>>Is it more important to play as many different engines as possible, or just
>>>>>>>>the number of games played?
>>>>>>>
>>>>>>>Depends on what you are trying to measure: relative strength against one
>>>>>>>particular engine, or general strength against engines in general.
>>>>>>>
>>>>>>>>
>>>>>>>>Presumably there will be an optimum number for games and number of engines
>>>>>>>>played.
>>>>>>>
>>>>>>>Theoretically, the optimal number approaches infinity in both cases. Naturally,
>>>>>>>this has virtually no practical value. You will need to be more specific to get
>>>>>>>a more usable response.
>>>>>>>
>>>>>>>>
>>>>>>>>Regards
>>>>>>>>
>>>>>>>>Mike
>>>>>>
>>>>>>
>>>>>>Hi Ricardo
>>>>>>
>>>>>>I was simply wondering what would likely be the Elo difference between the 2
>>>>>>matches I outlined and which match would be the more accurate.
>>>>>
>>>>>Accurate in what sense? The 2 matches answer 2 different questions. What
>>>>>precisely are you trying to measure? My guess is you want to measure general
>>>>>playing strength rather than the relative strength between 2 particular engines.
>>>>>If that is the case, given those choices, this isn't a close call. One game
>>>>>against each of 1000 different engines is the way to go.
>>>>>
>>>>>Frankly, this ought to be obvious.
>>>>>
>>>>>>
>>>>>>Regards
>>>>>>
>>>>>>Mike
>>>>
>>>>
>>>>Frankly, this is not obvious to me.
>>>>
>>>>If you play 1 game with 1 engine versus another you will get a result, but
>>>>it could be a win, loss or draw and tells you nothing. 1000 x nothing = nothing,
>>>>whereas 1000 games against 1 engine should give a more confident rating.
>>>>
>>>>Regards
>>>>
>>>>Mike
>>>
>>>Hello Mike,
>>>
>>>That makes no difference; any game tells you just as much no matter which
>>>opponent it is. For the rating (the TPR rating in this case) you simply compute
>>>the average result against the average rating of all the opponents.
>>>
>>>You get a better idea of the strength against all the different opponents if you
>>>play some (or just one) game against many of them, not just against one.
>>>That is because a rating is not a perfect predictor; some players will just have
>>>bad results against some of the possible opponents, their Angstgegners if you
>>>like. Also, the average opponent rating is a more dependable number than the
>>>rating of just one member of the group (there is less uncertainty involved
>>>because more games were played to compute the average).
>>>
>>>The situation is a bit more complex if the rating of your opponent (programs) is
>>>not very well known, or even unknown. Playing one or more games does not tell
>>>you anything about rating then, only about the difference in rating between the
>>>two. Therefore it becomes necessary to add to your tournament at least one but
>>>preferably more opponents with a known rating, and let each of the unrated
>>>players play against each other but also against the known ratings. Then you can
>>>calculate all of the ratings with a successive approximation process.
>>>
>>>hope it makes some sense..
>>>
>>>  Eelco
>>
>>
>>Thanks for the explanation.
>>
>>Hypothetically speaking, Fritz plays Rybka 1000 times and a rating for Fritz is
>>calculated based on the results of the games, assuming Rybka's rating is known.
>>
>>Fritz then plays 1 game against each of 1000 engines with known ratings and a
>>rating is calculated. Which rating would be nearer to Fritz's likely rating, or
>>would they be the same, hypothetically speaking?
>>
>>Regards
>>
>>Mike
>
>Hello Mike,
>
>The answer is still the same: if you want a rating that helps predict the
>outcome against many different opponents, your method number two is much better.
>Playing 1000 games against Rybka will tell you very well the strength of Fritz
>versus Rybka, but that is not what you want to know. If Rybka plays exactly as
>well against Fritz as against all other engines, then your two answers will be
>almost the same, apart from the chance deviations. The "Rock beats Scissors
>beats Paper beats Rock" effect is usually small, but especially in the
>computer chess world it happens that one program has good results against one
>but worse against another program, which you would not expect from rating alone.
>
>A secondary effect in practice concerns the +/- ranges. Suppose you know
>Rybka's rating with a small +/- range (in practice maybe not the best example,
>since Rybka is still in Beta stage), and say all 1000 engines you could use,
>including Rybka, have each played a thousand games in the SSDF Elo list and you
>use programs in the list as opponents. Their +/- ranges will hardly change
>anymore, but the average rating of the thousand engines in the SSDF Elo list is
>still more "steady" than the rating of any single engine, because there is
>almost no statistical deviation in that average. If the SSDF list consists of
>just these thousand engines there is zero statistical deviation in the average.
>But remember this number, the average SSDF rating, is only relative; if you
>want it to reflect strength against humans, for instance, you have no other
>option than to play against humans too, to calibrate the SSDF ratings against
>their human ratings.
>
>I still have the feeling I could express and understand this better, but maybe
>someone else can do that! Thanks for the question!
>Eelco

Thanks for taking the time to answer.

Regards

Mike
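
For anyone who wants to try the two matches on paper, here is a minimal sketch of the calculation Eelco describes: the average result set against the average rating of all the opponents. It assumes the usual logistic Elo curve on a 400-point scale, which is an assumption on my part (rating lists may use lookup tables or other methods), and the function name `performance_rating` and the example ratings are made up for illustration.

```python
import math


def performance_rating(opponent_ratings, scores):
    """Estimate a performance rating from the average opponent rating
    and the average score (1 = win, 0.5 = draw, 0 = loss).

    Assumption: logistic Elo curve with a 400-point scale.
    """
    if not scores or len(opponent_ratings) != len(scores):
        raise ValueError("need one score per opponent rating")
    avg_opponent = sum(opponent_ratings) / len(opponent_ratings)
    avg_score = sum(scores) / len(scores)
    if avg_score <= 0.0 or avg_score >= 1.0:
        # A 0% or 100% score has no finite rating difference.
        raise ValueError("performance rating undefined for 0% or 100%")
    # Invert the expected-score formula to get the rating difference.
    diff = 400.0 * math.log10(avg_score / (1.0 - avg_score))
    return avg_opponent + diff


# Match 1: 1000 games against one engine (hypothetical rating 2800),
# scoring 400 wins and 600 losses.
match_one = performance_rating([2800] * 1000, [1] * 400 + [0] * 600)

# Match 2: one game each against 1000 engines (hypothetical ratings
# spread evenly from 2301 to 3300), scoring exactly 50%.
ratings = [2301 + i for i in range(1000)]
match_two = performance_rating(ratings, [1] * 500 + [0] * 500)

print(round(match_one, 1), round(match_two, 1))
```

Both calls use the same formula, so the arithmetic treats the two matches alike. What differs is how well the opponent side of the equation represents the field, which is the point made above about style mismatches and about the average rating being steadier than any single rating.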
Last modified: Thu, 15 Apr 21 08:11:13 -0700