Author: Walter Koroljow
Date: 14:29:49 01/18/01
On January 18, 2001 at 07:44:13, Dann Corbit wrote:

>On January 18, 2001 at 07:31:56, Walter Koroljow wrote:
>[snip]
>>By the way, I do not understand what a composite computer player is.
>
>Your rating is based upon repeated observations. This is required to form an
>ELO number. But the conditions of each measurement are different. Different
>programs or versions of programs. Different machines. The experiment has no
>control and the data points are not valid for the conditions of ELO calculations
>except as guesses and extrapolation.
>
>In order to calculate ELO, a magical machine opponent that does not exist has
>been formed. It has fought Anand and gone to Dortmund and all sorts of exciting
>things. However, this composite beast is a figment of the imagination because
>it does not exist.
>
>All that having been said, if the machine were held constant and the program
>held constant so that we actually had something stable to measure, I doubt if
>the outcome would change any.

Hmmm. I think you have a direct calculation in mind, as is done by USCF. In that case, your observations make perfect sense. But my approach was an indirect proof, in which I assume something, show it leads to a bad prediction, and reject the assumption. What is eventually left is the truth. I don't need the conditions you mention. I need other conditions, e.g., statistical independence of game results.

Maybe I botched my explanation, or there was too much mind-numbing detail. I'll take a shot at explaining again and see if that gets us anywhere.

Here is the essence of what I did. The major parts are here without the details.

To simplify calculations, assume there is no such thing as a draw. Suppose we have a player, John, rated 2200. John plays 80 games with Heinz and 80 games with Senior. Heinz and Senior are computers. The computers score 103 out of 160.

We ask, "Could the computers both be rated 2250?"
So we assume the computers are rated 2250, simulate the 160 games a million times, and keep track of the total scores we get. We find that we get 102 or more less than 5% of the time. So we conclude that the computers' rating cannot be 2250. That rating would lead with high probability to a prediction in conflict with reality (a score of 103).

Note that to come to this conclusion we do not need to know the playing conditions, the machines need not be identical, etc. All we need are assumed ratings, a true score, and some statistical assumptions (independence) allowing a calculation of probabilities of results.

We ask, "Could the average rating of the computers be 2250 with a spread of 50?" That is, could one computer be 2200 and the other 2300?

So we assume these ratings and simulate the 160 games another million times. We get 102 or more less than 5% of the time. So we conclude that the computers' average rating cannot be 2250 with a spread of 50.

We try different spreads...
.
.
.

We ask, "Could the average rating of the computers be 2250 with a spread of 300?" That is, could one computer be 2100 and the other 2400?

So we assume these ratings and simulate the 160 games another million times. This time we find that we get 100 or more less than 5% of the time. So we conclude that the computers' average rating cannot be 2250 with a spread of 300.

But we know that the computers are of about equal strength, so we have covered more than the full range of reasonable spreads in their ratings. We know that their average rating cannot be 2250 with any reasonable spread. So we conclude that their average rating cannot be 2250.

We now try an average rating of 2260 with different spreads. If that cannot be rejected for some spread, we stop; otherwise we keep trying higher average ratings until we come to the first one we cannot reject for some spread. This could be the truth, or the truth could be even higher (we don't know the true spread, after all). This is the lower bound of the 95% confidence interval.
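The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program actually used: the post does not say which win-probability model was applied, so the sketch assumes the common logistic Elo formula, and the names `expected_score`, `tail_probability`, `lower_bound`, and `half_gaps` are made up for this example. Because the two spread examples above do not pin down whether "spread" means the gap between the ratings or half of it, the simulation takes the two computer ratings explicitly. The default trial count is far below a million so that the sketch runs quickly in pure Python.

```python
import random


def expected_score(rating_a, rating_b):
    """Chance that A beats B under the logistic Elo model (an assumption here)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def tail_probability(human, computers, games_each, observed,
                     trials=20_000, seed=0):
    """Estimate P(total computer score >= observed) by simulation.

    Games are independent and never drawn, as in the simplified setup.
    `computers` holds the assumed rating of each computer.
    """
    rng = random.Random(seed)
    win_probs = [expected_score(c, human) for c in computers]
    hits = 0
    for _ in range(trials):
        total = 0
        for p in win_probs:
            # Each game is a win for the computer with probability p.
            total += sum(rng.random() < p for _ in range(games_each))
        if total >= observed:
            hits += 1
    return hits / trials


def lower_bound(human, games_each, observed, averages, half_gaps,
                alpha=0.05, trials=20_000):
    """Return the first average rating that cannot be rejected.

    An average is rejected only if every candidate pair of ratings
    (avg - h, avg + h) makes the observed score too unlikely.
    `half_gaps` is a hypothetical parameter: half the rating difference.
    """
    for avg in averages:
        for h in half_gaps:
            p = tail_probability(human, (avg - h, avg + h),
                                 games_each, observed, trials)
            if p >= alpha:
                return avg
    return None


if __name__ == "__main__":
    # Could both computers be rated 2250, given 103 points out of 160?
    p = tail_probability(2200, (2250, 2250), games_each=80, observed=103)
    print("P(score >= 103 | both rated 2250) is about", p)
```

A call such as `lower_bound(2200, 80, 103, range(2250, 2400, 10), [0, 25, 50])` walks upward through candidate averages in the way described above. The exact numbers depend on the assumed probability model and on simulation noise, so they need not match the figures quoted in this post.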
The calculation is over.

This is the basic logic of the calculation. I cannot see how a composite computer, etc. are involved. All I see is a probability distribution of results for two computers with assumed ratings. Does this make sense to you?

Anyway, I am going to have supper and go to the chess club, where I haven't been in ages. Have a good evening.

Walter