Author: Dann Corbit
Date: 23:27:17 01/11/01
On January 12, 2001 at 01:49:56, James T. Walker wrote:
>On January 11, 2001 at 21:29:54, Dann Corbit wrote:
>
>>On January 11, 2001 at 20:48:44, James T. Walker wrote:
>>>You seem to be the one who is emotional and irrational now. Why go off on the
>>>deep end on something which is really simple. I was simply suggesting to take
>>>the games played by top programs in the last year or so and consider them all as
>>>one player.
>>
>>And I responded that this has no mathematical basis. There are many reasons
>>why. Let me give one model to explain why.
>>Fred writes faulty chess programs. All of them have a flaw that will be exposed
>>over time. But he writes a new program each day. If you play Fred's programs
>>once, you will be unlikely to find the flaw. If we play 365 games with Fred's
>>programs against rated opponents, we will get a rating. But if we play just one
>>of the programs against the same opponents we will get a wildly different
>>rating.
>>
>>This model sounds silly. But if you are a computer programmer, you know that it
>>actually models the true situation very well.
>>
>>Now, allow me to give a reasoning point. Some program such as Rebel or Hiarcs
>>has tendencies. These tendencies could be studied and exploited. If I play a
>>thousand games against one program I may learn a way to beat it. If I play a
>>thousand games against a thousand programs, I am far less likely to learn a way
>>to beat it.
>>
>>>It is perfectly logical to assume that if only one program is of GM
>>>strength which many people claim is not, and you add the results of other
>>>programs to the statistics, you are taking a worst case scenario. This is true
>>>because the other programs surely are not GM strength if even 1 is not GM
>>>strength. This might give you enough games combined to determine the "average"
>>>strength of top programs today vs humans.
>>>Your main contention seems to be that
>>>there is not enough data to determine what the strength of Rebel is but you
>>>don't suggest how many games vs humans it would take to establish the fact one
>>>way or the other.
>>
>>You will never prove it conclusively, but after a few hundred games you can
>>offer a statistical argument. In the case of a super GM (e.g. 2600+ ELO) you
>>could prove with a 2/3 probability that they were of GM (2500 ELO) strength
>>after only one hundred games or so. The error bar would be about 100 and hence
>>the odds that the center point was below 2500 would be established.
>>
>>>How many games does it take for a human to establish
>>>himself/herself as equal to a GM in strength?
>>
>>I think that there are two questions here.
>>1. What are the qualifications of a GM?
>>This is answered by the bylaws of FIDE [or other governing body]
>>2. How can we prove that someone is of GM strength?
>>The second is answered when we can mathematically demonstrate within an agreed
>>error bound that the ELO rating of a player must be at least 2500.
>>
>>Note that these are two different questions with two different answers.
>>
>>>What is GM strength? Maybe you
>>>can come up with a number which would satisfy most people or at least yourself.
>>>It's kind of like fuzzy logic.
>>
>>Let's use the definition of 2500 ELO against the same category of talent that is
>>necessary to obtain a GM norm. The games must be at 40/2 and the games must be
>>under tournament conditions. Indeed, a precise definition of what we are trying
>>to prove is crucial to being able to prove it.
>>
>>>It becomes an easier and simpler way to arrive
>>>at the answer without demanding you go exactly where you want to go on the first
>>>try. It's obvious that computers will never hold a GM title because has made
>>>this much more difficult for computers than humans. So the only thing I know to
>>>do is to come up with some figures which most people agree is equal to a GM.
>>>If you can't do this then you may never agree that computers are at last equal
>>>to a GM even when computers are beating the pants off of GMs.
>>>So what I was suggesting was to take the last X number of games by computers vs
>>>GMs and treat them as one player.
>>
>>This is invalid.
>>
>>> If this "Average" computer is of GM strength
>>>then seems to me we have some GM strength computers.
>>
>>How does one quantify "it seems to me" mathematically?
>>
>>>If they don't measure up
>>>now then we have not proven that there are no GM computers but at least we prove
>>>that as a whole they are not there yet. Of course you would want to choose the
>>>best few computers which will give you enough games vs humans to establish yes
>>>or no. (Not a C64) Say if it takes 40 or 50 games to satisfy you that computers
>>>have reached GM strength then use as many of the top computer vs human games you
>>>need to get the 40 or 50 games. So the bottom line is if you can't decide how
>>>many games it takes and what rating is equal to a GM then you will never answer
>>>the question.
>>
>>The number of games is easily decidable, but is also a function of the
>>competition. The better known the ELO of the competition, the more accurate
>>will the rating be for the new player to be evaluated. If they have played
>>thousands of rated games, then they will be supremely useful tools for that
>>evaluation. If you look at the output of ELOSTAT (for instance) you will see a
>>+ and a - figure for ELO value. That represents the error bar of the
>>calculation for one standard deviation. That means that there is a 2/3
>>probability that the actual mean lies between those two values, and a 97% chance
>>that it lies within a bar of double that width.
>>
>>> But if you can do that then maybe you can have the answer
>>>already.
>>
>>Knowing how to formulate the question properly does not mean that we already
>>have the answer, but it is a crucial first step.
>>
>>>Or maybe you're not interested in the answer but just like to argue.
>>
>>Passing judgement on someone's intent is always a sure sign that you have run
>>out of useful arguments. I don't particularly like to argue, but if I think
>>that someone is wrong, then I will say that I think they are wrong and I will
>>tell the reasons why.
>>
>>I don't see anything particularly onerous or evil in that.
>
>Hello Dann,
>I like some of your arguments. But you seem to want to keep redefining the
>problem so that it can never be solved. The simple solution would be to allow
>a computer to compete for GM norms like any human and when it has the required
>number of Norms/rating it could be declared a GM just like a human. And what's
>wrong with that?

Nothing! Actually, I like your solution much better than mine. If that could be arranged, it would be ideal. But it answers the first question, not the second. The first question is, is a computer a GM? But it would not really answer whether it is of GM strength. To add to the perversion, I claim (furthermore) that a GM is not proven to be of GM strength either, until he has played enough games to demonstrate it mathematically. However, I will have to admit he/she IS a GM!

>But you claim because computers have weaknesses
>the humans must have time to find them and take advantage of them. I think this
>also applies to humans which is why one of the requirements for a GM title is to
>maintain a certain rating until the title is awarded. I guess that's to keep
>FIDE from looking stupid by awarding a GM title to someone who then drops to a
>2300 level. (This has been done by the way) You seem to keep insisting on a
>mathematical certainty for computers not required of humans.

To prove that the strength of a man is of GM strength would be equally arduous. To prove that a GM is a GM is nothing more than a table lookup.

>There is no "better
>known ELO" requirement for humans competing for the GM title.
>They play in
>tournaments and take their chances with whoever happens to be there. When a
>human is awarded the GM title he has no great mathematical certainty that he is
>of GM strength.

AHA! My point exactly!

> He has simply passed a test of strength set up by the FIDE
>which in their opinion justifies the title. It's all very arbitrary but it's
>the same for everybody (except computers). You also seem to think the argument
>about taking the "average" computers rating by using several computers results
>to give an ELO rating is invalid but give no reason why. Remember we are not
>trying to award a GM title. We are just trying to determine if computers are
>playing at the GM level within the last few months or whatever time needed to
>calculate a reasonable rating which indicates GM strength.

For the same reason we could not pick a sample of Spaniards playing squash to determine the ability of any one Spaniard from that group at squash, we cannot create a mythical composite player and say that the findings apply to the atoms underneath.

> Best results would
>be the latest games played at the longest time controls available at the moment.
> We are not trying to see if computers played at GM level 3 years ago. Your
>main argument lies in the statistical probability of falling within a range
>which is a silly rule to apply to computers when it is not applied to humans.

With humans, I don't think anyone has ever even been asked to prove "Is Joe of GM strength?" The reason is that Joe is either a GM or he isn't. If he is a GM, we don't really care if he is of GM strength or not. We assume that he probably is. But being a GM does not prove that you are of GM strength. This is especially true since the GM title is conferred for life.

>So no matter what data is used you will always have the comeback that it is not
>100% sure.

I am not asking for 100% sure. I might even be satisfied with 1 standard deviation. Not sure, though. I would have to see the actual data.
If the experiment were rigorous enough, then one standard deviation would probably be fairly convincing. Not absolute proof, of course, but some form of real proof. So far, we are not completely devoid of evidence, but we don't have enough to form a sound mathematical basis for a decision.

>It's not a perfect world. Yes computers are moving targets because
>they change every year. So are humans. NO I have not run out of useful
>arguments yet. And please understand I am not arguing that computers are or are
>not GM strength. I am only saying that perhaps we have enough data to determine
>this if it is analyzed with some common sense and without artificial mathematical
>requirements put on it. In fact a good argument can be made that if you play X
>number of games vs GMs and have an even result then you are of GM strength. Of
>course the "X" is a big question but some reasonable men could come up with a
>reasonable number. Statistical probability is not even required!

If we want to prove something then it must be proven. If we are satisfied with assumptions, then so be it. I'm not. Perhaps you are. It will be proven to me when it becomes mathematically sound to believe it. Currently it is not.
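
For anyone who wants to see what "one standard deviation" means in rating points, here is a rough Python sketch of the error-bar reasoning. It is only an illustration under stated assumptions, not how ELOSTAT or any rating body computes its figures: it uses the standard Elo logistic curve, treats the opponents' average rating as exactly known, treats games as independent, and uses a normal approximation for the mean score. All function names are made up for this example.

```python
import math


def expected_score(rating_diff):
    """Elo logistic expectation for a player rated rating_diff above the field."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_diff_from_score(score):
    """Inverse of expected_score; score must lie strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_stats(wins, draws, losses):
    """Mean score per game and the standard error of that mean."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    variance = (wins * (1.0 - mean) ** 2
                + draws * (0.5 - mean) ** 2
                + losses * (0.0 - mean) ** 2) / games
    return mean, math.sqrt(variance / games)


def performance_interval(wins, draws, losses, opponent_avg, sigmas=1.0):
    """Return (low, center, high) rating estimates.

    Assumption: opponent_avg is treated as exact, so the bar reflects only
    the sampling noise in this player's own results.
    """
    mean, std_err = score_stats(wins, draws, losses)
    eps = 1e-6  # keep scores inside (0, 1) so the logarithm stays finite

    def to_rating(score):
        clamped = min(max(score, eps), 1.0 - eps)
        return opponent_avg + rating_diff_from_score(clamped)

    return (to_rating(mean - sigmas * std_err),
            to_rating(mean),
            to_rating(mean + sigmas * std_err))


def normal_coverage(sigmas):
    """Two-sided probability that a normal variable falls within +/- sigmas."""
    return math.erf(sigmas / math.sqrt(2.0))
```

Calling performance_interval(30, 40, 30, 2500.0) describes an even score over 100 games against a 2500 field, so the center comes out at 2500, and the same percentages over 1000 games give a narrower bar. Under the normal approximation, normal_coverage(1) is about 0.68 and normal_coverage(2) is about 0.95 for two-sided intervals. Real error bars will be wider than this sketch suggests whenever the opponents' own ratings are uncertain.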
Last modified: Thu, 15 Apr 21 08:11:13 -0700