Author: Christophe Theron
Date: 11:44:48 01/28/00
On January 28, 2000 at 13:01:05, Michael Neish wrote:

>On January 28, 2000 at 07:27:54, Enrique Irazoqui wrote:
>
>>There is a degree of uncertainty, but I don't think you need 1000 matches of 200
>>games each to have an idea of who is best.
>
>I have to agree with what Christophe says insofar as you need to play a
>certain number of games before you can determine, to a certain (known)
>degree of accuracy, what the rating difference is between two programs.
>You will never know exactly of course, hence the standard deviation
>figures given next to the Elo ratings of human Chess players, which are
>sometimes overlooked. I haven't read Elo's book, but from what I know
>of the Elo system he must have taken all this probability stuff into
>account when he formulated it, so meaningless it is not. In fact, it is
>the core of the entire system.
>
>If the rating difference between two programs is quite small, say less than
>35 points, then I'm afraid you will definitely need a lot of games to sort it
>out from the results alone. A 20-game match solves nothing. Christophe,
>if you're reading this, could you tell us what is the minimum Elo difference
>that a 20-game match can estimate to a good degree of confidence?

With a 20-game match, you can determine whether prog A is 77 Elo points above
prog B, with 80% confidence. If the programs are closer in Elo, a 20-game
match is not enough.

I have answered Enrique and given a complete table in my answer; you might
find it very interesting.

The key point is that as the Elo difference gets smaller, the number of games
to play increases tremendously.

>The assertions that I made in my original post, which Christophe commented
>on, were that even with programs of equal strength you can expect the sort of
>fluctuations that I showed. This is an inescapable fact. You could try
>playing Fritz 6a against Fritz 6a twenty times and see what happens.
>You have about a chance in six of getting a 10-10 score, even though it's the
>same program. In fact, you're much more likely to get an 11-9 score, or
>even 12-8.
>
>>Kasparov has been the undisputed best for many years. From 1984 until now, he
>>played a total of 772 rated games. He needed less than half these games to
>>convince everyone about who is the best chess player.
>>
>>This makes more sense to me than the probability stuff of your Qbasic program.
>>Otherwise we would reach the absurd of believing that all the rankings in the
>>history of chess are meaningless, and Capablanca, Fischer and Kasparov had long
>>streaks of luck.
>
>By the way Enrique (just in case you thought so), I hope you haven't taken these
>posts as an attack on the Cadaques tournament, which I, and many other people,
>are very interested in, and which must be hard work for those responsible. I
>wrote "Cadaques" in the title as an eye-catcher, as it was topical, and I wished
>to express my views once again on the fact that you cannot ignore the natural
>fluctuations that occur when small numbers of games are played.
>
>Anyway, humans are one thing and computers are another.
>
>Humans are subject to moods, good spells, bad spells, psychological warfare, you
>name it. Draw a lost game and your confidence soars. Lose a won game and it
>plummets. Computers, as you know very well, are not susceptible to this sort of
>thing. They play at the same level day or night, win or lose. But I'm afraid
>they are still subject to the basic laws of probability, whatever one says, and
>however passionately one may say it. In fact, their invariability means that
>Christophe's "probability" program applies to them far better than it does to
>humans.
>
>I guess people who take the time to examine Kasparov's games can appreciate
>that he is a fine player. If you look at his results, of course they are good,
>but you don't get the full picture. The same with computer games I suppose.
>Kasparov has played fewer than 200,000 games of course, but then again
>there is quite a gap between himself and the next man. The Elo rating reflects
>this. Now who's a better player, Morozevich or Leko? If they play a 20-game
>match and, say, Leko wins 15-5, is it conclusively proved that Leko is
>better? Maybe he's just good at psyching out his opponent and inducing him
>to play worse than usual. Maybe this should be considered as one aspect of
>Chess skill.
>
>There are a certain number of games you need to play to be reasonably sure
>of who is best. The closer the players are in strength, naturally the more
>games they will need to play, right? It shouldn't be difficult to work it out
>-- I've never had reason to do so. In fact, the number of games can be
>worked out exactly. Maths may be drab, boring, a drag, a conversation killer,
>call it what you want, but it's there whether one likes it or not, and its
>effects pervade the whole business of Chess and match playing. You sit
>at the board to play Your Game; you think you are in control of the situation,
>but your performance at the board is subject to the same laws regardless.
>
>Cheers,
>
>Mike.
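
The fluctuation figures quoted above for two equal programs can be checked with a short calculation. What follows is only a minimal sketch, not the QBasic program discussed in this thread: it assumes every game is decisive (no draws) and that each game is an independent 50/50 trial, which real engine matches do not satisfy exactly. The function names are made up for illustration, and the Elo expected-score line uses the usual logistic formula with a 400-point scale, which may not be how the table mentioned above was computed.

```python
from math import comb


def score_probability(wins_a: int, games: int = 20, p_win: float = 0.5) -> float:
    """Binomial probability that side A wins exactly `wins_a` of `games`.

    Assumption: no draws, and every game is independent with the same
    win probability `p_win` for side A.
    """
    return comb(games, wins_a) * p_win**wins_a * (1 - p_win) ** (games - wins_a)


def expected_score(elo_diff: float) -> float:
    """Expected score per game for the stronger side (logistic Elo formula)."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))


# Two programs of equal strength playing a 20-game match.
p_10_10 = score_probability(10)
# "11-9" and "12-8" are counted in either direction (A ahead or B ahead).
p_11_9 = score_probability(11) + score_probability(9)
p_12_8 = score_probability(12) + score_probability(8)

print(f"P(10-10)            = {p_10_10:.3f}")
print(f"P(11-9, either way) = {p_11_9:.3f}")
print(f"P(12-8, either way) = {p_12_8:.3f}")

# Expected number of points in 20 games for a side that is 77 Elo stronger.
print(f"Expected points at +77 Elo: {20 * expected_score(77):.1f} of 20")
```

Under these assumptions a 10-10 result comes out at roughly 0.18 (a little better than one chance in six), 11-9 in either direction at roughly 0.32, and 12-8 in either direction at roughly 0.24, which matches the claim that a lopsided-looking score between identical programs is more likely than an even one. Allowing draws would change the exact numbers.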
Last modified: Thu, 15 Apr 21 08:11:13 -0700