Author: Fernando Villegas
Date: 09:21:50 10/22/98
On October 21, 1998 at 20:09:01, Dann Corbit wrote:

>On October 21, 1998 at 18:20:10, Fernando Villegas wrote:
>
>>Hi Dann:
>>Very smart comments you have made to prove your point, but maybe there is
>>another angle from which to see this issue of what it means to be champion of
>>anything. The clue, I suppose, is the difference between being proclaimed
>>champion in a particular event and being the very best. The first happens in
>>a single event, like playing the final game of a soccer championship, or
>>winning or losing against Deep Blue in a specific match of only half a dozen
>>games. But when we talk of a champion we are not just making reference to the
>>guy who got the cup, but to the performer who on average performs better than
>>the competition. In this last sense statistical results are the core of the
>>matter, and surely the statistics associated with human beings are as good or
>>bad as the statistics associated with chess computers. We tend to forget that
>>when we classify a chess player as GM or IM we are not saying that he got a
>>title of that kind in this or that tournament, BUT that he holds such a
>>rating and title after hundreds, perhaps thousands, of games.
>
>This is actually the main point that I was driving at. Our confidence in the
>ability of a champion of any sort, from a *mathematical* standpoint, is a
>function of the number of measurements we have taken. So, for instance, I
>could say with 99.999% certainty that Kasparov is better than a player with
>thousands of games who is rated at ELO 1000. We can say with (perhaps) 90%
>certainty that he is better than Anand (just a guess really, because I have
>not attempted any math). In other words, we can use a huge pool of
>measurements to increase our certainty/confidence in a hypothesis. What I
>have been wanting to demonstrate has to do with this:
>
>Scenario: "Person X buys program Y. He already has program Z. He has two
>machines, A & B.
>He runs Y on A and Z on B in a mini-tournament of ten games. The result is in
>favor of Y, and he announces that program Y is stronger."
>
>I simply want to point out that such findings are not scientific. Even a 10-0
>result is not conclusive, scientific evidence that Y is stronger than Z.
>People seem to think that measuring chess games between computers is somehow
>completely different from measuring coin flips or the ages of people in a
>room or other phenomena.
>
>>Another thing we forget (it seems to me Amir forgot it) is that strength is
>>something very different from relative force, such as that measured by Elo
>>ratings. Strength could be, and surely is, permanent, as Amir says, but not
>>so the rating, because this last one depends on a relation of forces with
>>changeable opponents. It is not only a matter of you changing your strength,
>>but also of how the opposition changes its own. That's the reason computers
>>that in the middle of the '80s had a 2000 Elo now appear with a much
>>degraded one: they now compete with much stronger programs.
>
>I agree that programs and machines are clearly stronger than they used to be.
>Algorithmic and hardware advancements will always march the computer chess
>game forward.
>
>I also want to point out that I am not saying that pronouncements are *wrong*
>either, it is just that they are uncertain. Obviously, a 10-0 margin would
>lend a lot of credence to program Y being stronger. If it really *is*
>stronger, then repeated experiments would bear this out. Until we have
>repeated the experiment many times, we don't really know -- even though each
>time it becomes more certain. On the other hand, it may have found the sole
>weakness in a program that learns. After 10 butt-whuppings, it plugs the
>hole, and from there forward never loses again.

You are right, but let me add just one more thing. We not only get results in
a ten-game match; we also get moves, and we can take a look at them.
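As an aside, the point that a short match is not conclusive can be made concrete with a quick binomial calculation. This sketch is an editorial illustration, not part of the original exchange; it treats each decisive game between two truly equal programs as a fair coin flip:

```python
from math import comb

# Probability that two *equal* programs produce a 10-0 sweep by pure chance.
p_sweep = 0.5 ** 10
print(f"P(10-0 | equal strength) = {p_sweep:.5f}")  # about 0.00098

def p_at_least(k, n, p=0.5):
    """Binomial tail: chance an equal opponent wins k or more of n games."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A 7-3 result, the kind often announced as proof of superiority,
# occurs between equal programs more than 17% of the time.
print(f"P(>=7 of 10 | equal strength) = {p_at_least(7, 10):.3f}")  # 0.172
```

So a 10-0 sweep is strong (though still not certain) evidence, while a 7-3 score barely distinguishes the programs at all, which is exactly the uncertainty being argued above.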
Looking at them, sometimes just one game suffices to see which player is
stronger. You could be faced with two masked players, and I am sure you would
very soon identify who is the stronger and better player just from the quality
of the moves, IF a perceptible difference exists, of course. In that sense I
tend to believe that what Thorsten has said many times about his capacity to
measure a program's strength just by looking at its moves is maybe an
exaggerated claim, but there is a point of truth in it. Precisely because the
results of a game are not random but follow from the intrinsic qualities of
the players, we should not treat this just like the experiment with the coin.
In other words: a program is not stronger than another BECAUSE it got 85% of
the points; it got 85%, or on average will get that number of wins, because it
plays better moves. Of course, this is increasingly difficult to measure the
nearer programs are to each other in strength, but by adding statistical
results to qualitative observation of the moves, I am sure we can reach some
very relevant conclusions not a lot beyond ten games.

Fernando
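For reference, the 85% figure mentioned above maps directly onto the Elo scale through the standard expected-score formula. This small sketch (added here as an illustration, not from the post) computes the mapping in both directions:

```python
from math import log10

def expected_score(diff):
    """Elo expected score for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_diff(score):
    """Invert the Elo formula: rating gap implied by an average score (0 < score < 1)."""
    return 400.0 * log10(score / (1.0 - score))

# An 85% average score corresponds to roughly a 300-point rating gap.
print(f"85% score -> {elo_diff(0.85):.0f} Elo")    # 301
print(f"+300 Elo  -> {expected_score(300):.2f}")   # 0.85
```

This is why an 85% result, sustained over enough games, is such a strong signal: it implies a gap of about three rating classes, the kind of difference that also shows up plainly in the quality of the moves.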