Author: Dann Corbit
Date: 17:09:01 10/21/98
On October 21, 1998 at 18:20:10, Fernando Villegas wrote:

>Hi Dann:
>Very smart comments you have made to prove your point, but maybe there is
>another angle from which to see this issue of what it means to be champion
>of anything. The clue, I suppose, is the difference between being proclaimed
>champion in a particular event and being the very best. The first happens in
>a single event, like playing the final game of a soccer championship or
>winning or losing against Deep Blue in a specific match of only half a dozen
>games. But when we talk of a champion we are not just referring to the guy
>that got the cup, but to the performer that on average performs better than
>the competition. In this last sense statistical results are the core of the
>matter, and surely the statistics associated with human beings are as good
>or as bad as the statistics associated with chess computers. We tend to
>forget that when we classify a chess player as GM or IM we are not saying
>that he got a title of that kind in this or that tournament, BUT that he has
>such a rating and title after hundreds, perhaps thousands, of games.

This is actually the main point that I was driving at. Our confidence in the
ability of a champion of any sort, from a *mathematical* standpoint, is a
function of the number of measurements we have taken. So, for instance, I
could say with 99.999% certainty that Kasparov is better than a player with
thousands of games who is rated at Elo 1000. We can say with (perhaps) 90%
certainty that he is better than Anand (just a guess really, because I have
not attempted any math). In other words, we can use a huge pool of
measurements to increase our certainty/confidence in a hypothesis.

What I have been wanting to demonstrate has to do with this scenario:
"Person X buys program Y. He already has program Z. He has two machines,
A & B. He runs Y on A and Z on B in a mini-tournament of ten games. The
result is in favor of Y, and he announces that program Y is stronger."

I simply want to point out that such findings are not scientific. Even a
10:0 result is not conclusive, scientific evidence that Y is stronger than
Z. People seem to think that measuring chess games between computers is
somehow completely different from measuring coin flips or the ages of
people in a room or other phenomena.

>Another thing we forget -- it seems to me Amir forgot it -- is that strength
>is something very different from relative force, such as that measured by
>Elo ratings. Strength could be, and surely is, permanent, as Amir says, but
>not so the rating, because the rating depends on a relation of forces with
>changing opponents. It is not only a matter of you changing your strength,
>but also of how the opposition changes theirs. That is the reason computers
>that in the middle of the '80s had a 2000 Elo now appear with a much
>degraded one; they now compete with much stronger programs.

I agree that programs and machines are clearly stronger than they used to
be. Algorithmic and hardware advancements will always march the computer
chess game forward.

I also want to point out that I am not saying such pronouncements are
*wrong* either; it is just that they are uncertain. Obviously, a 10:0 margin
would lend a lot of credence to program Y being stronger. If it really *is*
stronger, then repeated experiments would bear this out. Until we have
repeated the experiment many times, we don't really know -- even though each
time it becomes more certain.
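To put a rough number on the ten-game scenario, here is a quick
back-of-the-envelope sketch in C (my own illustration, not taken from any
real rating tool). It treats each game as a fair coin flip -- that is, the
hypothesis that Y and Z are exactly equal, with draws set aside -- and asks
how often luck alone produces each possible score:

    #include <stdio.h>
    #include <math.h>

    /* Probability of exactly k wins in n games when the two programs are
       truly equal (each game a fair coin flip): C(n,k) * 0.5^n */
    static double prob_exactly(int n, int k)
    {
        double c = 1.0;
        int i;

        for (i = 1; i <= k; i++)
            c *= (double)(n - k + i) / (double)i;
        return c * pow(0.5, n);
    }

    int main(void)
    {
        const int n = 10;   /* games in the mini-tournament */
        int wins;

        for (wins = 6; wins <= n; wins++) {
            double p = 0.0; /* chance of doing at least this well by luck */
            int k;

            for (k = wins; k <= n; k++)
                p += prob_exactly(n, k);
            printf("%2d-%d: p = %.4f\n", wins, n - wins, p);
        }
        return 0;
    }

This prints p = 0.3770 for 6-4, 0.1719 for 7-3, 0.0547 for 8-2, 0.0107 for
9-1, and 0.0010 for 10-0. A 6-4 "win" happens between dead-equal programs
more than a third of the time, while a clean sweep is a one-in-a-thousand
fluke -- a lot of credence, but still not proof.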
On the other hand, a 10:0 sweep may only mean that Y found the sole weakness in a program that learns. After 10 butt-whuppings, Z plugs the hole, and from there forward never loses to Y again.
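Incidentally, the 99.999% figure above can be sanity-checked with the
standard Elo expected-score formula:

    E = 1 / (1 + 10^((Ropp - R) / 400))

Taking Kasparov at a round 2800 (my assumption) against the 1000-rated
player gives E = 1 / (1 + 10^(-4.5)), or about 0.99997 -- roughly one point
dropped in every thirty thousand games. Expected score per game is not quite
the same thing as confidence about who is stronger, but it shows the order
of magnitude involved.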