Computer Chess Club Archives


Subject: Re: The luck element in humans & programs

Author: Dann Corbit

Date: 17:09:01 10/21/98

On October 21, 1998 at 18:20:10, Fernando Villegas wrote:
>Hi Dann:
>Very smart comments you have made to prove your point, but maybe there is
>another angle on this issue of what it means to be champion of anything. The
>key, I suppose, is the difference between being proclaimed champion in a
>particular event and being the very best. The first happens in a single
>event, like playing the final game of a soccer championship, or winning or
>losing against Deep Blue in a specific match of only half a dozen games. But
>when we talk of a champion we are not just making reference to the guy who
>got the cup, but to the performer who on average performs better than the
>competition. In this last sense statistical results are the core of the
>matter, and surely the statistics associated with human beings are as good or
>as bad as the statistics associated with chess computers. We tend to forget
>that when we classify a chess player as a GM or an IM we are not saying that
>he got such a title in this or that tournament, BUT that he has that rating
>and title after hundreds, perhaps thousands, of games.
This is actually the main point that I was driving at.  Our confidence, from a
*mathematical* standpoint, in the ability of a champion of any sort is a
function of the number of measurements we have taken.  So, for instance, I
could say with 99.999% certainty that Kasparov is better than a player with
thousands of games who is rated at Elo 1000.  We can say with (perhaps) 90%
certainty that he is better than Anand (just a guess really, because I have
not attempted any math).  In other words, we can use a huge pool of
measurements to increase our certainty/confidence in a hypothesis.  What I
have been wanting to demonstrate has to do with this:

Scenario: "Person X buys program Y.  He already has program Z.  He has two
machines, A & B.  He runs Y on A and Z on B in a mini-tournament of ten games.
The result is in favor of Y, and he announces that program Y is stronger."

I simply want to point out that such findings are not scientific.  Even a 10:0
result is not conclusive, scientific evidence that Y is stronger than Z.
People seem to think that measuring chess games between computers is somehow
completely different from measuring coin flips or the ages of people in a room
or other phenomena.
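
To make this concrete, here is a little C sketch (my own illustration,
ignoring draws and assuming the two programs are exactly equal in strength).
It computes how often a 10:0 sweep would happen purely by chance -- about one
match in 512, which is rare but a long way from impossible:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const int games = 10;

        /* If the programs are truly equal and there are no draws, each
           program wins a given game with probability 0.5. */
        double p_named  = pow(0.5, games);    /* one named program sweeps */
        double p_either = 2.0 * p_named;      /* either program sweeps    */

        printf("P(named program goes 10-0):  %.6f\n", p_named);
        printf("P(either program goes 10-0): %.6f (about 1 in %.0f)\n",
               p_either, 1.0 / p_either);
        return 0;
    }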

>Another thing we forget (it seems to me Amir forgot it) is that strength is
>something very different from relative force, such as that measured by Elo
>ratings. Strength could be, and surely is, permanent, as Amir says, but not
>so the rating, because that last one depends on a relation of forces with
>changeable opponents. It is not just a matter of you changing your strength,
>but also of how the opposition changes theirs. That's the reason computers
>that in the middle of the 80's had a 2000 Elo now appear with a much degraded
>one; they now compete with a lot stronger programs.
I agree that programs and machines are clearly stronger than they used to be.
Algorithmic and hardware advancements will always push computer chess forward.
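
To see how relative a rating is, here is a small C sketch of the standard Elo
expected-score formula (the ratings below are made-up numbers, just for
illustration).  A program of fixed strength scores about 50% in a 2000-rated
field, but only about 15% when the field improves to 2300 -- so its published
rating falls even though the program itself never changed:

    #include <stdio.h>
    #include <math.h>

    /* Expected score of a player rated `us` against a player rated `them`. */
    double expected_score(double us, double them)
    {
        return 1.0 / (1.0 + pow(10.0, (them - us) / 400.0));
    }

    int main(void)
    {
        printf("2000 vs a 2000 field: %.2f\n", expected_score(2000.0, 2000.0));
        printf("2000 vs a 2300 field: %.2f\n", expected_score(2000.0, 2300.0));
        return 0;
    }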

I also want to point out that I am not saying such pronouncements are *wrong*,
either; it is just that they are uncertain.  Obviously, a 10:0 margin would
lend a lot of credence to program Y being stronger.  If it really *is*
stronger, then repeated experiments would bear this out.  Until we have
repeated the experiment many times, we don't really know -- even though each
time it becomes more certain.  On the other hand, Y may have found the sole
weakness in a program that learns.  After 10 butt-whuppings, Z plugs the
hole, and from there forward never loses to Y again.
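
To illustrate how repetition buys certainty, here is one more C sketch (the
55% win rate is an assumed number, and draws are ignored).  Suppose Y really
does win 55% of its games against Z; using the normal approximation to the
binomial, the chance that an n-game match fails to put Y ahead shrinks
steadily as n grows:

    #include <stdio.h>
    #include <math.h>

    /* Standard normal CDF, via the C99 complementary error function. */
    static double phi(double z)
    {
        return 0.5 * erfc(-z / sqrt(2.0));
    }

    int main(void)
    {
        const double p = 0.55;              /* Y's true per-game win rate */
        const int ns[] = { 10, 50, 100, 500, 1000 };

        for (size_t i = 0; i < sizeof ns / sizeof ns[0]; i++) {
            int n = ns[i];
            double mean = n * p;
            double sd   = sqrt(n * p * (1.0 - p));
            /* P(Y scores half or less), i.e. the match does not favor Y. */
            double fail = phi((n / 2.0 - mean) / sd);
            printf("%5d games: P(match fails to favor Y) = %.4f\n", n, fail);
        }
        return 0;
    }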


