Computer Chess Club Archives



Subject: Testmethods for n=0, n=1 and n=>800 - For Beginners and 'old Hands'

Author: Rolf Tueschen

Date: 06:20:26 09/13/02


Computer chess (CC) can be boring if one loses track of chess itself. Then it
becomes a mere application of programming and computer science. Good enough for
the talented, but not enough for the wealth of chess. The occupation with
computers, programs and their games has the same addictive aspects as motion
pictures and television, only that you can actively participate. With the
internet there is finally a whole virtual reality.

Whoever becomes acquainted with chess must inevitably land in CC. With CC and
the chess programs the mystique of human chess is gone. As a beginner or as a
master you used to collect games and analyses. But no matter how hard you tried,
you could fill thousands of index cards, yet you couldn't collect a couple of
million games. Today, what is played miles away is on your display seconds
later. Oversights are detected in seconds with current software.

Now, chess has a tradition as a sport in the Anglo-American world, closely
connected with the science of measuring and counting who's the best. May I
inform Western readers that in the tradition of the ancient East it's more
about personal life and personal training to reach some individual perfection,
no matter how this differs from the perfection of others. Anyway, because it's
very common to build ranking lists, the same took place in CC.

Let's quickly compare human lists and computer rankings. The Elo method lets
you track a player's individual strength (performance) over the variable of
age. In CC, programs have no age at all, because almost every new version gets
completely new limbs and organs, so to speak. That means that you can't compare
the old and the new version. Or would you compare the embryo with Marilyn vos
Savant? We remember the old saying "You can't compare apples with oranges".
Nevertheless CC has had ranking lists for decades now, with the astonishing
result that the newest progs are on top and the oldest, on the weakest
hardware, are at the bottom. Big surprise!

Now the industry wants to know whether its newest "babies" are at least a little
bit "better" than the former version. Not that it matters, but PR needs a
minimum of authenticity.

So how would you measure "better", and how much is better? What is exactitude in
such a fuzzy world as chess? Chess is comparable with differential mathematics
because there's no 'finite' until the game has been solved. And don't forget
there are more possible chess games than atoms in the observable universe! Don't
hold your breath waiting for chess to be solved next week. Won't happen in a
lifetime.

So, I repeat, how do you want to measure and calculate in chess? Isn't chess the
game with always new discoveries in almost every new game? How many games must
you run to know whether a version is stronger than its predecessor? 0, 1 or over
800?

The answer is short. No matter how many games you run, or even if you'd run no
game at all, you get results. Here they are:

With n=0:

I know for _sure_ that the next version is always stronger than the former.
Of course 5% uncertainty included! Take my bet?

With n=1:

Here we have Thorsten. He has been testing chess machines for two decades now,
since he was 13 or so. Of course he _knows_ which version is stronger after a
game or two! Again, the 5% included!

With n=>800:

Ed Schroder has a dozen machines and lets them play autoplayed games. Could he
get exact results? Not really, but he probably knows what he wants to know after
800 games. But Ed, too, has a risk of 5% or 2%.
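
For readers who want to put a number on that 5% or 2% risk, here is a rough sketch, not anything Ed actually uses. It assumes two equally strong programs, independent games, a normal approximation and the usual logistic Elo curve; the function names and the draw_ratio parameter are made up for this illustration.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin(games, risk, draw_ratio=0.0):
    """Elo margin around an even score that chance alone can produce.

    games      -- number of games played
    risk       -- two-sided error risk, e.g. 0.05 for 5%
    draw_ratio -- assumed share of drawn games (hypothetical parameter)
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - risk / 2.0)
    # For equal opponents the per-game score has variance (1 - draw_ratio) / 4.
    per_game_sd = math.sqrt((1.0 - draw_ratio) / 4.0)
    half_width = z * per_game_sd / math.sqrt(games)
    return elo_from_score(0.5 + half_width)


if __name__ == "__main__":
    for risk in (0.05, 0.02):
        margin = elo_margin(800, risk)
        print(f"800 games, risk {risk:.0%}: about +/- {margin:.0f} Elo")
```

Under these assumptions, and with no draws at all, 800 games at 5% risk leave a margin of roughly two dozen Elo points. A realistic share of draws narrows it somewhat, but it never shrinks to zero.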

Is this another chapter of the devil's black humour? No, not really! I wanted to
show you that even with the most sensitive statistics you can't get certainty in
chess. And let me tell you that statistics is simply not made for chess. This is
like maths in astrology instead of astronomy. Period.

Take a 100 m final in athletics. Either someone is visibly faster, and then he's
the best, or he isn't. The moment you can't decide with your own eyes who's the
winner, there is no winner at all, no matter how many digits you define. As
humans we don't take the runner with two nanoseconds less as the "best"! We say
simply that they are equally strong. And that should be remembered in CC too. If
you get a result of 52-48 then the two progs are equally strong. And no voodoo
with statistics could bring more clarity. And 720 to 680 is, in chess with
computers, also almost equally strong. You can't automatically get "better"
results in CC simply by raising n. Why? Because the whole point of statistics
is the underlying distribution. Strength should follow a normal distribution,
but in CC it doesn't. In CC almost everything depends on hardware. The rest is
so minimal that you can't detect it statistically.
(Another important aspect is the law that all variables must be held constant
except the one you want to measure. But I don't want to confuse too much.)
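
To make the 52-48 and 720-680 examples concrete, here is a small sketch of the textbook arithmetic. It is one possible way to look at the numbers, under assumptions the post does not state: the results are read as points, every game counts as a win or a loss (no draws, which is the worst case for the spread), games are independent, and the logistic Elo curve applies. The function names are invented for the example.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def summarize(points_a, points_b):
    """Score, implied Elo gap and z-value for a match result A vs. B."""
    n = points_a + points_b
    p = points_a / n
    # Standard error of the score, assuming no draws (an assumption).
    se = math.sqrt(p * (1.0 - p) / n)
    # Distance of the observed score from an even match, in standard errors.
    z = (p - 0.5) / se
    return {"games": n, "score": p, "elo": elo_diff(p), "z": z}


if __name__ == "__main__":
    for a, b in ((52, 48), (720, 680)):
        r = summarize(a, b)
        verdict = "inside" if abs(r["z"]) < 1.96 else "outside"
        print(f"{a}-{b}: {r['elo']:+.1f} Elo, z = {r['z']:.2f}, "
              f"{verdict} the 5% band")
```

With these assumptions both results stay inside the usual 5% band, so the plain arithmetic cannot separate the two progs in either example.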

Rolf Tueschen





Last modified: Thu, 15 Apr 21 08:11:13 -0700
