Computer Chess Club Archives


Subject: About head or tail (was Upon scientific truth - the nature of informati

Author: Ed Schröder

Date: 00:34:45 07/16/00


>posted by Dann Corbit on July 15, 2000 at 20:21:54:

>Simplifying.  I have a penny.
>I toss it twice.
>Heads, heads.
>I toss it twice
>Heads, heads.
>I toss it twice
>Tails, heads.
>I toss it twice
>Heads, tails.

>I count them up.

>Heads are stronger than tails.

>My conclusion is faulty.  Why?  Because I did not gather enough data.

Right.

A few months ago Christophe posted some interesting stuff here regarding
this topic and nobody really was in agreement with him (me included) until
I did an experiment which worked as an eye opener for me. The story is not
funny and goes like this...

In Rebel Century's Personalities you have the option [Strength of Play=100].
The value may vary from 1 to 100, and 100 is (of course) the default value.

Lowering this value will cause Rebel to lower its NPS. This makes it
possible to create (100% equal!) engines whose only difference is that
they run SLOWER.

I was interested to know HOW MANY games are needed to show that a 10%
faster version beats the slower version, and by what margin. So
I created two personalities:

FAST.ENG (default settings) [Strength of Play=100]
SLOW.ENG (default settings) [Strength of Play=80]

and started to play 600 eng-eng games with Rebel's built-in autoplayer,
using pre-defined fixed opening lines that both engines had to play with
white and black.

The personality whose only change is [Strength of Play=80] caused Rebel to
slow down by exactly 10% on the machine where the marathon match took place.
Note that this value (80) may differ on other PCs in case you want to do
similar experiments.

Here are the results of the 600 games played between the FAST and SLOW
personalities. The first 300 games were played on a time control of "5
seconds average". The second 300 games were played on a time control of
"10 seconds average".

FAST - SLOW   162.5 - 137.5   [ 0:05 ]
FAST - SLOW   147.0 - 153.0   [ 0:10 ]

The first match of 300 games at 5 secs looks convincing. A 54.2% score
from 10% more speed seems a value one might expect.

But what about the crazy result of match 2? Apparently 300 games are
still not enough to prove that the 10% faster version is superior (of
course it is); the match score suggests both versions are equal,
which is not true.
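
As a rough check on the two results above, here is a minimal sketch that converts each match score into a percentage and into a distance from an even 150-150 result, measured in standard errors. The number of draws is not given in this post, so the sketch assumes the worst case of no draws, which gives the largest possible spread (0.5 * sqrt(games) points); with draws the real spread would be smaller. The function name match_summary is made up for this illustration.

```python
import math

def match_summary(score, games):
    """Return (score percentage, z) for a match result.

    z is the distance from an even score in standard errors.
    ASSUMPTION: no draws, so each game has variance 0.25 under equal
    strength. This is an upper bound on the spread; draws shrink it.
    """
    pct = 100.0 * score / games
    std_err = 0.5 * math.sqrt(games)      # in points, worst case
    z = (score - games / 2.0) / std_err
    return pct, z

# The two 300-game matches reported in the post (FAST's score).
for label, score in (("0:05", 162.5), ("0:10", 147.0)):
    pct, z = match_summary(score, 300)
    print(f"[{label}] {pct:.1f}%  z = {z:+.2f}")
# [0:05] 54.2%  z = +1.44
# [0:10] 49.0%  z = -0.35
```

Under this no-draw assumption neither match is two standard errors away from an even score, so on its own neither result separates the two engines.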

So how many games are needed to prove that version X is better than Y?

I am sure I am trying to reinvent the wheel. The casino guys who make
themselves a good living (with red and black) figured it all out
centuries ago. Perhaps there is a FAQ somewhere on the Internet that
explains how many times you have to turn the wheel to get an exact
50.0% division between red and black. 1000? 2000?

To answer this question I wrote a little program that randomly emulates
chess matches. It shows that 100 games is nothing: scores like 60-40
appear on the screen too often. 500 games (and more) seem to do well, as
most of the time match scores fall within the 49.0 - 51.0 area.
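
The kind of emulation described above can be sketched as follows. This is not the original program, only an illustration under stated assumptions: the two engines are exactly equal, every game is independent, and the win and draw probabilities (35% and 30%) are invented placeholders. The names simulate_match and share_within are hypothetical.

```python
import random

def simulate_match(games, p_win=0.35, p_draw=0.30, rng=random):
    """Play one emulated match and return engine A's score in points.

    ASSUMPTION: p_win and p_draw are made-up values; A loses with
    probability 1 - p_win - p_draw, so the defaults model equal engines.
    """
    score = 0.0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            score += 1.0              # win
        elif r < p_win + p_draw:
            score += 0.5              # draw
    return score

def share_within(games, low=49.0, high=51.0, matches=1000, seed=1):
    """Fraction of emulated matches whose score % lies in [low, high]."""
    rng = random.Random(seed)         # fixed seed for repeatable runs
    hits = 0
    for _ in range(matches):
        pct = 100.0 * simulate_match(games, rng=rng) / games
        if low <= pct <= high:
            hits += 1
    return hits / matches

for n in (100, 500, 2000):
    print(f"{n:5d} games: {share_within(n):.2f} of matches in 49.0 - 51.0")
```

How often a match lands inside the 49.0 - 51.0 band depends on the assumed draw rate as well as on the number of games, so the figures printed here apply only to the placeholder probabilities above.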

The bad news (for me at least) is that it hardly makes any sense to
test candidate program improvements using (even) long matches. Back to
common sense: 10% = 10% = better. Oh well...

Ed


