Author: Ed Schröder
Date: 00:34:45 07/16/00
>posted by Dann Corbit on July 15, 2000 at 20:21:54:
>
>Simplifying. I have a penny.
>I toss it twice.
>Heads, heads.
>I toss it twice
>Heads, heads.
>I toss it twice
>Tails, heads.
>I toss it twice
>Heads, tails.
>I count them up.
>Heads are stronger than tails.
>My conclusion is faulty. Why? Because I did not gather enough data.

Right.

A few months ago Christophe posted some interesting stuff here on this topic, and nobody really agreed with him (me included) until I did an experiment that was an eye-opener for me. The story is not funny and goes like this...

In Rebel Century's Personalities you have the option

    [Strength of Play=100]

The value may vary from 1 to 100, and 100 is (of course) the default. Lowering this value causes Rebel to lower its NPS. This makes it possible to create (100% equal!) engines whose only difference is that they run SLOWER.

I wanted to know HOW MANY games are needed to show that a version that is 10% faster beats the slower one, and by what score. So I created two personalities:

    FAST.ENG (default settings)  [Strength of Play=100]
    SLOW.ENG (default settings)  [Strength of Play=80]

and started a run of 600 eng-eng games with Rebel's built-in autoplayer, using pre-defined fixed opening lines that both engines had to play with white and with black.

The personality whose only change was [Strength of Play=80] slowed Rebel down by exactly 10% on the machine where the marathon match took place. Note that this value (80) may differ on other PCs in case you want to do similar experiments.

Here are the results of the 600 games played between the FAST and SLOW personalities. The first 300 games were played at a time control of "5 seconds average". The second 300 games were played at a time control of "10 seconds average".

    FAST - SLOW   162.5 - 137.5   [ 0:05 ]
    FAST - SLOW   147.0 - 153.0   [ 0:10 ]

The first match of 300 games at 5 seconds looks convincing.
A 54.2% score from 10% more speed seems a value one might expect.

But what about the crazy result of match 2? Apparently 300 games are still not enough to prove that the 10% faster version is superior (of course it is): the match score suggests both versions are roughly equal, which is not true.

So how many games are needed to prove that version X is better than Y?

I am sure I am trying to reinvent the wheel. The casino guys who make themselves a good living (with red and black) figured it all out centuries ago. Perhaps there is a FAQ somewhere on the Internet that explains how many times you have to spin the wheel to get an exact 50.0% division between red and black. 1000? 2000?

To answer this question I wrote a little program that randomly emulates chess matches. It shows that 100 games is nothing: too often scores like 60-40 appear on the screen. 500 games (and more) seems to do well, as most of the time match scores fall within the 49.0 - 51.0 range.

The bad news (for me, in any case) is that it hardly makes sense to test candidate program improvements even with long matches.

Back to common sense: 10% = 10% = better.

Oh well...

Ed
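
For readers who want to repeat the "little program" experiment, here is a minimal Python sketch. It is not Ed's program, which was not posted. It assumes two exactly equal engines, independent games, and a hypothetical win/draw/loss split of 35% / 30% / 35%; the function names and all default parameters are illustrative choices. It simulates many matches of a given length and reports how often the score percentage lands between 49.0 and 51.0, next to the theoretical standard error of the match score.

```python
import math
import random


def standard_error(n_games, p_win, p_draw):
    """Standard error of the score fraction (0..1) over n_games independent games."""
    mean = p_win + 0.5 * p_draw                      # expected score per game
    variance = p_win * 1.0 + p_draw * 0.25 - mean ** 2
    return math.sqrt(variance / n_games)


def simulate_match(n_games, p_win, p_draw, rng):
    """Play one simulated match; return the first engine's score in points."""
    score = 0.0
    for _ in range(n_games):
        r = rng.random()
        if r < p_win:
            score += 1.0          # win
        elif r < p_win + p_draw:
            score += 0.5          # draw
    return score


def fraction_within(n_games, lo=49.0, hi=51.0, p_win=0.35, p_draw=0.30,
                    n_matches=500, seed=1):
    """Fraction of simulated matches whose score percentage lies in [lo, hi].

    The probabilities are assumptions (equal engines, 30% draws), not data
    taken from Rebel.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_matches):
        pct = 100.0 * simulate_match(n_games, p_win, p_draw, rng) / n_games
        if lo <= pct <= hi:
            hits += 1
    return hits / n_matches


if __name__ == "__main__":
    for n in (100, 300, 500, 2000):
        se_pct = 100.0 * standard_error(n, 0.35, 0.30)
        inside = 100.0 * fraction_within(n)
        print(f"{n:5d} games: std. error {se_pct:4.2f}%, "
              f"{inside:5.1f}% of matches within 49.0 - 51.0")
```

The standard error shrinks only with the square root of the number of games, so four times as many games halve the noise. A single game's score can have a standard deviation of at most 0.5, so for a 300-game match the standard error is at most about 2.9 percentage points; draws reduce it somewhat. The percentages the simulation prints depend on the assumed draw rate and may differ from the figures Ed observed.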