Author: Harald Faber
Date: 02:30:48 07/16/00
On July 16, 2000 at 03:34:45, Ed Schröder wrote:

>>posted by Dann Corbit on July 15, 2000 at 20:21:54:
>
>>Simplifying. I have a penny.
>>I toss it twice.
>>Heads, heads.
>>I toss it twice
>>Heads, heads.
>>I toss it twice
>>Tails, heads.
>>I toss it twice
>>Heads, tails.
>
>>I count them up.
>
>>Heads are stronger than tails.
>
>>My conclusion is faulty. Why? Because I did not gather enough data.
>
>Right.
>
>A few months ago Christophe posted some interesting stuff here regarding
>this topic and nobody really was in agreement with him (me included) until
>I did an experiment which worked as an eye opener for me. The story is not
>funny and goes like this...
>
>In Rebel Century's Personalities you have the option [Strength of Play=100]
>The value may vary from 1 to 100 and 100 is (of course) the default value.
>
>Lowering this value will cause Rebel to lower its NPS. This opens the
>possibility to create (100% equal!) engines whose only difference is
>that they run SLOWER.
>
>I was interested to know HOW MANY games were needed to show that a 10%
>faster version could beat a 10% slower version and with which numbers. So
>I created two personalities:
>
>FAST.ENG (default settings) [Strength of Play=100]
>SLOW.ENG (default settings) [Strength of Play=80]
>
>and started to play 600 eng-eng games with Rebel's built-in autoplayer
>with pre-defined fixed opening lines both engines had to play with white
>and black.
>
>The personality whose only change was [Strength of Play=80] caused Rebel to
>slow down by exactly 10% on the machine the marathon match took place.
>Note that this value (80) may differ on other PCs in case you want to do
>similar experiments.
>
>Here are the results of the 600 games played between the FAST and SLOW
>personalities. The first 300 games were played on a time control of "5
>seconds average". The second 300 games were played on a time control of
>"10 seconds average".
>
>FAST - SLOW 162.5 - 137.5 [ 0:05 ]
>FAST - SLOW 147.0 - 153.0 [ 0:10 ]
>
>The first match of 300 games at 5-secs looks convincing. A 54.1% score
>because of the 10% more speed seems a value one might expect.
>
>But what about the crazy result of match 2? Apparently even 300 games are
>still not enough to prove that the 10% faster version is superior (of
>course it is) but the match score indicates both versions are equal
>which is not true.
>
>So how many games are needed to prove that version X is better than Y?
>
>I am sure I am trying to reinvent the wheel. The casino guys who make
>themselves a good living (with red and black) have figured it all out
>centuries ago. Perhaps there is a FAQ somewhere on the Internet that
>explains how many times you have to turn the wheel to get an exact
>50.0% division between red and black. 1000? 2000?
>
>To answer this question I wrote a little program that randomly emulates
>chess matches. It shows that 100 games is nothing, too often scores like
>60-40 appear on the screen. 500 games (and higher) seems to do well as
>most of the time match scores fall within the 49.0 - 51.0 area.
>
>The bad news (in any case for me) is that it hardly makes any sense to
>test candidate program improvements using (even) long matches. Back to
>common sense: 10% = 10% = better. Oh well...
>
>Ed

This is exactly what I have been preaching for ages. 500 games show a tendency. If you get a 70-30 result over 500 games, it is unlikely that the 30-program is stronger than the 70-program. But the other question is whether the 70-program is really stronger, or whether its score will drop back toward the 50% area. Or even worse, you get a 55-45 result...

Finally, in computer matches there are wide opening books. So your first 10 games might never be repeated. Or you play another 10-game match and get a completely different result than in your first 10-game match because of different opening lines...

So what can you do to verify improvements, or to get an idea of whether program A is stronger than program B?
I don't know. Playing 1000 games at tournament time control takes far too much time. Test positions don't reflect practical play. I really have no clue. And that is why I always say that the top-10 (!) programs play at equal strength.
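
To put rough numbers on the "how many games" question, here is a minimal sketch in Python of the kind of experiment Ed describes. It is not his program: the function names (`simulate_match`, `score_spread`, `margin_95`) are made up for illustration, and the 35% win / 30% draw / 35% loss split for two equal engines is purely an assumption. It plays random matches between equal engines and also computes the usual normal-approximation 95% margin for the match score.

```python
import math
import random


def simulate_match(games, p_win, p_draw, rng):
    """Score of engine A (win = 1, draw = 0.5) over `games` random games."""
    score = 0.0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            score += 1.0
        elif r < p_win + p_draw:
            score += 0.5
    return score


def score_spread(games, p_win, p_draw, trials=2000, seed=1):
    """Range (in percent) that holds the middle ~95% of simulated match scores."""
    rng = random.Random(seed)
    pcts = sorted(
        100.0 * simulate_match(games, p_win, p_draw, rng) / games
        for _ in range(trials)
    )
    return pcts[int(0.025 * trials)], pcts[int(0.975 * trials) - 1]


def margin_95(games, p_win, p_draw):
    """Normal-approximation 95% margin of the match score, in percentage points."""
    p_loss = 1.0 - p_win - p_draw
    mean = p_win + 0.5 * p_draw
    # Per-game variance of a result that is 1, 0.5 or 0.
    var = (p_win * (1.0 - mean) ** 2
           + p_draw * (0.5 - mean) ** 2
           + p_loss * (0.0 - mean) ** 2)
    return 100.0 * 1.96 * math.sqrt(var / games)


if __name__ == "__main__":
    # ASSUMPTION: two equal engines, 35% wins / 30% draws / 35% losses.
    for n in (100, 300, 500, 1000):
        lo, hi = score_spread(n, 0.35, 0.30)
        print(f"{n:5d} games: simulated {lo:.1f}%..{hi:.1f}%, "
              f"margin +/- {margin_95(n, 0.35, 0.30):.1f} points")
```

Under these assumed probabilities the margin for a 300-game match comes out at roughly 4.7 percentage points, so neither 162.5 - 137.5 nor 147.0 - 153.0 would be clearly distinguishable from a match between equal engines. The margin shrinks only with the square root of the number of games: four times as many games give half the margin.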