Computer Chess Club Archives


Subject: Re: About head or tail (was Upon scientific truth - the nature of informati

Author: Ed Schröder

Date: 04:15:45 07/17/00


On July 17, 2000 at 06:18:38, Harald Faber wrote:

>On July 16, 2000 at 17:56:22, Ed Schröder wrote:
>
>>On July 16, 2000 at 05:30:48, Harald Faber wrote:
>>
>>>On July 16, 2000 at 03:34:45, Ed Schröder wrote:
>>>
>>>>>posted by Dann Corbit on July 15, 2000 at 20:21:54:
>>>>
>>>>>Simplifying.  I have a penny.
>>>>>I toss it twice.
>>>>>Heads, heads.
>>>>>I toss it twice
>>>>>Heads, heads.
>>>>>I toss it twice
>>>>>Tails, heads.
>>>>>I toss it twice
>>>>>Heads, tails.
>>>>
>>>>>I count them up.
>>>>
>>>>>Heads are stronger than tails.
>>>>
>>>>>My conclusion is faulty.  Why?  Because I did not gather enough data.
>>>>
>>>>Right.
>>>>
>>>>A few months ago Christophe posted some interesting stuff here regarding
>>>>this topic and nobody really was in agreement with him (me included) until
>>>>I did an experiment which worked as an eye opener for me. The story is not
>>>>funny and goes like this...
>>>>
>>>>In Rebel Century's Personalities you have the option [Strength of Play=100]
>>>>The value may vary from 1 to 100 and 100 is (of course) the default value.
>>>>
>>>>Lowering this value will cause Rebel to lower its NPS. This opens the
>>>>possibility to create (100% equal!) engines whose only difference is
>>>>that they run SLOWER.
>>>>
>>>>I was interested to know HOW MANY games are needed to show that a 10%
>>>>faster version can beat a 10% slower version, and by what margin. So
>>>>I created two personalities:
>>>>
>>>>FAST.ENG (default settings) [Strength of Play=100]
>>>>SLOW.ENG (default settings) [Strength of Play=80]
>>>>
>>>>and started to play 600 eng-eng games with Rebel's built-in autoplayer,
>>>>using pre-defined fixed opening lines that both engines had to play with
>>>>both White and Black.
>>>>
>>>>The personality whose only change is [Strength of Play=80] caused Rebel to
>>>>slow down by exactly 10% on the machine where the marathon match took place.
>>>>Note that this value (80) may differ on other PCs in case you want to do
>>>>similar experiments.
>>>>
>>>>Here are the results of the 600 games played between the FAST and SLOW
>>>>personalities. The first 300 games were played on a time control of "5
>>>>seconds average". The second 300 games were played on a time control of
>>>>"10 seconds average".
>>>>
>>>>FAST - SLOW   162.5 - 137.5   [ 0:05 ]
>>>>FAST - SLOW   147.0 - 153.0   [ 0:10 ]
>>>>
>>>>The first match of 300 games at 5 seconds looks convincing. A 54.2% score
>>>>because of the 10% more speed seems a value one might expect.
>>>>
>>>>But what about the crazy result of match 2? Apparently 300 games are
>>>>still not enough to prove that the 10% faster version is superior (of
>>>>course it is); the match score suggests both versions are equal,
>>>>which is not true.
>>>>
>>>>So how many games are needed to prove that version X is better than Y?
>>>>
>>>>I am sure I am trying to reinvent the wheel. The casino guys who make
>>>>themselves a good living (with red and black) figured it all out
>>>>centuries ago. Perhaps there is a FAQ somewhere on the Internet that
>>>>explains how many times you have to turn the wheel to get an exact
>>>>50.0% division between red and black. 1000? 2000?
>>>>
>>>>To answer this question I wrote a little program that randomly emulates
>>>>chess matches. It shows that 100 games is nothing; too often scores like
>>>>60-40 appear on the screen. 500 games (and more) seem to do well, as
>>>>most of the time match scores fall within the 49.0 - 51.0 area.
>>>>
>>>>The bad news (in any case for me) is that it hardly makes any sense to
>>>>test candidate program improvements using (even) long matches. Back to
>>>>common sense: 10% = 10% = better. Oh well...
>>>>
>>>>Ed
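
[A minimal sketch of the kind of random match emulation described above. It is not Ed's program. It assumes two exactly equal engines, and the win and draw probabilities (30% and 40%) are illustrative guesses, as are the function names. It reports how often a match score lands in the 49.0 - 51.0 area for a given match length.]

```python
import random

def simulate_match(games, p_win=0.3, p_draw=0.4, rng=random):
    """Score percentage of engine A in one match between two equal engines.

    p_win is A's chance to win a game and p_draw the chance of a draw;
    both are assumed values, and 1 - p_win - p_draw is A's chance to lose.
    """
    score = 0.0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            score += 1.0          # win
        elif r < p_win + p_draw:
            score += 0.5          # draw
    return 100.0 * score / games

def share_within(games, low=49.0, high=51.0, trials=2000, seed=1):
    """Fraction of simulated matches whose score falls in [low, high]."""
    rng = random.Random(seed)     # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        if low <= simulate_match(games, rng=rng) <= high:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for n in (100, 300, 500, 2000):
        print(n, "games:", share_within(n))
```

Running it for several match lengths shows how slowly the spread of scores narrows as the number of games grows. The exact shares depend on the assumed draw rate.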
>>>
>>>This is exactly what I have been preaching for ages.
>>>500 games show a tendency. If you get a 70-30 result by playing 500 games it is
>>>unlikely that the 30-program is stronger than the 70-program. But the other
>>>question is if the 70-program is really stronger or will it decrease to the
>>>50%-area? Or even worse, you get a 55-45 result...Finally in computer matches
>>>there are wide opening books. So your first 10 games might never be repeated.
>>>Or you play another 10-game match and get a completely different result than in
>>>your first 10 game match because of different opening lines...
>>
>>>So what to do to verify improvements or to get an idea if program a is stronger
>>>than program b? I don't know.
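
[One common way to approach this question is to ask how many standard errors a match result lies from a 50% score. The sketch below is a standard normal approximation, not something proposed in this thread. The 40% draw rate and the function names are assumptions. For equal engines the per-game variance of the score is 0.25 * (1 - draw_rate).]

```python
import math

def match_z_score(score, games, draw_rate=0.4):
    """Standard errors between a match score and an even result.

    score is the points scored by one side out of `games` games.
    draw_rate is an assumed fraction of drawn games.
    """
    sd_game = math.sqrt(0.25 * (1.0 - draw_rate))  # per-game std. deviation
    se_match = sd_game * math.sqrt(games)          # std. error of total points
    return (score - games / 2.0) / se_match

def games_needed(expected_score=0.54, draw_rate=0.4, z=1.96):
    """Games at which the expected lead equals z standard errors.

    This is a rough lower bound: at this match length the better
    engine still fails to reach z standard errors about half the time.
    """
    sd_game = math.sqrt(0.25 * (1.0 - draw_rate))
    return math.ceil((z * sd_game / (expected_score - 0.5)) ** 2)

if __name__ == "__main__":
    print(match_z_score(162.5, 300))  # FAST in the 5-second match
    print(match_z_score(147.0, 300))  # FAST in the 10-second match
    print(games_needed())
```

Under these assumptions the 162.5 - 137.5 result is a little under two standard errors from equality and the 147.0 - 153.0 result is well under one, so neither 300-game match settles the question on its own.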
>>
>>In the early days of a chess programmer it is easy, but when your program
>>is over 2300-2400 it becomes very difficult to judge a candidate program
>>improvement. Personally I use a main set of 70-100 positions (frequently
>>updated) which are tested manually first, then a large set of more than 500
>>positions that runs automatically and produces a detailed report and
>>database of every difference with respect to the previous version. If results
>>are good, then an engine-engine 300 game match is done as described above. At
>>a later stage (after a couple of program changes) some auto232 matches
>>are played. The latter is of minor importance (with respect to the changes)
>>as too much randomness is involved (book, learning). In the end my feeling
>>about a program change is the decisive factor.
>
>
>Anyway this is a very time-consuming task.

That's why most of us need a full year if you know what I mean.


>>>Playing 1000 games with tournament time control
>>>takes much too much time. Test positions don't reflect practical play.
>>>I really have no clue.
>>
>>>And that is why I always say that the top-10 (!) programs
>>>play at equal strength.
>>
>>That's a bold statement.
>>
>>Ed
>
>I know. Prove me wrong. :-)

How about a 10 game match....?

Ed




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.