Computer Chess Club Archives


Subject: Re: About head or tail (was Upon scientific truth - the nature of informati

Author: Ed Schröder

Date: 11:15:54 07/17/00


On July 17, 2000 at 07:33:41, Enrique Irazoqui wrote:

>On July 17, 2000 at 07:15:45, Ed Schröder wrote:
>
>>On July 17, 2000 at 06:18:38, Harald Faber wrote:
>>
>>>On July 16, 2000 at 17:56:22, Ed Schröder wrote:
>>>
>>>>On July 16, 2000 at 05:30:48, Harald Faber wrote:
>>>>
>>>>>On July 16, 2000 at 03:34:45, Ed Schröder wrote:
>>>>>
>>>>>>>posted by Dann Corbit on July 15, 2000 at 20:21:54:
>>>>>>
>>>>>>>Simplifying.  I have a penny.
>>>>>>>I toss it twice.
>>>>>>>Heads, heads.
>>>>>>>I toss it twice
>>>>>>>Heads, heads.
>>>>>>>I toss it twice
>>>>>>>Tails, heads.
>>>>>>>I toss it twice
>>>>>>>Heads, tails.
>>>>>>
>>>>>>>I count them up.
>>>>>>
>>>>>>>Heads are stronger than tails.
>>>>>>
>>>>>>>My conclusion is faulty.  Why?  Because I did not gather enough data.
>>>>>>
>>>>>>Right.
>>>>>>
>>>>>>A few months ago Christophe posted some interesting stuff here regarding
>>>>>>this topic and nobody really was in agreement with him (me included) until
>>>>>>I did an experiment which worked as an eye opener for me. The story is not
>>>>>>funny and goes like this...
>>>>>>
>>>>>>In Rebel Century's Personalities you have the option [Strength of Play=100]
>>>>>>The value may vary from 1 to 100 and 100 is (of course) the default value.
>>>>>>
>>>>>>Lowering this value will cause Rebel to lower its NPS. This opens the
>>>>>>possibility to create (100% equal!) engines whose only difference is
>>>>>>that they run SLOWER.
>>>>>>
>>>>>>I was interested to know HOW MANY games are needed to show that a 10%
>>>>>>faster version can beat a 10% slower version, and with what score. So
>>>>>>I created two personalities:
>>>>>>
>>>>>>FAST.ENG (default settings) [Strength of Play=100]
>>>>>>SLOW.ENG (default settings) [Strength of Play=80]
>>>>>>
>>>>>>and started to play 600 eng-eng games with Rebel's built-in autoplayer
>>>>>>using pre-defined fixed opening lines that both engines had to play with
>>>>>>white and black.
>>>>>>
>>>>>>The personality whose only change was [Strength of Play=80] caused Rebel to
>>>>>>slow down by exactly 10% on the machine where the marathon match took place.
>>>>>>Note that this value (80) may differ on other PCs in case you want to do
>>>>>>similar experiments.
>>>>>>
>>>>>>Here are the results of the 600 games played between the FAST and SLOW
>>>>>>personalities. The first 300 games were played on a time control of "5
>>>>>>seconds average". The second 300 games were played on a time control of
>>>>>>"10 seconds average".
>>>>>>
>>>>>>FAST - SLOW   162.5 - 137.5   [ 0:05 ]
>>>>>>FAST - SLOW   147.0 - 153.0   [ 0:10 ]
>>>>>>
>>>>>>The first match of 300 games at 5 secs looks convincing. A 54.2% score
>>>>>>from 10% more speed seems a value one might expect.
>>>>>>
>>>>>>But what about the crazy result of match-2? Apparently 300 games are
>>>>>>still not enough to prove that the 10% faster version is superior (of
>>>>>>course it is), as the match score suggests both versions are equal,
>>>>>>which is not true.
>>>>>>
>>>>>>So how many games are needed to prove that version X is better than Y?
>>>>>>
>>>>>>I am sure I am trying to reinvent the wheel. The casino guys who make
>>>>>>themselves a good living (with red and black) figured it all out
>>>>>>centuries ago. Perhaps there is a FAQ somewhere on the Internet that
>>>>>>explains how many times you have to turn the wheel to get an exact
>>>>>>50.0% division between red and black. 1000? 2000?
>>>>>>
>>>>>>To answer this question I wrote a little program that randomly emulates
>>>>>>chess matches. It shows that 100 games is nothing: scores like 60-40
>>>>>>appear on the screen too often. 500 games (and higher) seems to do well, as
>>>>>>most of the time match scores fall within the 49.0 - 51.0 area.
>>>>>>
>>>>>>The bad news (in any case for me) is that it hardly makes any sense to
>>>>>>test candidate program improvements using (even) long matches. Back to
>>>>>>common sense: 10% = 10% = better. Oh well...
>>>>>>
>>>>>>Ed
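
The "little program that randomly emulates chess matches" quoted above was not posted, so the following is only a minimal sketch of that kind of experiment, not the original code. It assumes two exactly equal engines and made-up per-game probabilities (30% win, 40% draw, 30% loss); the names simulate_match and share_within are hypothetical. It reports how often a match score lands inside the 49.0 - 51.0 band for a given match length.

```python
import random


def simulate_match(n_games, p_win=0.3, p_draw=0.4, rng=random):
    """Return the first player's score percentage over n_games.

    p_win and p_draw are assumed values, not figures from the post.
    The second player wins with probability 1 - p_win - p_draw, so the
    defaults describe two engines of equal strength.
    """
    score = 0.0
    for _ in range(n_games):
        r = rng.random()
        if r < p_win:
            score += 1.0          # win
        elif r < p_win + p_draw:
            score += 0.5          # draw
        # otherwise a loss, which adds nothing
    return 100.0 * score / n_games


def share_within(n_games, low=49.0, high=51.0, trials=1000, seed=1):
    """Fraction of simulated matches whose score falls in [low, high]."""
    rng = random.Random(seed)     # fixed seed so runs are repeatable
    hits = sum(
        low <= simulate_match(n_games, rng=rng) <= high
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n in (100, 500, 2000):
        print(f"{n:5d} games: {share_within(n):.0%} of matches within 49-51%")
```

Changing p_draw or the band limits shows how sensitive the "how many games" answer is to those assumptions.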
>>>>>
>>>>>This is exactly what I have been preaching for ages.
>>>>>500 games show a tendency. If you get a 70-30 result by playing 500 games it is
>>>>>unlikely that the 30-program is stronger than the 70-program. But the other
>>>>>question is if the 70-program is really stronger or will it decrease to the
>>>>>50%-area? Or even worse, you get a 55-45 result... Finally, in computer matches
>>>>>there are wide opening books. So your first 10 games might never be repeated.
>>>>>Or you play another 10-game match and get a completely different result than in
>>>>>your first 10-game match because of different opening lines...
>>>>
>>>>>So what to do to verify improvements or to get an idea if program a is stronger
>>>>>than program b? I don't know.
>>>>
>>>>In the early days of a chess programmer it is easy, but when your program
>>>>is over 2300-2400 it becomes very difficult to judge a candidate program
>>>>improvement. Personally I use a main set of 70-100 positions (frequently
>>>>updated) which are tested manually first, then a large set of >500 positions
>>>>that runs automatically and produces a detailed report and database of
>>>>every difference with regard to the previous version. If results are good
>>>>then a 300-game engine-engine match is done as described above. At a
>>>>later stage (after a couple of program changes) some auto232 matches
>>>>are played. The latter are of minor importance (with respect to the changes)
>>>>as too much randomness is involved (book, learning). In the end my feeling
>>>>about a program change is the decisive factor.
>>>
>>>
>>>Anyway this is a very time-consuming task.
>>
>>That's why most of us need a full year if you know what I mean.
>>
>>
>>>>>Playing 1000 games with tournament time control
>>>>>takes much too much time. Test positions don't reflect practical play.
>>>>>I really have no clue.
>>>>
>>>>>And that is why I always say that the top-10 (!) programs
>>>>>play at equal strength.
>>>>
>>>>That's a bold statement.
>>>>
>>>>Ed
>>>
>>>I know. Prove me wrong. :-)
>>
>>How about a 10 game match....?
>
>What for? What a waste... Comp-comp won't prove a thing no matter how many games
>you play. Let's take a quick look:

I was only joking with Harald in the spirit of the topic.


>1 - Programs are helpless against anti-computer strategy, like Fritz in
>Frankfurt and Junior in Dortmund. Their performance is inversely proportional to
>human awareness of this shortcoming, and search alone won't solve the problem,
>or at least it won't solve it before we all become very bald. Oh yes: in
>comp-comp search is everything.
>
>2 - Programs are essentially polite social beings: they behave like GMs amongst
>GMs, like 2300s amongst 2300s. For instance, look at Junior's performance in
>Dortmund and in the Israeli league.

This is a substantial point that may or may not be valid. What matters here
is the data to prove or disprove the statement. One exception, as mentioned
above, is not enough. As far as I can see, the performance of comps
against lower-rated players is quite stable.

If valid, it would for instance mean that the 2 x GK-DB matches should be
seen in this light too, and that Junior, Fritz, Rebel etc. might win a
6-game match against Kasparov too.

On the other hand, if comps play at 2300 level against 2300 players, programmers
had better pack their bags or change course from scratch. However, I
see too little evidence to support this claim, as most of the time comps
have a positive score against 2300-2400 players.

I'd like to add the Judit Polgar example: it is said that the rating of Judit
Polgar is 100-150 Elo points too high because she is allowed to play in
the men's world top. If true, the human rating system stinks. Fortunately
comp-comp is excused from that.

Based on the data available I (for now) have the following opinion:

#1. Humans have their own specific weaknesses: time control, making
tactical blunders, not winning a won position, overlooking small things,
nerves, pressure to win (or lose), being afraid of the tactical power of
the beast (Kasparov was full of it), going for an easy draw out of fear,
not being at their best every day of a long tournament. Humans are also
vulnerable to all kinds of things happening in normal life that
could damage their concentration during a tournament (from not feeling well
up to dramatic events in their personal circumstances). The list is
endless.

#2. Computers have only a FEW weaknesses and CCC is full of them.

#3. The disadvantages mentioned in (#1) are advantages for the computer
and IMO are often underestimated. Take Kasparov being totally confused after he
resigned game 2 in a drawn position against DB, combined with the fact that he
started to question the integrity of the match in his mind. I am no
psychologist but it is quite possible the match was over after game 2.
Imagine the opposite: in a comp-comp event you suspect your remote opponent
of being a grandmaster. We have an example from the past and we know what it
did to the programmer in question when he was accused. Can you fully
concentrate on your next game in such cases? IMO Kasparov could not, as the
poison in his mind was killing his creativity.

#4. IMO it is very important who you are playing. I for instance prefer to
play Karpov (even in his better days) over Piket, Seirawan or v/d Wiel, as
Karpov with all respect is not such a good player against comps. For
instance Rebel got 2 easy draws against Karpov 3 years ago on a slow PC,
and Karpov was very happy to accept a draw proposal with a few minutes left
on the clock and Rebel a pawn up. If you look at these games, Karpov just
plays as Karpov, which favors the computer; Rebel was never in trouble. I
even dare to say that Kasparov is not able to give the computer the treatment
it deserves, the anti-computer strategy. IMO he has not been able to, at least
until now. He tries (see the unorthodox openings after game 2) but he did not
manage. Note that Kasparov also lost a mini match against Genius some 4-5
years ago. Apparently playing comps is an art in itself, or you need to be
gifted. I frankly believe who you are playing matters a lot.

Meaning to say that Man versus Machine is a whole different area and all
the pros and cons should be taken into account. Based on that I tend
to take the results as they come as the only indicator, as you can't compare
apples with oranges.

Of course I am in full agreement that the only way to make progress against
humans is to add tons of new chess knowledge needed to survive the
anti-computer strategy of humans who are able to play it successfully. Some
can, some can't.

But I also believe in search, even when playing humans. Search is also chess
knowledge, although of a different kind.

Ed




>3 - If program A has extra code to avoid closed positions and program B does
>not, comp-comp won't show the difference as an advantage for A. If B is a faster
>searcher, the extra code will harm A when playing B.
>
>4 - Comp-comp games show a partial and rather uninteresting picture, their
>results don't necessarily correlate to human-comp and watching them can even
>become a threat to one's mental health.
>
>Now go figure the statistical certainty of 10, 100 or 1000 comp-comp games.
>
>Enrique
>
>>Ed
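
The closing challenge quoted above, the statistical certainty of 10, 100 or 1000 comp-comp games, can be roughed out with a normal approximation. This is a sketch under stated assumptions, not something from the thread: the engines are taken to be exactly equal, games are treated as independent, the 40% draw rate is an assumed figure, and score_margin is a hypothetical helper name.

```python
import math


def score_margin(n_games, draw_rate=0.4, z=1.96):
    """Approximate 95% margin of error, in percentage points, of a match
    score between two equal engines (normal approximation).

    draw_rate is an assumption; it is not given anywhere in the thread.
    """
    # For equal engines the expected score per game is 0.5. A win or a loss
    # (total probability 1 - draw_rate) deviates from that by 0.5, while a
    # draw deviates by 0, so the per-game variance is (1 - draw_rate) * 0.25.
    variance = (1.0 - draw_rate) * 0.25
    return 100.0 * z * math.sqrt(variance / n_games)


for n in (10, 100, 300, 1000):
    print(f"{n:5d} games: 50% +/- {score_margin(n):.1f} points")
```

Under these assumptions the margin for a 300-game match comes out at roughly 4.4 points, so both the 162.5 - 137.5 and the 147.0 - 153.0 results quoted earlier would sit inside the noise band around 50%. A lower draw rate widens the band further.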



Last modified: Thu, 15 Apr 21 08:11:13 -0700
