Author: Jeroen van Dorp
Date: 09:25:42 01/03/01
That's a very nice idea!! I like the challenge. If I understand correctly, basically:

*method*

1. you want to find out how many games you have to play with a chess program before the results become "statistically relevant"
2. you want to find this out by letting the program play itself until the black and white results even out, or at least stay "within acceptable margins"

I think, however, that the problem of statistical significance centres on something else:

*what to determine*

1. how many games does an engine have to play *against other engines* before you can say something about its *relative* strength, or...
2. how many games does an engine have to play *against humans* before you can say something about its *relative* strength

*relative strength in a pool of players*

Both definitions only get value if you can say something about *relative* strength within a fixed pool of players. You won't get that result by letting the engine play against itself. If an engine plays against itself, it will only tell you when the statistical flukes stop showing up: random opening book choices, flaws in the engine that only appear in certain situations and not every time, machine faults, etc. So I don't think you solve the right problem with this solution.

*white first move advantage*

You might solve another problem: that of white's first-move advantage. Looking at the stats of human players (collectively, not individually), you'll see a rough division of 37% wins by white, 34% draws and 29% losses by white. That doesn't necessarily translate to any single game, but it can be seen as the visible effect of white's small first-move advantage. Here we truly have a *pool*, of two: white and black.

*fixed pool*

The only test you could perform to solve that statistical problem of chess engine strength is, IMO, to make a *fixed* pool of chess engines and let them play endlessly against each other. When the changes in relative strength fade out, or become too small to be significant (say, your win/loss predictions turn out accurate for 99% of all subsequent games), you have the figure you're looking for – for *THAT* pool of players/engines. (In my opinion these results are roughly available already. But let's not discuss the SSDF again :))

*example*

A common example of the strength dynamics: you play 1000 games of the engine against itself. Now the stats are 38-33-29 and they don't change significantly anymore, so you state that 1000 games are needed for this (or any) engine to measure its strength. You then play a competition of 1000 games against a lot of engines/opponents, and after those 1000 games your engine's rating is 2500. Now there is a new engine/opponent. You play 1000 games against it. It's stronger; it wins them all. The result is a drop of 400 rating points, to 2100. Now you return to the other pool, without that new engine, and play 1000 games again, with a start rating of 2100. Over those 1000 consecutive games it wins a lot, etc. Because it started out at a weak 2100, its rating at the end will be 2650, because of its "relatively" better performance.... Now the new engine is introduced into the pool. It starts without a rating. It finds its nemesis. Its end rating is 2300, yet it won all its games against your engine, rated 2500....
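To make that pool dependence concrete, here is a rough sketch of the mechanism. All the numbers in it are made up: the "true" strengths, the K-factor, the game counts and the common start rating of 2400 are hypothetical, and draws are ignored for simplicity. The same engine, with the same underlying strength, is rated in two different pools:

# Minimal Elo round-robin sketch (hypothetical numbers, no draws).
import random

def expected_score(r_a, r_b):
    # Standard Elo expectation of player A scoring against player B.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def final_rating(true_strengths, rounds=2000, k=10, start=2400):
    # Round-robin with zero-sum Elo updates; every engine starts at `start`,
    # so the pool average stays at `start` and each rating ends up measuring
    # strength *relative to this particular pool*. Returns engine 0's rating.
    n = len(true_strengths)
    ratings = [float(start)] * n
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(rounds):
        for i, j in pairs:
            # The game result is driven by the hidden "true" strengths.
            p_true = expected_score(true_strengths[i], true_strengths[j])
            result = 1.0 if random.random() < p_true else 0.0
            delta = k * (result - expected_score(ratings[i], ratings[j]))
            ratings[i] += delta
            ratings[j] -= delta
    return ratings[0]

random.seed(1)
# The same engine (true strength 2500) measured in two different pools.
weak_pool   = [2500, 2300, 2350, 2400]
strong_pool = [2500, 2600, 2650, 2700]
print("rating of the 2500 engine in the weak pool:  ", round(final_rating(weak_pool)))
print("rating of the 2500 engine in the strong pool:", round(final_rating(strong_pool)))

The printed rating comes out clearly higher in the weak pool than in the strong one, even though nothing about the engine itself changed: the rating only expresses its strength relative to the pool average, which is pinned at the common start value.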
*parameter weight/influence*

So there's really no statistical flaw at all. What we lack are the *values* of the parameters and their *effects* on the strength of a chess engine: hash tables, endgame tablebases (do they make it stronger or weaker?), choice of algorithm, etc. Why *do* results build up differently from the way human strength/ratings build up? If you can pinpoint the effect of all those parameters on all aspects of the game, and assign a value (weight) to each, you no longer need large numbers of games to determine strength. (It wouldn't surprise me if positional knowledge came out of such an analysis as the biggest factor, but that's speculative.)

I'm interested in your opinion.

Jeroen ;-}
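P.S. A rough back-of-the-envelope sketch of the "when does the statistical noise fade" part, using the 38-33-29 percentages from the example above. The only things added here are the usual 1.96 factor for a 95% confidence interval and the rule of thumb that one percentage point of score is roughly 7 Elo near an even score:

import math

# Win/draw/loss percentages taken from the 1000-game example above.
p_win, p_draw, p_loss = 0.38, 0.33, 0.29
mean = p_win + 0.5 * p_draw                        # expected score per game
var = (p_win * 1.0 + p_draw * 0.25) - mean ** 2    # variance of one game's score (0, 0.5 or 1)
sd = math.sqrt(var)

for n in (100, 1000, 10000):
    half_width = 1.96 * sd / math.sqrt(n)          # ~95% confidence half-width on the match score
    print(f"{n:6d} games: score {mean:.3f} +/- {half_width:.3f}"
          f"  (roughly +/- {700 * half_width:.0f} Elo near a 50% score)")

So even 1000 games still leave an uncertainty of roughly 18 rating points on the match score alone, and, as argued above, that number by itself says nothing about strength relative to a pool.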