Author: Christophe Theron
Date: 16:00:24 12/15/03
On December 15, 2003 at 15:59:22, Andrew Dados wrote:
>On December 15, 2003 at 15:26:11, Andrew Dados wrote:
>
>>On December 15, 2003 at 12:39:26, Christophe Theron wrote:
>>
>>>On December 15, 2003 at 10:24:45, Andrew Dados wrote:
>>>
>>>>On December 15, 2003 at 01:25:40, Christophe Theron wrote:
>>>>
>>>>>On December 14, 2003 at 19:26:30, J F wrote:
>>>>>
>>>>>>Christophe, how many games do you recommend playing before you can draw a
>>>>>>conclusion?
>>>>>
>>>>>
>>>>>
>>>>>I think you are not going to like the answer. :)
>>>>>
>>>>>It depends on:
>>>>>* the reliability you want (do you want a 70% reliability? 80%? 90%? 95%?)
>>>>>* the elo difference between the programs
>>>>>
>>>>>If you want very good reliability in the result (for example 95%) and the two
>>>>>programs are very close in elo, then you might need several thousand games.
>>>>>
>>>>>There is no simple answer to your question. However, I know that there exists a
>>>>>program called "whoisbetter" that can, given a match result, tell you if one
>>>>>program can be considered better than its opponent.
>>>>>
>>>>>The very important thing to remember is that in order to know which of the top
>>>>>PC chess programs is better, you will definitely need several thousand
>>>>>games, believe it or not. So it's always funny to see somebody giving an opinion
>>>>>after 5 games.
>>>>>
>>>>>
>>>>>Below are tables that can be used to get an idea of the number of games to play
>>>>>to get a given error margin (in winning percentage and in elo difference) for a
>>>>>given reliability (percentage of confidence).
>>>>>
>>>>>The tables say that, for example, if you want to know with 90% reliability which
>>>>>opponent is better you will have to play 1000 games if their elo difference is
>>>>>15 points. If their elo difference is below 10 points, you will have to play
>>>>>more than 2000 games...
>>>>>
>>>>>Reliability of chess matches
>>>>>
>>>>>90% confidence
>>>>>Games %err+/- elo+/-
>>>>> 10 20 140pts
>>>>> 20 15 105pts
>>>>> 25 14 98pts
>>>>> 30 12 84pts
>>>>> 40 10 70pts
>>>>> 50 9 63pts
>>>>> 100 6.5 46pts
>>>>> 200 4.72 33pts
>>>>> 400 3.34 23pts
>>>>> 600 2.66 19pts
>>>>> 800 2.39 17pts
>>>>> 1000 2.12 15pts
>>>>> 1200 2.00 14pts
>>>>> 1400 1.81 13pts
>>>>> 1600 1.66 12pts
>>>>> 2000 ~1.50 11pts
>>>>>
>>>>>80% confidence
>>>>>Games %err+/- elo+/-
>>>>> 10 15 105pts
>>>>> 20 11 77pts
>>>>> 25 10 70pts
>>>>> 30 9 63pts
>>>>> 40 8 56pts
>>>>> 50 7 49pts
>>>>> 100 5.0 35pts
>>>>> 200 3.75 26pts
>>>>> 400 2.60 18pts
>>>>> 600 2.15 15pts
>>>>> 800 1.86 13pts
>>>>> 1000 1.66 12pts
>>>>> 1200 1.46 10pts
>>>>> 1400 1.40 10pts
>>>>> 1600 1.34 9pts
>>>>>
>>>>>70% confidence
>>>>>Games %err+/- elo+/-
>>>>> 10 15 105pts
>>>>> 20 10 70pts
>>>>> 25 8 56pts
>>>>> 30 8 56pts
>>>>> 40 6.3 44pts
>>>>> 50 6.0 42pts
>>>>> 100 4.0 28pts
>>>>> 200 3.0 21pts
>>>>> 400 2.2 15pts
>>>>> 600 1.7 12pts
>>>>> 800 1.5 11pts
>>>>> 1000 1.3 9pts
>>>>> 1200 1.24 9pts
>>>>> 1400 1.14 8pts
>>>>> 1600 1.04 7pts
>>>>>
>>>>>
>>>>>
>>>>> Christophe
>>>>
>>>>I always wondered how those tables are calculated. Since we have no model that
>>>>includes draw scores and draw probabilities in any satisfactory way, all those
>>>>tables are just guessed (or, most likely, draw probabilities are just
>>>>ignored).
>>>>
>>>>If draws and their chances are ignored, dividing the games column by 2 is the
>>>>best guess: each chess game has 3 outcomes, not 2, so every game is worth 2 coin
>>>>tosses, not one (roughly; the draw percentage depends on the opponents'
>>>>strength, and this is the problem here: we don't know the expected percentage
>>>>of drawn games).
>>>>
>>>>whoisbetter is one example of a statistic that ignores one of the 3 possible
>>>>scores (it goes to the extreme), and thus produces incorrect probabilities.
>>>>
>>>>-Andrew-
>>>
>>>
>>>
>>>I'm not good enough at statistics to have produced these tables from a formula.
>>>
>>>I have built these tables empirically, with a program producing random outcomes
>>>of chess games with the chances to win, draw or lose being equal. This is where
>>>my logic is biased: the chances to win for white seem to be higher than just
>>>1/3.
>>>
>>>The tables have been produced by generating a very high number of simulated
>>>matches and then crunching the numbers.
>>>
>>>I expect my results to be close to theoretical results. I have published these
>>>tables several times and I have always asked for somebody to give me better
>>>estimates. I'm still waiting.
>>>
>>>
>>>
>>> Christophe
>>
>>Ok, now it makes more sense to me. Still, the same question remains:
>> If one program is better by 100 elo, what is the chance of a draw in a single
>>game (and consequently what is the w/d/l distribution)? The simple model assumes
>>this should not depend on their average strength, yet in practice it makes a big
>>difference (of course there are more draws as the players' strength increases).
>>Also, your note about the score being biased towards white adds some complexity.
>>
>> Since we have no idea what the expected w/d/l distribution is (you assumed 1/3
>>each), we can't correctly predict win/lose chances. Could you some day possibly
>>rerun your simulation with a different w/d/l distribution (but yielding the same
>>rating difference)? I am curious how stable the numbers in that table are...
>>
>>My very simple simulation:
>>For program A better than B by 100 elo the expected score is 0.69. Let's play a
>>10 game match (100 000 times):
>>
>>a) assuming win chance of 0.59 and draw chance of 0.2:
>>A wins 89.5% matches, draws 5.0% and loses 5.4%
>>
>>b) assuming win chance of 0.49 and draw chance of 0.4 (so same expected score):
>>A wins 93.4% matches, draws 3.9% and loses 2.5%
>>
>>While I still have no idea what the real chance of a draw between those
>>programs would be, I can say it influences our expected score table (even the
>>error column) greatly...
>
>(obvious decimal error corrected with % scores :)
>
>Note the somewhat paradoxical result: the higher the chance of a draw in a
>single game, the lower the chance that the better player will lose (or draw) the
>match.
>
>...And since more draws happen between stronger players, the confidence of a
>10-game match is higher towards the top. Maybe much higher.
>
>
>>
>>-Andrew-
I have changed the subject line, I hope you don't mind, because it sucked! :)

I'm not sure I can answer your question, but I would really LOVE to see somebody
finally trying to answer it with a better approach than mine.
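
For what it is worth, the 10-game experiment quoted above does not need random sampling: if every game is independent and has the same win/draw/loss probabilities (an assumption, like everything in this sketch), the exact chance of winning, drawing or losing the match can be computed by adding one game at a time. The Python function name match_outcome is made up for illustration. Note that both quoted cases have a per-game expected score of 0.69, which under the usual logistic Elo formula corresponds to roughly 139 points (100 points gives about 0.64).

```python
def match_outcome(p_win, p_draw, games):
    """Exact (P(A wins match), P(match drawn), P(A loses match)).

    Assumes independent games with fixed per-game probabilities.
    """
    p_loss = 1.0 - p_win - p_draw
    # dist[i] = probability that (A's wins - A's losses) == i - games
    dist = [0.0] * (2 * games + 1)
    dist[games] = 1.0  # before any game the difference is 0
    for _ in range(games):
        nxt = [0.0] * (2 * games + 1)
        for i, p in enumerate(dist):
            if p == 0.0:
                continue  # unreachable difference, nothing to spread
            nxt[i] += p * p_draw      # draw: difference unchanged
            nxt[i + 1] += p * p_win   # A wins: difference + 1
            nxt[i - 1] += p * p_loss  # A loses: difference - 1
        dist = nxt
    return sum(dist[games + 1:]), dist[games], sum(dist[:games])


if __name__ == "__main__":
    # The two cases from the quoted 10-game experiment
    for p_win, p_draw in ((0.59, 0.2), (0.49, 0.4)):
        win, draw, loss = match_outcome(p_win, p_draw, 10)
        print(f"win={p_win} draw={p_draw}: "
              f"A wins {win:.1%}, draws {draw:.1%}, loses {loss:.1%}")
```

The two calls should land near the simulated figures quoted above, and they show the same effect: with the expected score held fixed, more draws means less variance per game, so the better program loses the match less often.
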

I have assumed that the probabilities and error bars would not change much if
the w/d/l probabilities were taken as 40/30/30, for example (which seems a
little more realistic), but maybe I'm dead wrong.
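
One way to check that assumption without running a simulation is a normal approximation. The score of one game (1, 0.5 or 0) has mean w + d/2 and variance w + d/4 - (w + d/2)^2, so the error margin on the match percentage shrinks as 1/sqrt(games). The sketch below is in Python and rests on that approximation, which is rough for very short matches; the function name error_margin is made up for illustration, and the margin is measured around the true expected score.

```python
from math import sqrt
from statistics import NormalDist


def error_margin(p_win, p_draw, games, confidence):
    """Two-sided error margin, in percent of the match score.

    Normal approximation; assumes independent games with fixed
    per-game win/draw probabilities.
    """
    mean = p_win + 0.5 * p_draw               # expected score per game
    var = p_win + 0.25 * p_draw - mean ** 2   # variance of one game's score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return 100.0 * z * sqrt(var / games)


if __name__ == "__main__":
    for label, p_win, p_draw in (("1/3-1/3-1/3", 1 / 3, 1 / 3),
                                 ("40/30/30", 0.40, 0.30)):
        m = error_margin(p_win, p_draw, 1000, 0.90)
        # 7 elo points per percent is the rule of thumb used in the program
        print(f"{label}: +/-{m:.2f}% (about {m * 7:.0f} elo)")
```

With 1000 games at 90% confidence this gives about 2.12% for 1/3-1/3-1/3, in line with the table, and about 2.16% for 40/30/30, so under this approximation the error bars barely move. What does move them is the draw rate: more draws means a smaller variance and tighter margins.
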

As usual, I had to be pragmatic. I needed to be able to evaluate the results of
my matches in order to say whether a change was an improvement or not. So I had
to move forward and decided to use the above table with a possibly slightly
wrong assumption of 1/3-1/3-1/3.

As I told you, I have posted the above table several times in the hope that
somebody would take the time to correct it.

Here is the ugly QBasic program I have used to build it, maybe it will help (I
think I have already posted it because it is in English, but it did not help to
find a volunteer last time):

CLS
PRINT "*** Simulation of chess matches ***"
PRINT
PRINT "We assume that opponents are exactly of equal strength."
PRINT "And that (win;draw;loss) probabilities are (1/3; 1/3; 1/3)."
PRINT

RANDOMIZE TIMER

' Match counters are LONG: a 16-bit INTEGER overflows above 32767 matches
DIM nbmatch AS LONG
DIM nbgames AS INTEGER
DIM limit AS SINGLE

INPUT "Number of matches to play "; nbmatch
INPUT "Number of games in each match "; nbgames
INPUT "Compute probability of error greater than "; limit

DIM totdiff AS SINGLE      ' sum of the errors over all matches
totdiff = 0
DIM maxdiff AS SINGLE      ' largest error seen so far
maxdiff = 0
DIM nbmax AS LONG          ' number of matches that reached maxdiff
nbmax = 0
DIM nboverlimit AS LONG    ' number of matches with error > limit
nboverlimit = 0

DIM total AS SINGLE
DIM percent AS SINGLE
DIM diff AS SINGLE
DIM result AS INTEGER
DIM game AS INTEGER
DIM match AS LONG

FOR match = 1 TO nbmatch
  total = 0 ' total is the score of player A

  FOR game = 1 TO nbgames
    result = INT(RND * 300) ' result between 0 and 299 included
    IF result < 100 THEN
      ' DRAW
      'PRINT "1/2 - 1/2"
      total = total + .5
    ELSE
      IF result < 200 THEN
        ' PLAYER A LOSES: total stays unchanged
        'PRINT " 0 - 1"
      ELSE
        ' PLAYER A WINS
        'PRINT " 1 - 0"
        total = total + 1
      END IF
    END IF
  NEXT game

  ' Error = distance of the match percentage from the true 50%
  percent = total / nbgames * 100!
  diff = ABS(percent - 50)

  ' Progress counter, rewritten on the same screen line
  PRINT match;
  LOCATE CSRLIN, 1
  'PRINT "Outcome of the match: "; total; "-"; nbgames - total; " (";
  'PRINT USING "###.##"; percent;
  'PRINT "%, error=";
  'PRINT USING "##.##"; diff;
  'PRINT " )"

  IF diff > maxdiff THEN
    maxdiff = diff
    nbmax = 0
  END IF
  IF diff = maxdiff THEN nbmax = nbmax + 1
  totdiff = totdiff + diff
  IF diff > limit THEN nboverlimit = nboverlimit + 1
NEXT match

' Elo differences use the rule of thumb 1% of score = 7 elo points
PRINT " "
PRINT "Maximum error: ";
PRINT USING "##.##"; maxdiff;
PRINT " (elo diff = ";
PRINT USING "###"; maxdiff * 7!;
PRINT ") occurred in ";
PRINT USING "###.###"; nbmax / nbmatch * 100!; ' as a percentage, like the other outputs
PRINT "% of the matches"
PRINT "Average error: ";
PRINT USING "##.##"; totdiff / nbmatch;
PRINT " (elo diff = ";
PRINT USING "###"; totdiff / nbmatch * 7!;
PRINT ")"
PRINT "Prob( error > ";
PRINT USING "##.##"; limit;
PRINT "% ) = ";
PRINT "Prob( elo diff > ";
PRINT USING "###"; limit * 7!;
PRINT " ) = ";
PRINT USING "##.##"; nboverlimit / nbmatch * 100!;
PRINT "%"
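
For anybody without a QBasic interpreter at hand, the same experiment can be sketched in Python. This is a loose port, not a line-by-line translation: the function name simulate, its parameters and the returned dictionary keys are all made up for illustration, and the win/draw probabilities are arguments so that the 1/3-1/3-1/3 assumption can be varied. The error is measured against the true expected score, which is 50% whenever wins and losses are equally likely.

```python
import random


def simulate(nb_matches, nb_games, limit, p_win=1 / 3, p_draw=1 / 3, seed=None):
    """Simulate matches and summarise the error on player A's percentage.

    Assumes independent games with fixed win/draw probabilities.
    `limit` is an error threshold in percent of the match score.
    """
    rng = random.Random(seed)
    expected = 100.0 * (p_win + 0.5 * p_draw)  # true score of A, in percent
    total_diff = 0.0
    max_diff = 0.0
    over_limit = 0
    for _ in range(nb_matches):
        score = 0.0
        for _ in range(nb_games):
            r = rng.random()
            if r < p_win:
                score += 1.0        # player A wins
            elif r < p_win + p_draw:
                score += 0.5        # draw
            # otherwise player A loses: score unchanged
        diff = abs(100.0 * score / nb_games - expected)
        total_diff += diff
        max_diff = max(max_diff, diff)
        if diff > limit:
            over_limit += 1
    return {
        "max_error": max_diff,
        "avg_error": total_diff / nb_matches,
        "prob_over_limit": over_limit / nb_matches,
    }


if __name__ == "__main__":
    stats = simulate(nb_matches=20000, nb_games=100, limit=6.5, seed=1)
    print(f"Maximum error: {stats['max_error']:.2f}%")
    print(f"Average error: {stats['avg_error']:.2f}%")
    print(f"Prob( error > 6.5% ) = {stats['prob_over_limit']:.2%}")
```

Passing for example p_win=0.25, p_draw=0.5 reruns the experiment with more draws and the same 50% expected score, which is the kind of check asked for above.
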
Good luck.
Christophe