Author: Christophe Theron
Date: 16:00:24 12/15/03
On December 15, 2003 at 15:59:22, Andrew Dados wrote:

>On December 15, 2003 at 15:26:11, Andrew Dados wrote:
>
>>On December 15, 2003 at 12:39:26, Christophe Theron wrote:
>>
>>>On December 15, 2003 at 10:24:45, Andrew Dados wrote:
>>>
>>>>On December 15, 2003 at 01:25:40, Christophe Theron wrote:
>>>>
>>>>>On December 14, 2003 at 19:26:30, J F wrote:
>>>>>
>>>>>>Christophe, How many games do you recommend playing before you can draw a
>>>>>>conclusion?
>>>>>
>>>>>I think you are not going to like the answer. :)
>>>>>
>>>>>It depends on:
>>>>>* the reliability you want (do you want a 70% reliability? 80%? 90%? 95%?)
>>>>>* the elo difference between the programs
>>>>>
>>>>>If you want a very good reliability in the result (for example 95%) and the two
>>>>>programs are very close in elo, then you might need several thousand games.
>>>>>
>>>>>There is no simple answer to your question. However, I know that there exists a
>>>>>program called "whoisbetter" that can, given a match result, tell you if one
>>>>>program can be considered better than its opponent.
>>>>>
>>>>>The very important thing to remember is that in order to know which of the top
>>>>>PC chess programs is better, you will definitely need several thousand
>>>>>games, believe it or not. So it's always funny to see somebody giving an opinion
>>>>>after 5 games.
>>>>>
>>>>>Below is a table that can be used to get an idea of the number of games to play
>>>>>to get a given error margin (in winning percentage and in elo difference) for a
>>>>>given reliability (percentage of confidence).
>>>>>
>>>>>The tables say that, for example, if you want to know with 90% reliability which
>>>>>opponent is better you will have to play 1000 games if their elo difference is
>>>>>15 points. If their elo difference is below 10 points, you will have to play
>>>>>more than 2000 games...
>>>>>
>>>>>Reliability of chess matches
>>>>>
>>>>>90% confidence
>>>>>Games  %err+/-  elo+/-
>>>>>   10  20       140pts
>>>>>   20  15       105pts
>>>>>   25  14        98pts
>>>>>   30  12        84pts
>>>>>   40  10        70pts
>>>>>   50   9        63pts
>>>>>  100   6.5      46pts
>>>>>  200   4.72     33pts
>>>>>  400   3.34     23pts
>>>>>  600   2.66     19pts
>>>>>  800   2.39     17pts
>>>>> 1000   2.12     15pts
>>>>> 1200   2.00     14pts
>>>>> 1400   1.81     13pts
>>>>> 1600   1.66     12pts
>>>>> 2000  ~1.50     11pts
>>>>>
>>>>>80% confidence
>>>>>Games  %err+/-  elo+/-
>>>>>   10  15       105pts
>>>>>   20  11        77pts
>>>>>   25  10        70pts
>>>>>   30   9        63pts
>>>>>   40   8        56pts
>>>>>   50   7        49pts
>>>>>  100   5.0      35pts
>>>>>  200   3.75     26pts
>>>>>  400   2.60     18pts
>>>>>  600   2.15     15pts
>>>>>  800   1.86     13pts
>>>>> 1000   1.66     12pts
>>>>> 1200   1.46     10pts
>>>>> 1400   1.40     10pts
>>>>> 1600   1.34      9pts
>>>>>
>>>>>70% confidence
>>>>>Games  %err+/-  elo+/-
>>>>>   10  15       105pts
>>>>>   20  10        70pts
>>>>>   25   8        56pts
>>>>>   30   8        56pts
>>>>>   40   6.3      44pts
>>>>>   50   6.0      42pts
>>>>>  100   4.0      28pts
>>>>>  200   3.0      21pts
>>>>>  400   2.2      15pts
>>>>>  600   1.7      12pts
>>>>>  800   1.5      11pts
>>>>> 1000   1.3       9pts
>>>>> 1200   1.24      9pts
>>>>> 1400   1.14      8pts
>>>>> 1600   1.04      7pts
>>>>>
>>>>>
>>>>>    Christophe
>>>>
>>>>I always wondered how those tables are calculated. Since we have no model which
>>>>includes draw scores and draw possibilities in any satisfactory way, all those
>>>>tables are just guessed (or most likely draw score possibilities are just
>>>>ignored).
>>>>
>>>>If draws and their chances are ignored, dividing the games column by 2 is the best
>>>>guess: each chess game has 3 outcomes, not 2, so every game equals 2 coin
>>>>tosses, not one (roughly; the draw percentage depends on opponent strength and this is
>>>>the problem here: we don't know the expected percentage of drawn games).
>>>>
>>>>whoisbetter is one example of a statistic ignoring one of the 3 possible scores (it
>>>>goes to the extreme), and thus produces incorrect probabilities.
>>>>
>>>>-Andrew-
>>>
>>>I'm not good enough at statistics to have produced these tables from a formula.
>>>
>>>I have built these tables empirically: with a program producing random outcomes
>>>of chess games with the chances to win, draw or lose being equal. This is where
>>>my logic is biased: the chances to win for white seem to be higher than just
>>>1/3.
>>>
>>>The tables have been produced by generating a very high number of simulated
>>>matches and then crunching the numbers.
>>>
>>>I expect my results to be close to theoretical results. I have published these
>>>tables several times and I have always asked for somebody to give me better
>>>estimates. I'm still waiting.
>>>
>>>
>>>    Christophe
>>
>>Ok, now it makes more sense to me. Still the same question remains:
>> If one program is better by 100 elo, what is the chance of a draw outcome in a single
>>game? (and consequently what is the w/d/l distribution) A simple model assumes this
>>should not depend on their average strength, yet in practice it makes a big
>>difference (of course more draws as the players' strength increases). Also your note
>>about the score being biased towards white adds some complexity.
>>
>> Since we have no idea what the expected distribution of w/d/l is (you assumed 1/3
>>each), we can't correctly predict win/lose chances. Could you some day possibly
>>rerun your simulation with a different w/d/l distribution (but yielding the same
>>rating difference)? I am curious how stable the numbers in that table are...
>>
>>My very simple simulation:
>>For program A better than B by 100 elo expected score is 0.69.
>>Let's play a 10 game match (100 000 times):
>>
>>a) assuming win chance of 0.59 and draw chance of 0.2:
>>A wins 89.5% matches, draws 5.0% and loses 5.4%
>>
>>b) assuming win chance of 0.49 and draw chance of 0.4 (so same expected score):
>>A wins 93.4% matches, draws 3.9% and loses 2.5%
>>
>>While I still have no idea what the real chance of a draw between those
>>programs would be, I can say it influences our expected score table (even the error column)
>>greatly...
>
>(obvious decimal error corrected with % scores :)
>
>Note the somewhat paradoxical result: the higher the chance of a draw outcome in a
>single game, the lower the chance that the better player will lose (or draw) the match.
>
>...And since more draws happen between stronger players, the confidence of a 10-game
>match is higher towards the top. Maybe much higher.
>
>>
>>-Andrew-

I have changed the subject line, I hope you don't mind, because it sucked! :)

I'm not sure I can answer your question but I would really LOVE to see somebody
finally trying to answer it with a better approach than mine.

I have assumed that the probabilities and error bars would not change much if
the w/d/l probability was taken as 40/30/30 for example (seems to be a little more
realistic) but maybe I'm dead wrong.

As usual, I had to be pragmatic. I needed to be able to evaluate the results of
my matches in order to say whether a change was an improvement or not. So I had to move
forward and decided to use the above table with a possibly slightly wrong
assumption about 1/3-1/3-1/3.

As I told you, I have posted the above table several times in the hope that
somebody would take the time to correct it.

Here is the ugly QBasic program I have used to build it, maybe it will help (I
think I have already posted it because it is in English, but it did not help to
find a volunteer last time):

CLS
PRINT "*** Simulation of chess matches ***"
PRINT
PRINT "We assume that opponents are exactly of equal strength."
PRINT "And that (win;draw;loss) probabilities are (1/3; 1/3; 1/3)."
PRINT

RANDOMIZE TIMER

DIM nbmatch AS INTEGER
DIM nbgames AS INTEGER
DIM limit AS SINGLE

INPUT "Number of matches to play "; nbmatch
INPUT "Number of games in each match "; nbgames
INPUT "Compute probability of error greater than "; limit

DIM totdiff AS SINGLE
totdiff = 0
DIM maxdiff AS SINGLE
maxdiff = 0
DIM nbmax AS INTEGER
nbmax = 0
DIM nboverlimit AS INTEGER
nboverlimit = 0

DIM total AS SINGLE
DIM percent AS SINGLE
DIM diff AS SINGLE
DIM result AS INTEGER
DIM game AS INTEGER

FOR match = 1 TO nbmatch

  total = 0     ' total is the score of player A

  FOR game = 1 TO nbgames
    result = INT(RND * 300)   ' result between 0 and 299 included
    IF result < 100 THEN
      ' DRAW
      'PRINT "1/2 - 1/2"
      total = total + .5
    ELSE
      IF result < 200 THEN
        ' PLAYER A LOSES: total stays unchanged
        'PRINT " 0 - 1"
      ELSE
        ' PLAYER A WINS
        'PRINT " 1 - 0"
        total = total + 1
      END IF
    END IF
  NEXT game

  percent = total / nbgames * 100!   ' score of player A in percent
  diff = ABS(percent - 50)           ' error: distance from the true 50%

  PRINT match;
  LOCATE CSRLIN, 1
  'PRINT "Outcome of the match: "; total; "-"; nbgames - total; " (";
  'PRINT USING "###.##"; percent;
  'PRINT "%, error=";
  'PRINT USING "##.##"; diff;
  'PRINT " )"

  IF diff > maxdiff THEN
    maxdiff = diff
    nbmax = 0
  END IF
  IF diff = maxdiff THEN nbmax = nbmax + 1

  totdiff = totdiff + diff
  IF diff > limit THEN nboverlimit = nboverlimit + 1

NEXT match

PRINT " "
PRINT "Maximum error: ";
PRINT USING "##.##"; maxdiff;
PRINT " (elo diff = ";
PRINT USING "###"; maxdiff * 7!;     ' roughly 7 elo per percent near 50%
PRINT ") occurred in ";
PRINT USING "###.###"; nbmax / nbmatch * 100!;   ' times 100 so the value really is a percentage
PRINT "% of the matches"

PRINT "Average error: ";
PRINT USING "##.##"; totdiff / nbmatch;
PRINT " (elo diff = ";
PRINT USING "###"; totdiff / nbmatch * 7!;
PRINT ")"

PRINT "Prob( error > ";
PRINT USING "##.##"; limit;
PRINT "% ) = ";
PRINT "Prob( elo diff > ";
PRINT USING "###"; limit * 7!;
PRINT " ) = ";
PRINT USING "##.##"; nboverlimit / nbmatch * 100!;
PRINT "%"

Good luck.


    Christophe
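
For anyone who wants to rerun the experiment with a different w/d/l split, here is a minimal Python sketch of the same kind of simulation. It is not the program that produced the tables above. The function names, the parameters (p_win, p_draw, n_matches, seed) and the percentile method are assumptions made for illustration. The second function is a swapped-in technique, a plain normal approximation, added only as a cross-check, and the factor of 7 elo per percent is taken from the QBasic program.

```python
import random
from statistics import NormalDist

ELO_PER_PERCENT = 7  # rough conversion near 50%, as in the QBasic program


def simulated_margin(n_games, confidence, p_win=1/3, p_draw=1/3,
                     n_matches=2000, seed=0):
    """Error (in percentage points) that is not exceeded in `confidence`
    of the simulated matches. The error is measured against the true
    expected score implied by p_win and p_draw (hypothetical parameters)."""
    rng = random.Random(seed)
    expected = (p_win + 0.5 * p_draw) * 100.0
    errors = []
    for _ in range(n_matches):
        score = 0.0
        for _ in range(n_games):
            r = rng.random()
            if r < p_win:
                score += 1.0            # player A wins
            elif r < p_win + p_draw:
                score += 0.5            # draw
            # otherwise player A loses and the score is unchanged
        errors.append(abs(score / n_games * 100.0 - expected))
    errors.sort()
    index = min(int(confidence * n_matches), n_matches - 1)
    return errors[index]


def normal_margin(n_games, confidence, p_win=1/3, p_draw=1/3):
    """Same margin from a normal approximation of the match score."""
    mean = p_win + 0.5 * p_draw
    variance = p_win + 0.25 * p_draw - mean ** 2   # per-game score variance
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided quantile
    return 100.0 * z * (variance / n_games) ** 0.5


if __name__ == "__main__":
    for p_win, p_draw in [(1/3, 1/3), (0.4, 0.3), (0.3, 0.4)]:
        m = normal_margin(1000, 0.90, p_win, p_draw)
        print(f"w={p_win:.2f} d={p_draw:.2f}: +/-{m:.2f}% "
              f"(about {m * ELO_PER_PERCENT:.0f} elo)")
```

Under the normal approximation, 1/3-1/3-1/3 gives about 2.12% for 1000 games at 90% confidence, which agrees with the table, and 40/30/30 gives about 2.16%. If that approximation is acceptable, the error margins move only slightly with the w/d/l split, although this says nothing about how the draw rate itself varies with playing strength.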
Last modified: Thu, 15 Apr 21 08:11:13 -0700