Computer Chess Club Archives



Subject: Re: More Inferences from Chris Carson's Data

Author: José Antônio Fabiano Mendes

Date: 08:07:06 09/06/00



On September 05, 2000 at 16:04:10, Walter Koroljow wrote:

>
>
>This is a sequel to an analysis posted in July 2000
>(http://site2936.dellhost.com/forums/1/message.shtml?121825).  The approach here
>is simulation, not analysis, and the results are much stronger.
>
>***Data, Results, and an Overview of the Approach***
>*******************************************************
>
>The data is Chris Carson's human vs. computer data posted 7-16-00 (and available
>at http://home.interact.se/~w100107/welcome.htm).  I used all the data for
>single and multiple Pentium and Athlon processors at speeds of 200 or more MHz.
>Games against unrated players were not included.  Two games from the Dutch
>championship were not included since no more than 4 moves were made.  Games
>played by Zugzwang at Lippstadt in July 98 were included although I do not know
>what hardware was used. A list of the programs used is at the end of this post.

    Please see: [Zugzwang at Lippstadt]
   http://www.uni-paderborn.de/cs/chess/chesshome/zzc_t3e.html
>
>A total of 168 games was used; the programs scored 78 wins, 60 draws, and 30
>losses (108/168 points, or 64.3%).  The average human rating was 2419.4 FIDE.
>A single program with a rating of 2542 would have been expected to achieve this
>result.
>
>In what follows, "Program" means a combination of hardware and software, e.g.,
>Hiarcs6 running on a Pmmx-200.
>
>Now suppose that program ratings are evenly distributed over a range of values
>with the number of programs going to zero smoothly at the edges of this range.
>(This will be described in more detail later).  Then we compute the 95%
>confidence intervals for the average rating of the programs as a function of the
>spread of values.  The results are:
>
>			95% Confidence Interval
>Spread of Ratings	for Average Rating
>-----------------	-----------------------
>	0			2504-2579
>	200			2504-2583
>	400			2504-2595.
>
>For example, suppose the spread of ratings is 200 points.  Suppose the mean is
>2550 (in the confidence interval 2504-2583) and is in the center of the spread.
>Then the ratings will range from 2450 to 2650.
>
>As will be shown later, confidence intervals depend on the number of draws
>produced.  The results above are based on 60 draws out of the 168 games as was
>observed.  To verify that this does not critically affect the results, the
>calculations were repeated for 50 draws (almost two standard deviations less
>than observed).  The results are:
>
>Sensitivity Analysis: 10 fewer draws than observed
>--------------------------------------------------
>			95% Confidence Interval
>Spread of Ratings	for Average Rating
>-----------------	-----------------------
>	0			2502-2581
>	200			2502-2586
>	400			2502-2597.
>
>Draws are clearly not a critical issue.
>
>The method of computation is to assume a mean and a spread and to generate
>random program ratings with this distribution.  Now simulate the 168 games and
>get a score.  We repeat this one million times (a Monte Carlo simulation) to get
>a probability distribution of the total program score for these 168 games.  If
>this distribution shows that the probability of a score at least as extreme as
>the observed 108 points is less than 5%, we reject the assumed mean and spread,
>since they predict such a low probability for what was observed.  By repeating
>the simulation for different means and spreads, we finally arrive at values we
>can accept.  These are the values in the tables above.
>
>There are two major effects from increasing the spread.  One is that the
>confidence interval expands due to the increase in uncertainty introduced by the
>spread.  The other effect is that just increasing the spread (leaving the mean
>alone) lowers the probable score because the win expectancy curve is steeper in
>the region of the lower ratings.  This second effect only applies when the
>programs outrate the competition as they did here.  The result is that the two
>effects pretty much cancel each other for the left endpoint of the confidence
>intervals and combine to move the right endpoint up as the spread increases.
>
>The simulation (which contains the data used) takes about 6 pages of C.  I will
>provide it on request.
>
>The rest of this post goes into more detail.  It is divided into sections for
>easier reading.
>
>
>***Assumptions***:
>********************************************************
>
>1) The USCF win expectancy formula: We = 1/(1+10**((delta rating)/400)), where
>delta rating = (opponents rating) - (our rating).
>
>2) Of course, the probability of a draw is 0 for We = 0 or We = 1.  We assume
>that this probability varies linearly from these points to a maximum at We =
>0.5.  We set this maximum to get the right number of draws for all the 168
>games.
>
>3) All chess games are statistically independent events.
>
>
>***The Distribution of Program Ratings***
>*********************************************************
>
>Ratings for the programs were generated via a uniform distribution over the
>rating spread.  Then the whole distribution was shifted right or left to get the
>desired mean.  Because the size of this shift varies randomly from sample to
>sample, the edges of the uniform distribution were smoothed and became
>"transition regions".  The size of these transition regions
>is on the order of one or two standard deviations of the mean of the ratings of
>the thirty programs.  This is about 5-10% of the spread of the ratings.  So a
>200 point spread would be bordered by two transition regions of about 10-20
>points each.
>
>
>**Simulation Verification**
>********************************************************
>
>First we note that the results for a spread of 0 can be derived analytically.
>In what follows, let:
>
>w=win probability,
>d=draw probability,
>We=win expectancy = w+d/2
>V1=variance of score for one game
>Sigma = sqrt(variance)
>S = expected score for all games
>
>We note that:
>
>V1 = E(X**2) - (E(X))**2
>   = (w + d/4) - (w + d/2)**2
>   = We - We**2 - d/4
>
>where w and d apply to a single game and X is its score (1, 1/2, or 0), so
>that E(X**2) = w + d/4.  Then, since chess games are independent
>events, we can add these variances over all 168 games. The results are:
>
>Sigma= sqrt(Sum(We)-Sum(We**2)-D/4)
>S=Sum(We)
>
>where D is the expected number of draws over all 168 games.  Furthermore, since the
>total score is the sum of 168 independent events, we expect the total score to
>be distributed normally.  This means we can calculate percentiles analytically.
>All this can be evaluated via spreadsheet and compared to the results of the
>simulation.  Using 16 million simulation iterations the numbers for an assumed
>program rating of 2504 and zero spread are:
>
>					Confidence
>		    S	    Sigma	for score=108
>		--------    -----	-------------
>Analytical	100.662     4.370	95.4%
>
>Simulation	100.671	    4.385	95.9%.
>
>The numbers give considerable credence to the simulation results for zero
>spread.
>
>For the case of non-zero spread, we note that we do not expect much difference
>from the zero spread case.  The reason is that everything is based on the
>distribution of scores from a series of 168 games.  But the win expectancy
>formula is nearly linear over most of the range of rating differences we have to
>deal with.  That means that the scores will depend primarily on the mean program
>rating and will be only weakly dependent on the spreads.
>
>More specifically, the means of the randomized ratings were checked numerically
>and the ratings were checked by eye.  Finally, an independent program
>(basically, an integration over the spread) was written to evaluate the number
>of draws expected in the simulation.  This number closely matched the number
>produced by the simulation with non-zero spreads.
>
>
>**Data Used**
>**************************************************************
>
>The following is the brute force data input I used.  Each line corresponds to a
>program.  The first number in each line is the number of games played by that
>program; the remaining numbers are the FIDE ratings of its human opponents,
>padded with zeros.
>
>int Human_Ratings[30][12]={
>/*Reb8 P200*/       {3,2450,2485,2485,0,0,0,0,0,0,0,0},
>/*Hiarcs6 P200*/    {8,2485,2485,2485,2485,2485,2485,2500,2500,0,0,0},
>/*CM5K P2-300*/     {6,2035,2305,2235,2408,2330,2235,0,0,0,0,0},
>/*Hiarcs6 P2-266*/  {4,2125,2180,2330,2320,0,0,0,0,0,0,0},
>/*Reb9 P225*/       {8,2190,2100,2330,2340,2408,2205,2100,2320,0,0,0},
>/*Reb10 K62-450*/   {2,2795,2795,0,0,0,0,0,0,0,0,0},
>/*Zugzwang ?*/      {11,2470,2610,2470,2475,2550,2455,2540,2470,2410,2475,2390},
>/*CM6K P2-450*/     {7,2340,2325,2180,2225,2470,2130,2205,0,0,0,0},
>/*Hiarcs6 P2-350*/  {7,2340,2325,2180,2225,2470,2130,2205,0,0,0,0},
>/*Reb10 P2-400*/    {7,2340,2325,2180,2225,2470,2130,2205,0,0,0,0},
>/*Reb-cen K6-450*/  {1,2585,0,0,0,0,0,0,0,0,0,0},
>/*DJ6 4x-500*/      {1,2635,0,0,0,0,0,0,0,0,0,0},
>/*DJ6 Cel-450*/     {1,2100,0,0,0,0,0,0,0,0,0,0},
>/*DJ6 2x-450*/      {4,2443,2177,2271,2290,0,0,0,0,0,0,0},
>/*Ferret multi*/    {1,2630,0,0,0,0,0,0,0,0,0,0},
>/*Fritz6 multi*/    {1,2620,0,0,0,0,0,0,0,0,0,0},
>/*Shred4 P3-500*/   {8,2600,2443,2298,2566,2326,2100,2320,2109,0,0,0},
>/*Reb-cen K6-600*/  {8,2524,2593,2552,2486,2541,2515,2585,2501,0,0,0},
>/*Pconners multi*/  {11,2506,2550,2567,2478,2533,2399,2472,2301,2417,2467,2428},
>/*Reb-cen P2-350*/  {1,2399,0,0,0,0,0,0,0,0,0,0},
>/*Reb-cen K6-300*/  {3,2466,2450,2466,0,0,0,0,0,0,0,0},
>/*Reb-cen P3-500*/  {10,2432,2432,2320,2507,2177,2566,2445,2276,2271,2443,0},
>/*Reb-cen K6-400*/  {4,2398,2398,2398,2398,0,0,0,0,0,0,0},
>/*Fritz6 K7-500*/   {9,2177,2515,2548,2298,2566,2276,2507,2100,2158,0,0},
>/*Gundar P500*/     {6,2370,2370,2295,2483,2325,2405,0,0,0,0,0},
>/*Fritz6 P3-500*/   {6,2405,2265,2285,2370,2483,2295,0,0,0,0,0},
>/*Reb-cen K7-1000*/ {1,2485,0,0,0,0,0,0,0,0,0,0},
>/*DJ6 8x-500*/      {9,2701,2755,2770,2615,2667,2660,2762,2649,2743,0,0},
>/*Pconners multi*/  {11,2612,2513,2545,2623,2606,2357,2357,2547,2539,2480,2523},
>/*Fritz6SSS multi*/ {9,2641,2537,2561,2540,2393,2641,2567,2498,2558,0,0}};


