Author: Stephen A. Boak
Date: 01:08:20 01/29/00
On January 27, 2000 at 22:55:51, Michael Neish wrote:
>
>Hi,
>
>Before I get flamed, by "dummy" I mean fake. I'm not calling anyone stupid. :)
>
>Anyone could do this in a few minutes. I ran a Cadaques-style tournament
>between seven fictitious computer programs, i.e., seven programs play each other
>over matches consisting of 20 games each, 420 games in total for the whole
>tournament.
>
>I made the following assumptions:
>
>1) The computers are all of equal strength.
>
>2) The probabilities of a win, draw or loss are one-third each.
>
>These assumptions are for simplicity's sake only. If anyone can suggest better
>win/draw/loss probabilities please let me know, although for the sake of this
>post I don't think they make much difference.
>
>Okay, onto the results. To get the full flavour of things one would have to run
>many thousands of "Cadaques" tournaments, and look at the gross results. But
>here I will reproduce only the first ten tournament results that popped out of
>my program. The programs are all of equal strength, remember. Apologies if
>your monitor doesn't line up the figures very well.
>
>Tournament 1
>
>HiSparks 68.5
>Grits 6a 63
>Terebul Mouse 61
>Petunia 6 60.5
>Terebul Century 57
>Toddler 4 56
>Bimbo 7.32 54
>
>winner's score - loser's score = 14.5
>winner's score - runner-up's score = 5.5
>
>Tournament 2
>
>Terebul Mouse 68.5
>Toddler 4 64.5
>Petunia 6 63
>Bimbo 7.32 61
>HiSparks 56
>Terebul Century 56
>Grits 6a 51
>
>winner's score - loser's score = 17.5
>winner's score - runner-up's score = 4
>
>Tournament 3
>
>HiSparks 72
>Terebul Mouse 67
>Grits 6a 62
>Petunia 6 56
>Bimbo 7.32 55
>Toddler 4 54
>Terebul Century 54
>
>winner's score - loser's score = 18
>winner's score - runner-up's score = 5
>
>Tournament 4
>
>Terebul Century 65.5
>Grits 6a 62.5
>Bimbo 7.32 61
>Terebul Mouse 59.5
>Toddler 4 59
>Petunia 6 57.5
>HiSparks 55
>
>winner's score - loser's score = 10.5
>winner's score - runner-up's score = 3
>
>Tournament 5
>
>Terebul Mouse 64
>Grits 6a 63.5
>Bimbo 7.32 61
>Toddler 4 61
>HiSparks 57.5
>Petunia 6 57
>Terebul Century 56
>
>winner's score - loser's score = 8
>winner's score - runner-up's score = 0.5
>
>Tournament 6
>
>Bimbo 7.32 63
>Terebul Century 62.5
>Terebul Mouse 62
>Toddler 4 61.5
>Petunia 6 60
>Grits 6a 57
>HiSparks 54
>
>winner's score - loser's score = 9
>winner's score - runner-up's score = 0.5
>
>Tournament 7
>
>Bimbo 7.32 69
>Grits 6a 64
>Toddler 4 62.5
>Terebul Century 60.5
>Petunia 6 58
>HiSparks 57.5
>Terebul Mouse 48.5
>
>winner's score - loser's score = 20.5
>winner's score - runner-up's score = 5
>
>Tournament 8
>
>HiSparks 64.5
>Toddler 4 64
>Terebul Century 61
>Petunia 6 59.5
>Terebul Mouse 59.5
>Grits 6a 58.5
>Bimbo 7.32 53
>
>winner's score - loser's score = 11.5
>winner's score - runner-up's score = 0.5
>
>Tournament 9
>
>HiSparks 63
>Bimbo 7.32 63
>Grits 6a 60.5
>Petunia 6 59.5
>Terebul Mouse 59.5
>Toddler 4 57.5
>Terebul Century 57
>
>winner's score - loser's score = 6
>winner's score - runner-up's score = 0
>
>Tournament 10
>
>Terebul Mouse 69
>Bimbo 7.32 61
>HiSparks 60.5
>Terebul Century 59.5
>Petunia 6 58
>Toddler 4 57
>Grits 6a 55
>
>winner's score - loser's score = 14
>winner's score - runner-up's score = 8
>
>--------------------------------------------------
>
>If you've got this far in the message, what does this prove? Well, I'm not
>sure! These are only ten simulations. But it does show that a spread is
>expected on statistical grounds alone. In the case of Tourney 10, there is an
>8-point difference between the first and second program. In Tourney 7 there is
>a 20.5-point gap between the top and bottom, and also a 9-point gap between the
>last place and the next-to-last place. I wonder how many football managers
>would be pressurised into resigning for such a pitiful score in Tourney 7. Poor
>man -- his team is just as good as the others.
>
>But on average:
>
>First - Second program = 3.2 points
>First - Last program = 12.95 points
>Winning score = 66.7 points (= 55.6% score)
>
>Again, these are very few simulations. I didn't look at the scores for each
>individual match, but I'm sure there is an even greater variation within
>individual matches, which are then evened out a little by the fact that some
>programs will compensate for bad performances in one match in another match. If
>anyone is interested I will give the actual breakdown of the results for these
>same ten tournaments.
>
>It will be interesting to compare these results with the real Cadaques results
>once the tournament is over, although it seems that there will be a larger gap
>between the programs there. But of course, they are not of equal strength and
>I've read that there are also some problems when the Rebel programs are run on
>Autoplay.
>
>I hope this was interesting. It's not easy to see who is best, even in a
>420-game tournament.
>
>Cheers,
>
>Mike.
I like math/statistics as well as computer simulations--very useful learning
tools! When you measure something (especially by careful, controlled and
purposeful testing, such as by computer/math simulations), using existing tools,
and you subsequently review and think things over very carefully, testing your
premises and conclusions, you may find that there are limits on your
reasoning as well as on your tools. If you push either beyond its proper limits,
you end up in the realm of bad logic, unfounded speculation, or conclusions that
are simply wrong.
Using your simulated results of *equal* programs in ten tournaments, I tallied
for each copy (columns described from right to left) the total points (TOT), its
share of all points scored (Score%), and the average points per Tournament
(AVE), and then used Excel to linearly predict the score in the next (11th)
Tournament (FCST 11).
One can see that the AVE PTS per Tournament is rather close for each copy after
1200 games each (120 games per Tournament), as is the Score% (each copy's Total
Pts as a share of the 4200 points scored by all seven copies; an exactly even
split would be 14.29%). This indicates that the players are rather similar in
rating, although the results in individual tournaments may vary widely even for
the same (and equal strength!) player.
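As a rough cross-check, the tallies can be recomputed from the ten quoted result lists. The sketch below uses Python rather than Excel (a substitution, not the method used for the table), and the `SCORES` dictionary, its short keys and the `tally` helper are only illustrative names chosen to match the table.

```python
# Scores of each program in Tournaments 1-10, copied from the quoted results.
# The short keys (H, G, TM, ...) are illustrative labels matching the table.
SCORES = {
    "H":  [68.5, 56, 72, 55, 57.5, 54, 57.5, 64.5, 63, 60.5],
    "G":  [63, 51, 62, 62.5, 63.5, 57, 64, 58.5, 60.5, 55],
    "TM": [61, 68.5, 67, 59.5, 64, 62, 48.5, 59.5, 59.5, 69],
    "P6": [60.5, 63, 56, 57.5, 57, 60, 58, 59.5, 59.5, 58],
    "TC": [57, 56, 54, 65.5, 56, 62.5, 60.5, 61, 57, 59.5],
    "T4": [56, 64.5, 54, 59, 61, 61.5, 62.5, 64, 57.5, 57],
    "B":  [54, 61, 55, 61, 61, 63, 69, 53, 63, 61],
}


def tally(scores):
    """Return {name: (total, average per tournament, share of all points)}."""
    grand_total = sum(sum(v) for v in scores.values())
    return {
        name: (sum(v), sum(v) / len(v), sum(v) / grand_total)
        for name, v in scores.items()
    }


if __name__ == "__main__":
    for name, (tot, ave, share) in tally(SCORES).items():
        print(f"{name:<3} TOT {tot:6.1f}  AVE {ave:5.2f}  Score% {share:7.2%}")
```

The share is taken over all points scored by the seven copies, which is why every value sits close to one-seventh.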
Another factor to be careful with, one that can lead to false conclusions, is
time-sequence data that happens to generally rise or generally fall.
When the time element of results is considered, the linear forecast for any
player's expected result in the next time period (here, Tournament 11) is highly
affected by how several early results (say, Tournaments 1-4) compare with
several late results (say, Tournaments 7-10), even though the overall average
results across all 10 Tournaments are about the same.
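Thus *trend* conclusions (especially) may be dead wrong (who is getting
stronger, who is getting weaker), when looking at too few games. An apparent
improvement or decline may have no basis in reality and be due only to natural
variation at work (normal statistical variation), even among equal strength
programs. For example, in the table below B is forecast to score 63.2 in
Tournament 11 and TM only 59.7, although TM has the highest total (618.5) and
all seven copies are equal by construction.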
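To make the forecast step concrete, here is a minimal sketch of a linear prediction for Tournament 11. It assumes the Excel prediction was an ordinary least-squares straight line fitted to Tournaments 1-10, which is what Excel's FORECAST function computes; the function name `linear_forecast` and the variable names are illustrative only.

```python
def linear_forecast(ys, x_next):
    """Fit a least-squares line through (1, ys[0]), (2, ys[1]), ...
    and return its value at x_next."""
    n = len(ys)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean + slope * (x_next - x_mean)


# Scores in Tournaments 1-10, copied from the quoted results.
HISPARKS = [68.5, 56, 72, 55, 57.5, 54, 57.5, 64.5, 63, 60.5]
TEREBUL_MOUSE = [61, 68.5, 67, 59.5, 64, 62, 48.5, 59.5, 59.5, 69]
BIMBO = [54, 61, 55, 61, 61, 63, 69, 53, 63, 61]

if __name__ == "__main__":
    for name, ys in [("H", HISPARKS), ("TM", TEREBUL_MOUSE), ("B", BIMBO)]:
        print(f"{name:<3} forecast for Tournament 11: {linear_forecast(ys, 11):.1f}")
```

For these three copies the line gives about 59.0, 59.7 and 63.2, which agrees with the FCST 11 column. The forecast is the ten-tournament average plus 5.5 times the fitted slope, so a slope produced by nothing but chance moves the prediction by several points.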
Once again we can see that fast and simplistic or intuitive analysis may lead to
mathematically improper conclusions--because most people don't understand or
intuit *natural variation*.
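Anyone who wants to see this natural variation first-hand can rerun the experiment. The sketch below is not Mike's program (which was not posted); it only follows the assumptions stated in his message: seven equal programs, 20-game matches between every pair, and win, draw and loss each with probability one-third. The function names, the seed and the number of repetitions are my own illustrative choices.

```python
import random


def simulate_tournament(n_programs=7, games_per_match=20, rng=random):
    """Play one round-robin between equal programs; return their scores."""
    scores = [0.0] * n_programs
    for i in range(n_programs):
        for j in range(i + 1, n_programs):
            for _ in range(games_per_match):
                # Win, draw and loss (from i's point of view) are equally likely.
                result = rng.choice((1.0, 0.5, 0.0))
                scores[i] += result
                scores[j] += 1.0 - result
    return scores


def average_spreads(n_tournaments=2000, seed=0):
    """Average first-minus-second and first-minus-last gaps over many runs."""
    rng = random.Random(seed)
    first_second = 0.0
    first_last = 0.0
    for _ in range(n_tournaments):
        s = sorted(simulate_tournament(rng=rng), reverse=True)
        first_second += s[0] - s[1]
        first_last += s[0] - s[-1]
    return first_second / n_tournaments, first_last / n_tournaments


if __name__ == "__main__":
    gap_second, gap_last = average_spreads()
    print(f"average first - second gap: {gap_second:.2f} points")
    print(f"average first - last gap:   {gap_last:.2f} points")
```

Under these assumptions each game has mean 0.5 and variance 1/6, so one program's 120-game score has a standard deviation of sqrt(120/6), about 4.5 points. Gaps of several points between equal programs are therefore to be expected.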
I recommend reading any typical college text on statistics, such as 'Statistics
for Business' or 'Statistics for Non-Science Majors', etc. If you have the
ability to understand the calculations and the reasoning and work your way
through it, you will more easily understand the strengths and weaknesses of
claims based on the available evidence--and you will be able to calculate or
program your own figures instead of being dependent on others. You can borrow
such books free at your local library.
Program    FCST 11   AVE    Score%   TOT
H           59.0     60.9   14.49%   608.5
G           58.9     59.7   14.21%   597
TM          59.7     61.9   14.73%   618.5
P6          58.1     58.9   14.02%   589
TC          60.8     58.9   14.02%   589
T4          60.4     59.7   14.21%   597
B           63.2     60.1   14.31%   601
1st-Last     5.1      3.0    0.70%
1st-2nd      2.4      1.0    0.24%
--Steve
Last modified: Thu, 15 Apr 21 08:11:13 -0700