Author: blass uri
Date: 17:08:02 03/20/00
On March 20, 2000 at 19:04:47, Christophe Theron wrote:

>On March 20, 2000 at 17:28:37, blass uri wrote:
>
>>On March 20, 2000 at 13:56:59, Christophe Theron wrote:
>>
>>>On March 20, 2000 at 06:46:37, Bertil Eklund wrote:
>>>
>>>>On March 19, 2000 at 22:26:49, Christophe Theron wrote:
>>>>
>>>>>On March 19, 2000 at 15:41:30, Bertil Eklund wrote:
>>>>>
>>>>>>Hi!
>>>>>>
>>>>>>A very impressive result from Shredder4.
>>>>>>
>>>>>>IMO Shredder plays positionally very good and excellent in the endgames.
>>>>>>Nimzo is a bit stronger tactically.
>>>>>>
>>>>>>Shredder4 used all 4 Turbo-CDs.
>>>>>>
>>>>>>Bertil
>>>>>
>>>>>Bertil,
>>>>>
>>>>>I am not sure this message is going to be well accepted. So let me first state
>>>>>that I have the greatest respect for your work and the SSDF.
>>>>>
>>>>>Let me also state that I have a lot of respect for Nimzo, Shredder, and their
>>>>>respective authors.
>>>>Yes it's all great programs.
>>>>
>>>>>However, I can only strongly disagree with your sentence "a very impressive
>>>>>result from Shredder4".
>>>>
>>>>57,5% against a program known as one of the best on tournament time-control
>>>>impressed at least me. I only talk about this 40 game match. Maybe it loses to
>>>>Tiger in the next match but it's another match.
>>>
>>>Maybe Tiger loses, actually I do not know.
>>>
>>>But 57.5% must be taken with a statistical grain of salt. From the statistical
>>>data I have, and I'm open to discussion about this, on a 40 games match you can
>>>expect the error margin to be +/- 8.0% if you want 80% confidence.
>>
>>1)If you assume probability of 50% for win and of 50% for loss between equal
>>players and assume that colours of the players are not relevant the standard
>>error is
>>sqrt(0.5*0.5*40)=sqrt(10)>3.1 points and in this case 3 is almost the standard
>>error
>>
>>3.2/40=8% so in this case the error margin is really +/- 8.0%
>
>I was assuming 1/3 wins, 1/3 draws, 1/3 losses.

If this is the case, then the error margin is sqrt(2/3*10)<2.6 points, and
2.6/40=6.5%. But this assumption is not logical, because it does not consider
the fact that white has better chances.

>>2)If you assume probability of 20% for win and of 20% of loss and 60% for a draw
>>between equal players(colours are not relevant) the standard error is:
>>sqrt(0.4*0.5*0.5*40)=sqrt(4)=2
>>
>>when 0.4*0.4*0.5 is the variation in on game
>>0.4*0.5*0.5*40 is the variation in 40 games
>>and I do square root of it to calculate the standard error.
>>
>>In this case the standard error is only 5%.
>>I think this assumption assumes more draws then there are between computers.
>>
>>3)If you assume 40% for white 30% for a draw 30% for black between equal players
>>then the variance in one game is
>>0.4*0.45*0.45+0.3*0.05*0.05+0.3*0.55*0.55=0.4*0.2025+0.3*0.0025+0.3*0.3025=
>>0.1725
>>
>>In this case the variance in 40 games is 0.1725*40=7.1 and the standard
>>deviation is sqrt(7.1)<2.7
>>
>>2.7/40=6.75% and the standard deviation is +-6.7%
>
>Isn't it closer to 6.8?

No, because 2.7/40 is slightly bigger than the standard deviation (2.7 is only
an upper bound for sqrt(7.1)).

I was also wrong in my calculation (I used my head and not a computer, and I see
now that 0.1725*40=6.9 and not 7.1).

I even have sqrt(6.9)<2.65 and 2.65/40=6.625%, so the standard deviation is
+-6.6%.

>>The last case seems to be something close to the realistic case in games between
>>equal programs(I believe that there are more draws between equal programs and
>>this reduce the standard deviation but I am not sure)
>
>I have no evidence that the rate of draws is higher between equal programs.
>Maybe it's possible to make a study from the database of SSDF games?

I did not try to do it, but I believe that it is the case.
If we take the extreme case, then the better program scores 100% and there are
no draws.

>>The probability for a draw is also dependent on the style of the programs.
>
>Style is not part of my maths. I'm just a bean counter. :)
>
>    Christophe

I understand.
The problem is complicated enough even without considering style, so we can
ignore the small errors that come from leaving it out. We would need a lot of
games to see whether there is a significant difference in the number of draws
between programs, and we probably do not have enough games for that.

Uri
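
For anyone who wants to check the numbers above with a computer instead of by head, here is a minimal Python sketch of the same calculation. It assumes, as the post does, that the games are independent and that every game has the same win/draw/loss probabilities; the function name match_std_dev and the scenario labels are only illustrative and are not taken from any chess or statistics package. In the 40/30/30 case the probabilities are seen from white's side, which gives the same per-game variance as black's side.

```python
import math


def match_std_dev(p_win, p_draw, p_loss, games=40):
    """Return the standard deviation of the match score.

    The result is a pair: (standard deviation in points,
    standard deviation as a fraction of the number of games).
    Assumes independent games with identical probabilities.
    """
    if abs(p_win + p_draw + p_loss - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Expected score of one game: win = 1, draw = 0.5, loss = 0.
    mean = p_win + 0.5 * p_draw
    # Variance of one game around that expected score.
    variance = (p_win * (1.0 - mean) ** 2
                + p_draw * (0.5 - mean) ** 2
                + p_loss * (0.0 - mean) ** 2)
    # Variances of independent games add up; the square root
    # gives the standard deviation of the total score.
    sd_points = math.sqrt(variance * games)
    return sd_points, sd_points / games


# The four assumptions discussed in the thread (win, draw, loss).
scenarios = {
    "50% win / 50% loss, no draws": (0.5, 0.0, 0.5),
    "1/3 win, 1/3 draw, 1/3 loss": (1 / 3, 1 / 3, 1 / 3),
    "20% win, 60% draw, 20% loss": (0.2, 0.6, 0.2),
    "40% white, 30% draw, 30% black": (0.4, 0.3, 0.3),
}

for name, probs in scenarios.items():
    sd_points, sd_fraction = match_std_dev(*probs)
    print(f"{name}: {sd_points:.2f} points, {100 * sd_fraction:.2f}%")
```

For a 40 game match this gives roughly 7.9%, 6.5%, 5.0% and 6.6% for the four cases, in line with the figures in the post after the 6.9 correction.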
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.