Author: Uri Blass
Date: 09:55:37 09/04/01
On September 04, 2001 at 12:32:25, Ulrich Tuerke wrote:

>On September 04, 2001 at 10:59:24, Uri Blass wrote:
>
>>On September 04, 2001 at 09:17:30, Ulrich Tuerke wrote:
>>
>>>On September 04, 2001 at 04:36:58, Jouni Uski wrote:
>>>
>>>>On September 04, 2001 at 04:13:47, Chris Taylor wrote:
>>>>
>>>>>On September 04, 2001 at 02:26:23, Jouni Uski wrote:
>>>>>
>>>>>>Bob has obviously abandoned 2 weeks version cycle, which is good!! I prefer
>>>>>>to get new version, when it's really better than old. E.g. in Junior's case
>>>>>>after 1,5 years!
>>>>>>
>>>>>>Jouni
>>>>>
>>>>>I have a feeling there will be more versions!
>>>>>If I had a large amount of money, and the betting shop was a tad nearer. I
>>>>>would, have a flutter. But then again, I am not a betting man?
>>>>>
>>>>>In auto232 matches I have 18.10 from bob's site doing rather well. It seems to
>>>>>be getting stronger with each release.
>>>>>
>>>>I am not so sure. Look at this rating list from Herrmann:
>>>>
>>>>Crafty 17.14  2563  300  259  18  23  268.0  89.3  2207
>>>>Crafty 18.10  2561  372  309  43  20  330.5  88.8  2212
>>>>Crafty 17.12  2516  192  149  27  16  162.5  84.6  2225
>>>>Crafty 18.07  2492  336  261  41  34  281.5  83.8  2211
>>>>
>>>>Only sure thing is, that 18.7 was weak...
>>>
>>>Sure ??? Are you kidding ? Look at the error bars.
>>>These results are very well compatible with the statement that Crafty 18.07 is
>>>the strongest version. The statistics is far too poor to say anything.
>>>Uli
>>
>>No
>>
>>2563-23>2492+41 so it seems clear that 17.14 is better than 18.07 even in the
>>optimistic assumption for 18.07 and the pessimistic assumption for 17.14
>
>The numbers, which you are quoting, are not the error bars but the number of
>games with "=" and "-" result,( if I got this right from Andreas' page).

You are right here.

>
>I think that the error bars for 200 - 300 games (like above) are still of the
>order 60 - 70 ELO. It seems that Andreas has not given them.
>With this assumption I obtain
>2492 + 60 > 2563 - 60
>
>Okay ?

The average number of games is 300 for every program.
In the SSDF list Gambittiger has a similar number of games, and its error bars are only 40 and 43 Elo, so I guess you need to do something like

2492+40 > 2563-40

but it is not exactly correct to do it this way.

I have no time now to think about how to explain it, but it is more correct to calculate 40*(2^0.5) < 60 to evaluate the error in the difference between 2 programs, when 40 is the biggest error in one of them with 95% confidence. The difference here is 2563-2492 = 71, which is bigger than that, so in this case we get a significant difference.

Uri
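
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. It assumes the two rating errors are independent and equal, which is probably not exactly true for programs rated in the same list, so treat it as an illustration only. The function name `difference_error` and the variable names are made up for this example. The 40 Elo figure is the guess taken from the SSDF comparison, and 60 is the lower end of Ulrich's estimate.

```python
import math

def difference_error(err_a, err_b):
    """Error of a rating difference, assuming the two errors are independent."""
    return math.hypot(err_a, err_b)  # sqrt(err_a**2 + err_b**2)

# Ratings from the quoted list.
rating_1714 = 2563
rating_1807 = 2492
diff = rating_1714 - rating_1807

# Assumed 95% error bars per program (not given in the quoted list).
for assumed_error in (40, 60):
    combined = difference_error(assumed_error, assumed_error)
    verdict = "significant" if diff > combined else "not significant"
    print(f"error {assumed_error}: difference {diff} vs {combined:.1f} -> {verdict}")
```

With an error of 40 per program the combined error is about 56.6, which is below the difference of 71. With 60 per program it is about 84.9, and the same difference is no longer significant, so the conclusion depends on which error estimate is right.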
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.