Author: Andreas Herrmann
Date: 13:21:11 09/04/01
On September 04, 2001 at 13:49:07, Andreas Herrmann wrote:

>On September 04, 2001 at 12:32:25, Ulrich Tuerke wrote:
>
>>On September 04, 2001 at 10:59:24, Uri Blass wrote:
>>
>>>On September 04, 2001 at 09:17:30, Ulrich Tuerke wrote:
>>>
>>>>On September 04, 2001 at 04:36:58, Jouni Uski wrote:
>>>>
>>>>>On September 04, 2001 at 04:13:47, Chris Taylor wrote:
>>>>>
>>>>>>On September 04, 2001 at 02:26:23, Jouni Uski wrote:
>>>>>>
>>>>>>>Bob has obviously abandoned 2 weeks version cycle, which is good!! I prefer
>>>>>>>to get new version, when it's really better than old. E.g. in Junior's case
>>>>>>>after 1,5 years!
>>>>>>>
>>>>>>>Jouni
>>>>>>
>>>>>>I have a feeling there will be more versions!
>>>>>>If I had a large amount of money, and the betting shop was a tad nearer. I
>>>>>>would, have a flutter. But then again, I am not a betting man?
>>>>>>
>>>>>>In auto232 matches I have 18.10 from bob's site doing rather well. It seems to
>>>>>>be getting stronger with each release.
>>>>>>
>>>>>I am not so sure. Look at this rating list from Herrmann:
>>>>>
>>>>>Crafty 17.14   2563   300   259   18   23   268.0   89.3   2207
>>>>>Crafty 18.10   2561   372   309   43   20   330.5   88.8   2212
>>>>>Crafty 17.12   2516   192   149   27   16   162.5   84.6   2225
>>>>>Crafty 18.07   2492   336   261   41   34   281.5   83.8   2211
>>>>>
>>>>>Only sure thing is, that 18.7 was weak...
>>>>
>>>>Sure ??? Are you kidding ? Look at the error bars.
>>>>These results are very well compatible with the statement that Crafty 18.07 is
>>>>the strongest version. The statistics is far too poor to say anything.
>>>>Uli
>>>
>>>No
>>>
>>>2563-23 > 2492+41 so it seems clear that 17.14 is better than 18.07 even in the
>>>optimistic assumption for 18.07 and the pessimistic assumption for 17.14
>>
>>The numbers, which you are quoting, are not the error bars but the number of
>>games with "=" and "-" result,( if I got this right from Andreas' page).
>>
>>I think that the error bars for 200 - 300 games (like above) are still of the
>>order 60 - 70 ELO. It seems that Andreas has not given them.
>>With this assumption I obtain
>>2492 + 60 > 2563 - 60
>>
>
>Hi Uli,
>
>look at SSDF: the error bars for about 300 games are about +/-40 ELO and for 400
>games about +/-35 ELO.
>But you are right, the max error of 18.07 (2492 +55 or so = 2547) is in the
>lower part of the error window of 18.10 (2561 -40 = 2521).
>
>My rating program can't calculate the error bars like ELO-Stat, but the ratings
>over 85% and under 15% are more exact because my current version does not work
>with the approximation formula (Näherungsformel).
>
>E = -400 * ln(1/p - 1) / ln(10) + a
>
>a = average elo of the opponents
>p = points / number of games
>
>My very old version worked with this formula, but in that case you must
>have values between 15 and 85% to get nearly exact ratings.
>
>My current program works with an internal ELO table for the full percent values.
>Between these full percent values I interpolate the results. To get exact values
>I iterate until the max. difference between iteration n and n-1 is inside a window
>that I can configure. Normally I take a 0,1 ELO window. For the current rating
>list with over 26000 games I need 8 iterations to get inside the 0,1 ELO window.
>On the other hand, my program can't calculate values under 1% and over 99%.
>
>Here is the right headline for the above list:
>
>engine         rating  games    +    =    -     pts     %   oppo
>=================================================================
>Crafty 17.14     2563    300  259   18   23   268.0  89.3   2207
>Crafty 18.10     2561    372  309   43   20   330.5  88.8   2212
>Crafty 17.12     2516    192  149   27   16   162.5  84.6   2225
>Crafty 18.07     2492    336  261   41   34   281.5  83.8   2211
>
>+ number of games won
>= number of games drawn
>- number of games lost
>
>Andreas
>
>http://wbholmes.de

Here is an example of the wrong ratings from EloStat for results over 85% and
under 15%.

These ratings were made with EloStat:

   Program          Elo    +    -  Games   Score  Av.Op.   Draws
 1 Gandalf432g   : 2582   26  126    240  90.6 %    2180   8.8 %
 2 Crafty1714    : 2565   24  117    300  89.3 %    2188   6.0 %
 3 Crafty-18.10  : 2562   21   88    392  88.8 %    2195  11.2 %
 4 Gandalf432f   : 2556   27  107    248  87.3 %    2213   8.5 %
 5 Gandalf432h   : 2550   19   82    460  88.4 %    2190  10.7 %
 6 Yace 0.99.50  : 2531   21   89    384  87.2 %    2190   8.3 %
 7 WbNimzo2000b  : 2530   17   69    584  86.7 %    2196  10.4 %
 .....

Look for example at Gandalf432g:

D = 2582 - 2180 = 402 for 90,6%

But the correct values are:
90%  366 ELO
91%  383 ELO
92%  401 ELO
93%  422 ELO

The right value for D (elo difference) must be about 377 ELO and not 402.
That's an error of 25 ELO.

So the ratings that EloStat calculates for programs with a score higher than 85%
are much too high, and under 15% the ratings are much too low. And so the whole
rating list is wrong, because the opponent ratings are also wrong.

That's the reason why I'm using my own rating program and not EloStat.
EloStat can only be used if all results are between about 15 and 85%, because
only in this range are the calculated ratings nearly right.

Andreas
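
For anyone who wants to re-check the 90.6% example above, here is a minimal Python sketch. It only uses numbers from the post: the four table points (90% to 93%) and the approximation formula with the opponent average left out, so it returns the difference D rather than a rating. The names `ELO_TABLE`, `logistic_diff` and `table_diff` are made up for illustration, and this is not the actual code of either rating program; in particular, nothing here claims to reproduce how EloStat computes its numbers.

```python
import math

# The four table points quoted in the post (score in percent -> Elo difference).
# This is only an excerpt; a full table would cover 1% to 99%.
ELO_TABLE = {90: 366, 91: 383, 92: 401, 93: 422}


def logistic_diff(p):
    """Elo difference from the approximation formula: D = -400 * ln(1/p - 1) / ln(10)."""
    return -400.0 * math.log(1.0 / p - 1.0) / math.log(10.0)


def table_diff(p):
    """Linear interpolation between the two neighbouring full-percent table entries."""
    pct = p * 100.0
    lo = math.floor(pct)
    hi = lo + 1
    if lo not in ELO_TABLE or hi not in ELO_TABLE:
        raise ValueError("score is outside the table excerpt")
    frac = pct - lo
    return ELO_TABLE[lo] + frac * (ELO_TABLE[hi] - ELO_TABLE[lo])


if __name__ == "__main__":
    p = 0.906  # score of the 90.6% example
    print("table interpolation :", round(table_diff(p), 1))
    print("approximation formula:", round(logistic_diff(p), 1))
    print("difference in the list:", 2582 - 2180)
```

Under these assumptions the interpolated value comes out at about 376 and the approximation formula at about 394, against the 402 shown in the list, so the gap to the table value grows as the score moves towards the extreme end.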
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.