Computer Chess Club Archives


Subject: Re: Is Crafty 18.10 the final version?

Author: Andreas Herrmann

Date: 13:53:28 09/04/01


On September 04, 2001 at 16:05:00, Ulrich Tuerke wrote:

>On September 04, 2001 at 13:49:07, Andreas Herrmann wrote:
>
>>On September 04, 2001 at 12:32:25, Ulrich Tuerke wrote:
>>
>>>On September 04, 2001 at 10:59:24, Uri Blass wrote:
>>>
>>>>On September 04, 2001 at 09:17:30, Ulrich Tuerke wrote:
>>>>
>>>>>On September 04, 2001 at 04:36:58, Jouni Uski wrote:
>>>>>
>>>>>>On September 04, 2001 at 04:13:47, Chris Taylor wrote:
>>>>>>
>>>>>>>On September 04, 2001 at 02:26:23, Jouni Uski wrote:
>>>>>>>
>>>>>>>>Bob has obviously abandoned the 2-week version cycle, which is good!! I
>>>>>>>>prefer to get a new version when it's really better than the old one. E.g.
>>>>>>>>in Junior's case after 1.5 years!
>>>>>>>>
>>>>>>>>Jouni
>>>>>>>
>>>>>>>I have a feeling there will be more versions!
>>>>>>>If I had a large amount of money, and the betting shop were a tad nearer, I
>>>>>>>would have a flutter. But then again, I am not a betting man.
>>>>>>>
>>>>>>>In auto232 matches I have 18.10 from Bob's site doing rather well. It seems to
>>>>>>>be getting stronger with each release.
>>>>>>>
>>>>>>I am not so sure. Look at this rating list from Herrmann:
>>>>>>
>>>>>>Crafty 17.14              2563    300  259  18  23  268.0  89.3  2207
>>>>>>Crafty 18.10              2561    372  309  43  20  330.5  88.8  2212
>>>>>>Crafty 17.12              2516    192  149  27  16  162.5  84.6  2225
>>>>>>Crafty 18.07              2492    336  261  41  34  281.5  83.8  2211
>>>>>>
>>>>>>The only sure thing is that 18.07 was weak...
>>>>>
>>>>>Sure??? Are you kidding? Look at the error bars.
>>>>>These results are entirely compatible with the statement that Crafty 18.07 is
>>>>>the strongest version. The statistics are far too poor to say anything.
>>>>>Uli
>>>>
>>>>No
>>>>
>>>>2563-23>2492+41, so it seems clear that 17.14 is better than 18.07 even under
>>>>the optimistic assumption for 18.07 and the pessimistic assumption for 17.14.
>>>
>>>The numbers that you are quoting are not the error bars but the numbers of
>>>games with "=" and "-" results (if I got this right from Andreas' page).
>>>
>>>I think that the error bars for 200 - 300 games (like above) are still of the
>>>order of 60 - 70 ELO. It seems that Andreas has not given them.
>>>With this assumption I obtain
>>>2492 + 60  >  2563 - 60
>>>
>>
>>Hi Uli,
>>
>>look at the SSDF list: the error bars for about 300 games are about +/-40 ELO
>>and for 400 games about +/-35 ELO.
>>But you are right, the upper bound for 18.07 (2492 +55 or so = 2547) lies in
>>the lower part of the error window of 18.10 (which starts at 2561 -40 = 2521).
>>
>>My rating program can't calculate the error bars like ELO-Stat, but the ratings
>>over 85% and under 15% are more exact because my current version does not use
>>the approximation formula (Näherungsformel):
>>
>>E = ln( (1 / p) - 1 ) / ln(10) * -400 + a
>>
>>a = average ELO of the opponents
>>p = points / number of games
>>
>>My very old version worked with this formula, but then the score must be
>>between 15% and 85% to get nearly exact ratings.
>>
>>My current program works with an internal ELO table for the whole-percent
>>values. Between these whole-percent values I interpolate the results. To get
>>exact values I iterate until the max. difference between iteration n and n-1 is
>>within a window that I can configure. Normally I take a 0.1 ELO window. For the
>>current rating list with over 26000 games I need 8 iterations to get inside the
>>0.1 ELO window. On the other hand, my program can't calculate values under 1%
>>or over 99%.
>>
>>Here is the correct header for the above list:
>>
>>engine                    rating  games  +   =   -    pts     %  oppo
>>=====================================================================
>>Crafty 17.14              2563    300  259  18  23  268.0  89.3  2207
>>Crafty 18.10              2561    372  309  43  20  330.5  88.8  2212
>>Crafty 17.12              2516    192  149  27  16  162.5  84.6  2225
>>Crafty 18.07              2492    336  261  41  34  281.5  83.8  2211
>>
>>+  number of games won
>>=  number of games drawn
>>-  number of games lost
>>
>>Andreas
>
>Hi Andreas,
>
>thanks for clarifying this.
>I agree with your error estimate.
>I'd conclude that it seems rather likely that 17.14 is stronger than
>18.07. But I wouldn't claim that it's for sure.
>IMO, the whole procedure of ELO determination by collecting games also has some
>kind of inherent, systematic error, which is not reflected by the standard
>error bars. For example, the ChessBase beta testers once tested 2 Comet
>versions that were almost the same. Nevertheless, they came to quite
>different ELO estimates, though relying on large numbers of games. I still
>couldn't believe the results because I knew the test objects too well.
>I think that the choice of opponents must have been very different. This also
>has a significant influence. The results are only comparable if the set of
>opponents is the same.
>

Yes, that's all right. To get good ratings you must have many games against many
different opponents (number of samples, "Anzahl Stichproben"), and it is best if
the average opponent rating is within about +/-200 ELO of the program's own
rating, or closer.
I wrote statistics software for quality assurance (Qualitätssicherung) for about
7 years, and I needed some years to understand the results of statistics. It's
easy to use a statistical formula to calculate something, but understanding the
result isn't easy.
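
As a rough illustration of why the average opponent rating matters, the
logistic approximation formula quoted earlier in the thread can be turned
around to give the expected score for a given rating difference. This is only a
sketch: the function names `expected_score` and `elo_diff` are made up for this
example, and it uses the approximation formula, not the table-based method
described in the quoted post.

```python
import math


def expected_score(diff):
    """Expected score (0..1) for a program rated `diff` ELO above its opponents.

    Logistic approximation; the function name is hypothetical.
    """
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_diff(p):
    """Inverse of expected_score: rating difference implied by score fraction p.

    Same as E - a in the quoted formula E = ln((1/p) - 1) / ln(10) * -400 + a.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


if __name__ == "__main__":
    # Print the expected score for a few rating differences.
    for diff in (0, 100, 200, 300, 400):
        print(f"{diff:+4d} ELO -> {100 * expected_score(diff):5.1f}%")
```

On this curve a difference of 200 ELO corresponds to a score of roughly 76%,
and roughly 300 ELO to about 85%, so opponents within +/-200 ELO keep the score
well inside the 15% to 85% range where the approximation formula is said to be
nearly exact.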

My ratings are only good for the programs in the middle of the list, because
there the average opponent rating is about the same as the program's own ELO.
The SSDF ratings are also not so good, because the programs sometimes play
against only about 10 or fewer opponents; I think that's not enough to calculate
good ratings. The number of samples (Stichproben) = number of opponents is too
small. It is better to take many samples (Stichproben) with a few games each
than a few samples with many games each (games = sample size, Stichprobengröße).
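
For readers who want to see where error bars of the size discussed in this
thread can come from, here is a minimal sketch that derives a rating and a
rough 95% interval from the won/drawn/lost counts in the list above. The
function names are hypothetical, and the method is an assumption: it uses the
approximation formula quoted earlier together with a normal approximation of
the score's standard error, treating games as independent and the opponent
average as exact. It is not the method of ELO-Stat, SSDF, or the rating program
described here, and it ignores the number of distinct opponents entirely.

```python
import math


def elo_diff(p):
    """Rating difference implied by score fraction p (approximation formula)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def rating_with_margin(wins, draws, losses, avg_opp, z=1.96):
    """Return (low, estimate, high) ratings for a won/drawn/lost record.

    Assumption: normal approximation of the mean score per game; z=1.96
    corresponds to a roughly 95% interval.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0% and 100%")
    # Variance of a single game result (1, 0.5 or 0) around the mean score p.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * p ** 2) / n
    se = math.sqrt(var / n)
    # Clamp so the interval ends stay inside (0, 1) before converting to ELO.
    p_low = max(p - z * se, 1e-6)
    p_high = min(p + z * se, 1.0 - 1e-6)
    return (avg_opp + elo_diff(p_low),
            avg_opp + elo_diff(p),
            avg_opp + elo_diff(p_high))


# Rows copied from the list in this thread: (won, drawn, lost, avg. opponent).
ROWS = {
    "Crafty 17.14": (259, 18, 23, 2207),
    "Crafty 18.10": (309, 43, 20, 2212),
    "Crafty 17.12": (149, 27, 16, 2225),
    "Crafty 18.07": (261, 41, 34, 2211),
}

if __name__ == "__main__":
    for name, (w, d, l, opp) in ROWS.items():
        low, est, high = rating_with_margin(w, d, l, opp)
        print(f"{name}: {est:6.0f}  (-{est - low:.0f} / +{high - est:.0f})")
```

At score percentages this high the interval is asymmetric, with the upper
margin larger than the lower one, because the rating curve gets steeper as the
score approaches 100%. The estimates also differ somewhat from the listed
ratings, since the list was computed with the table-based, iterated method.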

OK, that's enough statistics for now. I think many people understand statistics
a little bit; for the others, it's too difficult to explain.
Also because my English isn't good enough for such a difficult topic :)

Andreas





Last modified: Thu, 15 Apr 21 08:11:13 -0700
