Computer Chess Club Archives

Subject: Re: CCRL 40/40 Rating list and stats updated

Author: Kirill Kryukov

Date: 05:48:10 02/20/06


Hi Mike,

Thanks for the comments and interest!

On February 19, 2006 at 12:50:47, Mike S. wrote:

>Thanks! - I see 6 engines, ranks #2 to #7, within only 19 Elo (2830-2849), with
>the error margins being bigger than +/- 20. (But this includes 2 deeps on
>duals.) This is remarkable and tells us that these many engines are virtually
>equally strong, in comp-comp under the CCRL conditions. Or if they are not
>equally strong (and I don't doubt that some will want to claim this :-)), then
>it seems to be very hard to prove even with hundreds of games. Maybe the gaps
>will become bigger with 1,000+ games each, or maybe not...

Yes, it's getting crowded at the top of the rating list. :-)
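
To put the "+/- 20 with hundreds of games" remark in numbers, here is a rough sketch of how an Elo margin shrinks with the number of games. It is not the method used for the CCRL list; it assumes a single head-to-head match, the standard logistic Elo formula, and a normal approximation for the score. The function names and the sample win/draw/loss counts are made up for illustration.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for one match, using a normal approximation.

    Assumption: games are independent and z=1.96 gives a ~95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a result is 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    # Clamp so the logarithm stays defined for very lopsided results.
    eps = 1e-6
    low = min(max(score - z * se, eps), 1.0 - eps)
    high = min(max(score + z * se, eps), 1.0 - eps)
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)

# Hypothetical even match with 50% draws, at two sample sizes.
for games in (400, 1600):
    w = l = games // 4
    d = games // 2
    elo, low, high = elo_with_margin(w, d, l)
    print(f"{games} games: {elo:+.0f} Elo, interval {low:+.0f} .. {high:+.0f}")
```

Under these assumptions the margin halves only when the number of games is quadrupled, which is why engines this close together are so hard to separate.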


>I also find the engine correlation statistics very interesting. For example, the
>"most different" engines seem to be Fritz compared to Ktulu and Tiger, 0.93
>rating difference. What is surprising is that we find the 2nd biggest rating
>diff between Ktulu and Tiger. So, all these 3 are very different.

These observations can have several explanations, and a different style of play
is only one of them. For example, some engines are known to be "optimistic": as
soon as they see a slight advantage they overestimate it. There are
"conservative" engines too.

The "Eval difference" table compares what engines "say", while the "Ponder hit"
table compares what they "do", so to speak. :-) This is an important difference,
because in theory an engine can report one evaluation and use a totally
different one internally. So we should be careful with the eval-based figures.
The "Ponder hit" table, on the other hand, is totally reliable; you can't trick
it. :-) I also think the number of moves in many pairs is still too low for a
good discussion.
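
As an illustration of the "say" versus "do" distinction, here is a minimal sketch of how two such statistics could be computed from logged games. It does not describe how the CCRL tables are actually produced; the `Ply` record, its fields, and both function names are hypothetical. It assumes that a ponder hit means the reply an engine expected was the move its opponent then played, and that the eval difference is the mean absolute gap between the two engines' reported scores on consecutive plies, both taken from White's point of view.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Ply:
    mover: str             # name of the engine that played this move
    move: str              # move played, e.g. "e2e4"
    eval_cp: int           # mover's reported eval in centipawns, White's view
    ponder: Optional[str]  # reply the mover expected, or None if not logged

def ponder_hit_rate(plies: List[Ply], predictor: str, opponent: str):
    """Fraction of predictor's expected replies that opponent actually played."""
    hits = total = 0
    for prev, cur in zip(plies, plies[1:]):
        if (prev.mover == predictor and cur.mover == opponent
                and prev.ponder is not None):
            total += 1
            hits += prev.ponder == cur.move
    return hits / total if total else None

def mean_eval_difference(plies: List[Ply], a: str, b: str):
    """Mean absolute gap, in pawns, between reported evals on consecutive plies."""
    diffs = [abs(prev.eval_cp - cur.eval_cp) / 100.0
             for prev, cur in zip(plies, plies[1:])
             if {prev.mover, cur.mover} == {a, b}]
    return sum(diffs) / len(diffs) if diffs else None

# Tiny made-up game fragment between hypothetical engines "A" and "B".
game = [
    Ply("A", "e2e4", 30, "e7e5"),
    Ply("B", "e7e5", 10, "g1f3"),
    Ply("A", "g1f3", 35, "b8c6"),
    Ply("B", "g8f6", -5, None),
]
print(ponder_hit_rate(game, "A", "B"))
print(mean_eval_difference(game, "A", "B"))
```

The first function depends only on moves actually played, so a misreported score cannot affect it; the second depends entirely on what each engine chooses to print.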


>This is useful, for example, if you prefer one engine for analysis mainly, but
>want to choose the best engine for "2nd opinions." Then it makes sense IMO, to
>choose the engine which is most different, from your collection (at least for the
>middlegame).

I would still stick to the top 5 engines for serious analysis. :-) (OK, top 7,
as I like Junior.) Please note that since we compare engines without their own
books, our rating list shows particularly well how good each engine is for
analysis. (When you compare engines with their own books, they start thinking
deep in the middlegame or in the endgame; in our study the engines have to work
from about move 10 on.)


>We also learn from the engine correlation statistics, that there is a typical
>minimum of ~52% correctly expected moves, but no engine has yet achieved 62%
>average. Any head-to-head comparison was smaller than 80%, even with closely
>related engines; see e.g. DJ9/J9 77%. We also find high values between engines
>and their predecessor versions (like H9/H10).

Yes, this particular example (H9/H10) shows that there can be an apparent
correlation even between engines with a big rating difference.

>Regards,
>Mike Scheidl


Best,
Kirill
