Computer Chess Club Archives


Subject: Re: A rating inquiry

Author: Enrique Irazoqui

Date: 13:31:01 10/11/98


On October 11, 1998 at 14:57:49, Moritz Berger wrote:

>On October 11, 1998 at 09:44:23, Enrique Irazoqui wrote:
>
>>On this list, Fritz 5 is between 30 and 70 points higher than
>>all the other top engines, so you would expect Fritz to score about 55 to 60%
>>against them, and this is not necessarily true.
>
>The percentage holds on average in all the kinds of experiments I did without
>an opening book (or even with the 1000-game Anand book I mentioned).
>
>> If Fritz 5 plays 20-game
>>matches, it will get this score. If it plays 10-game matches, it won't.
>
>From my matches, I cannot confirm this observation. I also have a few hundred
>games and didn't notice the phenomenon you describe. I will take a look again
>specifically for the thing you described, but the last 40 games against R10 were
>fairly even right from the start. With a clean book at the beginning.

SSDF matches. First number is percentage, second number is number of games.
70/20 means 70% in 20 games.

Fritz 5 scores against:

               First half          Second half
Rebel8-P90       76/19                80/20
Genius5-P200     57/23                70/23
Mchess7-P200     52/22                61/22
Hiarcs6-P200     38/21                55/21
Genius5-P90      74/21                67/21
Hiarcs5-P90      75/10                80/10
Comet32-P90      95/10                80/10
Shredder2-P200   60/10                70/10
Nimzo3.5-P90     68/20                75/20
Hiarcs6-P90      63/20                75/20
Rebel9-P90       58/19                83/20
Junior3.5-P90    80/10                90/10
TOTAL           64% / 205            72% / 207 = + 64 Elo
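
The "+ 64 Elo" figure can be checked with the usual logistic Elo formula. This is only a minimal sketch: the function name `elo_from_score` is made up for illustration, and it assumes the standard 400-point logistic model, which may differ slightly from the conversion table the SSDF uses.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (0 < p < 1).

    Assumes the standard logistic model: p = 1 / (1 + 10 ** (-d / 400)).
    """
    return 400.0 * math.log10(p / (1.0 - p))

# Pooled SSDF scores of Fritz 5 from the TOTAL line above.
first_half = elo_from_score(0.64)   # 64% over 205 games
second_half = elo_from_score(0.72)  # 72% over 207 games
gain = second_half - first_half

print(f"first half:  {first_half:+.0f} Elo")
print(f"second half: {second_half:+.0f} Elo")
print(f"difference:  {gain:+.0f} Elo")
```

Under this model the two halves differ by roughly 64 Elo points, which agrees with the TOTAL line.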

In my tournament at 40 moves in 2 hours, both sides on P200MMX, Fritz 5 showed
the same pattern:

F5-H6             40/10                50/10
F5-M7.1           45/10                55/10
F5-R9             40/10                65/10
F5-N98            50/10                65/10
TOTAL            44% / 40             59% / 40
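
The TOTAL line is a pooled score: total points divided by total games, not an average of the per-match percentages (here the two coincide, because every match has 10 games). A small sketch of that arithmetic, using the rounded percentages from the table above; the helper name `pooled_percent` is hypothetical.

```python
def pooled_percent(matches):
    """Pooled score in percent from (percent, games) pairs."""
    points = sum(pct / 100.0 * games for pct, games in matches)
    games = sum(games for _, games in matches)
    return 100.0 * points / games

# (percent, games) for F5-H6, F5-M7.1, F5-R9, F5-N98 from the table above.
first_half = [(40, 10), (45, 10), (40, 10), (50, 10)]
second_half = [(50, 10), (55, 10), (65, 10), (65, 10)]

print(f"first half:  {pooled_percent(first_half):.2f}%")
print(f"second half: {pooled_percent(second_half):.2f}%")
```

The results round to the 44% and 59% shown in the TOTAL line.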

A few days ago I posted my Fritz5-Rebel10 results, with a similar pattern.

>> If in
>>tournaments it plays a different opponent every game, it won't get that score
>>either. In fact, it doesn't. Then, the SSDF rating list is no indication of the
>>score Fritz 5 will get in a future event, unless this event reproduces exactly
>>the SSDF way to test. In other words, this Elo list defeats its own purpose of
>>being able to predict performances.
>
>No, it doesn't. At least, it is no less reliable for Fritz than for Hiarcs,
>Rebel, M-Chess, Genius, Shredder, ... you name it.
>
>>An example I already posted: in my tournament of 200 games at 40:2, Fritz 5
>>scored 44% in the first half and 59% in the last half. In the SSDF games, 64% in
>>the first half and 72% in the second half. If you play a tournament of 20-game
>>matches, you will get a very different performance depending on whether Fritz 5
>>plays these 20 games in a row or you split each match into two halves by exiting
>>the program and restarting it for the last 10 games.
>
>SSDF plays on different machines. On one machine, learning from match 1
>(opponent A) will positively affect match 2 (opponent B). So SSDF is not quite
>representative for results you or me would get using always the same machine,
>esp. if you consider when interpreting their results like you did that they
>presumably started to use the PowerBook from a certain point onward (1st half of
>the match: fritz5.ctg, 2nd half: PowerBook ...).
>
>> What's going to be the
>>predicted performance after the Elo rating? It depends on how you make Fritz 5
>>play. That's why I'm talking of an SSDF-specific rating.
>
>Of course the rating is the direct result of testing parameters. Certainly the
>relative rating on the SSDF list doesn't hold 100% against humans. I fully agree
>on this.
>
>>Again: I think Fritz 5 is very strong and a tactical wonder. I think the SSDF is
>>not to blame for distortions in their rating list.
>
>Come on, now you have to clarify your terms: What exactly does "distortion" mean
>to you? Cheating?

No. I never said this. I never implied it either. And I don't see the need to
fight back when there is no attack to begin with.

What I mean by "distortion" I explained, or "clarified", before: the Elo rating
is supposed to predict a score a chess player is likely to achieve in a given
event. In this particular case, the Elo of Fritz 5 in the SSDF list does not
achieve this goal. Depending on how you make it play, as I described before, it
will achieve different performances, and therefore the Elo rating is, I think,
distorted.

>Isn't the manual opening book preparation of all other
>programs much more "cheating" in a sense that the program doesn't develop its
>own repertoire by its own playing strength and "understanding" of chess?
>Interpreting a raw database of human games to still get a usable book seems to
>be a greater obstacle than preparing books in decades of work like Sandro Necchi
>(M-Chess) described in his interesting article on CCR (available to all readers
>here at the CCC resource centre).

Again: I am not talking about cheating. You do.

>> But these distortions are
>>real. Learners can be SSDF specific, meaning: much more efficient in the SSDF
>>way to play matches than in any other case, and this influences greatly this
>>rating list.
>>
>>Enrique
>
>Wasn't it you who fiercely advocated using learners on the SSDF list to overcome
>the "killer book" problem and measure engine playing strength? My memory seems
>to be dysfunctional if the Fritz learning mechanism doesn't perfectly fit your
>prescribed solution to the mess.

What I see as dysfunctional is the degree of aggressiveness. Uncalled for and
all your own.

It would be a pleasure, at least for me, to be able to discuss this kind of
observation without feeling as if at war.

Enrique

>Moritz


