Computer Chess Club Archives


Subject: Re: A rating inquiry

Author: Moritz Berger

Date: 03:03:29 10/12/98


On October 11, 1998 at 16:31:01, Enrique Irazoqui wrote:
>On October 11, 1998 at 14:57:49, Moritz Berger wrote:
>>On October 11, 1998 at 09:44:23, Enrique Irazoqui wrote:

>>> If Fritz 5 plays 20-game
>>>matches, it will get this score. If it plays 10-game matches, it won't.
>>
>>From my matches, I cannot confirm this observation. I have also some hundred
>>games and didn't notice the phenomenon you describe. I will take a look again
>>specifically for the thing you described, but the last 40 games against R10 were
>>fairly even right from the start. With a clean book at the beginning.
>
>SSDF matches. First number is percentage, second number is number of games.
>70/20 means 70% in 20 games.
>
>Fritz 5 scores against:
>
>               First half          Second half
>Rebel8-P90       76/19                80/20
>Genius5-P200     57/23                70/23
>Mchess7-P200     52/22                61/22
>Hiarcs6-P200     38/21                55/21
>Genius5-P90      74/21                67/21
>Hiarcs5-P90      75/10                80/10
>Comet32-P90      95/10                80/10
>Shredder2-P200   60/10                70/10
>Nimzo3.5-P90     68/20                75/20
>Hiarcs6-P90      63/20                75/20
>Rebel9-P90       58/19                83/20
>Junior3.5-P90    80/10                90/10
>TOTAL           64% / 205            72% / 207 = + 64 Elo
>
>In my tournament at 40 moves in 2 hours, both sides on P200MMX, Fritz 5 showed
>the same pattern:
>
>F5-H6             40/10                50/10
>F5-M7.1           45/10                55/10
>F5-R9             40/10                65/10
>F5-N98            50/10                65/10
>TOTAL            44% / 40             59% / 40
>
>A few days ago I posted my Fritz5-Rebel10 results, with a similar pattern.
>
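
The "+ 64 Elo" in the quoted SSDF totals can be checked with a few lines of Python. This is only a sketch: it assumes the usual logistic Elo model (400 * log10(p / (1 - p))), which the post does not name, and the helper names elo_diff and pooled_percent are made up for illustration. The (percent, games) pairs are copied from the table above.

```python
import math

def elo_diff(score):
    """Elo difference implied by an expected score in (0, 1).

    Assumption: the standard logistic Elo model; the post does not say
    which formula was used to arrive at "+ 64 Elo".
    """
    return 400.0 * math.log10(score / (1.0 - score))

def pooled_percent(results):
    """Game-weighted average of (percent, games) pairs, plus total games."""
    games = sum(n for _, n in results)
    return sum(p * n for p, n in results) / games, games

# (percent, games) pairs copied from the SSDF table quoted above
first_half = [(76, 19), (57, 23), (52, 22), (38, 21), (74, 21), (75, 10),
              (95, 10), (60, 10), (68, 20), (63, 20), (58, 19), (80, 10)]
second_half = [(80, 20), (70, 23), (61, 22), (55, 21), (67, 21), (80, 10),
               (80, 10), (70, 10), (75, 20), (75, 20), (83, 20), (90, 10)]

p1, n1 = pooled_percent(first_half)
p2, n2 = pooled_percent(second_half)

# Use the rounded totals, as the table does (64% and 72%)
gain = elo_diff(round(p2) / 100) - elo_diff(round(p1) / 100)
print(f"{p1:.1f}% / {n1}   {p2:.1f}% / {n2}   gain = {gain:+.0f} Elo")
```

Under that assumption, the pooled totals and the Elo gain come out in line with the figures in the TOTAL row.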

I dug out results from a database of around 250 40/120 games I sent you last
year. This seems to show exactly the opposite pattern of what you observed. So I
hope you understand why I didn't immediately follow your conclusions ...

Fritz P133 - Rebel 9 P166 65%/10
Fritz P133 - Rebel 9 P166 45%/10

Fritz P166 - Genius 5/DOS P133 63%/10
Fritz P166 - Genius 5/DOS P133 50%/10
Fritz P166 - Genius 5/DOS P133 45%/10
Fritz P166 - Genius 5/DOS P133 50%/10


>>Come on, now you have to clarify your terms: What exactly does "distortion" mean
>>to you? Cheating?
>
>No. I never said this. I never implied it either. And I don't see the need to
>fight back when there is no attack to begin with.

Sorry, this wasn't meant as an attack. Only that "distortion" in my dictionary
(OED) reads as "The twisting or perversion of words so as to give to them a
different sense; perversion of opinions, facts, history, so as to misapply
them.", which in my opinion certainly doesn't apply to testing at the SSDF.

>What I mean by "distortion" I explained, or "clarified", before: the Elo rating
>is supposed to predict a score a chess player is likely to achieve in a given
>event. In this particular case, the Elo of Fritz 5 in the SSDF list does not
>achieve this goal.

"Does not achieve this goal" is again somewhat uncompromising, given the purely
experimental nature of your falsification.

> Depending on how you make it play, as I described before, it
>will achieve different performances, and therefore the Elo rating is, I think,
>distorted.

This is not at all clear to me. Your inference has to take all the other
parameters into account too: all the other programs, how the environment you
have in mind actually differs, and whether it is valid to favour such a
presumed "anti-Fritz" setting (again: I have different observations here) over
the testing process at the SSDF, etc. ...

>>Isn't the manual opening book preparation of all other
>>programs much more "cheating" in a sense that the program doesn't develop its
>>own repertoire by its own playing strength and "understanding" of chess?
>>Interpreting a raw database of human games to still get a usable book seems to
>>be a greater obstacle than preparing books in decades of work like Sandro Necchi
>>(M-Chess) described in his interesting article on CCR (available to all readers
>>here at the CCC resource centre).
>
>Again: I am not talking about cheating. You do.

OK, replace "cheating" by "distorting" in the paragraph above and maybe we can
go beyond rejecting each other's diction. What's your opinion about my point of
view?

>>> But these distortions are
>>>real. Learners can be SSDF specific, meaning: much more efficient in the SSDF
>>>way to play matches than in any other case, and this influences greatly this
>>>rating list.
>>>
>>>Enrique
>>
>>Wasn't it you who fiercely advocated using learners on the SSDF list to overcome
>>the "killer book" problem and measure engine playing strength? My memory seems
>>to be dysfunctional if the Fritz learning mechanism doesn't perfectly fit your
>>prescribed solution to the mess.
>
>What I see as dysfunctional is the degree of aggressivity. Uncalled for and all
>of your own.
>
>It would be a pleasure, at least for me, to be able to discuss these kinds of
>observations without feeling as if at war.

I am sorry if I sounded too harsh to you; but don't forget who is going to quote
you as "now also Enrique Irazoqui agrees that SSDF rating of Fritz 5 is
distorted" ...

As an aside, I remember that Ossi Weiner was the first to use the word
"distorted": (www.computerchess.de)
"Since the release of the latest SSDF Rating List of Feb 22 there has been
serious controversy about its validity. A certain program has been tested in a
special configuration which strongly differs from the generally available
product. This inevitably presented a distorted picture of the true conditions."

I think that your observations are interesting enough to follow up on your
research; I started with a match between the Crafty 15.19 engine with
PowerBook (clean, learning enabled) on a PII-400 vs. Hiarcs 6 on a P233MMX at
60/5 time controls:

Overall score Crafty from 32 games +8 =15 -9   = 48%
games 1-10: 55%
games 11-20: 35%
games 21-30: 50%
[[games 31-32: 75%]]

(to be continued, using exactly the same learning algorithm as Fritz 5 on the
very same PowerBook).

At least in 30 Blitz games, so far the 1st 10 games were clearly the best ...
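
As a quick cross-check of the figures above, the overall percentage can be recomputed from the +8 =15 -9 record, and the per-segment percentages can be converted back to points to see whether they add up. A minimal sketch; the helper name score_percent is made up for illustration, and the numbers are copied from the listing above.

```python
def score_percent(wins, draws, losses):
    """Score in percent, counting a win as 1 point and a draw as 0.5."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games

# Crafty's overall record from the post: +8 =15 -9
overall = score_percent(8, 15, 9)

# (percent, games) per segment, as listed above
segments = [(55, 10), (35, 10), (50, 10), (75, 2)]
points = sum(p * n / 100.0 for p, n in segments)
total_games = sum(n for _, n in segments)

print(f"overall {overall:.1f}%, segments give {points:.1f} points in {total_games} games")
```

The segment points sum to the same 15.5 out of 32 that the win/draw/loss record gives, so the listed percentages are consistent with each other.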

Do you agree on this testing method? Will you accept my result after a few
hundred Blitz games? Maybe you want to join my experiment and play similar
matches on your machines?
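
To get a feel for why a few hundred games are proposed rather than a few dozen, the statistical noise in a match score can be estimated roughly. This is a sketch under a loud assumption: games are treated as independent with a fixed result distribution, which is exactly what book learning might violate. The function names are made up for illustration.

```python
import math

def score_standard_error(wins, draws, losses):
    """Mean score (0..1) and its standard error from a W/D/L record.

    Assumption: games are independent and identically distributed.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    mean_sq = (wins + 0.25 * draws) / n      # E[x^2] with x in {1, 0.5, 0}
    variance = mean_sq - mean * mean
    return mean, math.sqrt(variance / n)

def worst_case_error(games):
    """Upper bound on the standard error: per-game variance is at most 0.25."""
    return 0.5 / math.sqrt(games)

mean, se = score_standard_error(8, 15, 9)    # Crafty: +8 =15 -9
print(f"Crafty: {100 * mean:.1f}% +/- {100 * se:.1f}% (one standard error)")

for games in (10, 40, 300):
    print(f"{games:4d} games: standard error at most {100 * worst_case_error(games):.1f}%")
```

On this rough estimate, a 10-game block is noisy enough that differences of 10 to 15 percentage points between blocks are hard to tell apart from chance, while a few hundred games narrow the margin considerably.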


Moritz




Last modified: Thu, 15 Apr 21 08:11:13 -0700
