Author: Enrique Irazoqui
Date: 13:31:01 10/11/98
On October 11, 1998 at 14:57:49, Moritz Berger wrote:

>On October 11, 1998 at 09:44:23, Enrique Irazoqui wrote:
>
>>On this list, Fritz 5 is between 30 and 70 points higher than
>>all the other top engines, so you would expect Fritz to score about 55 to 60%
>>against them, and this is not necessarily true.
>
>The percentage holds on the average in all kinds of experiments I did without
>opening book (or even the 1000 games Anand book I mentioned).
>
>> If Fritz 5 plays 20 games long
>>matches, it will get this score. If it plays 10 games matches, it won't.
>
>From my matches, I cannot confirm this observation. I have also some hundred
>games and didn't notice the phenomenon you describe. I will take a look again
>specifically for the thing you described, but the last 40 games against R10 were
>fairly even right from the start. With a clean book at the beginning.

SSDF matches. First number is percentage, second number is number of games.
70/20 means 70% in 20 games.

Fritz 5 scores against:

                  First half   Second half
Rebel8-P90          76/19        80/20
Genius5-P200        57/23        70/23
Mchess7-P200        52/22        61/22
Hiarcs6-P200        38/21        55/21
Genius5-P90         74/21        67/21
Hiarcs5-P90         75/10        80/10
Comet32-P90         95/10        80/10
Shredder2-P200      60/10        70/10
Nimzo3.5-P90        68/20        75/20
Hiarcs6-P90         63/20        75/20
Rebel9-P90          58/19        83/20
Junior3.5-P90       80/10        90/10

TOTAL             64% / 205    72% / 207   = + 64 Elo

In my tournament at 40 moves in 2 hours, both sides on P200MMX, Fritz 5 showed
the same pattern:

                  First half   Second half
F5-H6               40/10        50/10
F5-M7.1             45/10        55/10
F5-R9               40/10        65/10
F5-N98              50/10        65/10

TOTAL             44% / 40     59% / 40

A few days ago I posted my Fritz5-Rebel10 results, with a similar pattern.

>> If in
>>tournaments it plays a different opponent every game, it won't get that score
>>either. In fact, it doesn't. Then, the SSDF rating list is no indication of the
>>score Fritz 5 will get in a future event, unless this event reproduces exactly
>>the SSDF way to test. In other words, this Elo list defeats its own purpose of
>>being able to predict performances.
>
>No, it doesn't. At least, it is no less reliable for Fritz than for Hiarcs,
>Rebel, M-Chess, Genius, Shredder, ... you name it.
>
>>An example I already posted: in my tournament of 200 games at 40:2, Fritz 5
>>scored 44% in the first half and 59% in the last half. In the SSDF games, 64% in
>>the first half and 72% in the second half. If you play a tournament of 20 games
>>matches, you will get a very different performance if Fritz 5 plays these 20
>>games in a row or if you split these matches in two halves by exiting the
>>program and restart it again for the last 10 games.
>
>SSDF plays on different machines. On one machine, learning from match 1
>(opponent A) will positively affect match 2 (opponent B). So SSDF is not quite
>representative for results you or me would get using always the same machine,
>esp. if you consider when interpreting their results like you did that they
>presumably started to use the PowerBook from a certain point onward (1st half of
>the match: fritz5.ctg, 2nd half: PowerBook ...).
>
>> What's going to be the
>>predicted performance after the Elo rating? It depends on how you make Fritz 5
>>play. That's why I'm talking of an SSDF-specific rating.
>
>Of course the rating is the direct result of testing parameters. Certainly the
>relative rating on the SSDF list doesn't hold 100% against humans. I fully agree
>on this.
>
>>Again: I think Fritz 5 is very strong and a tactical wonder. I think the SSDF is
>>not to blame for distortions in their rating list.
>
>Come on, now you have to clarify your terms: What exactly does "distortion" mean
>to you? Cheating?

No. I never said this. I never implied it either. And I don't see the need to
fight back when there is no attack to begin with.

What I mean by "distortion" I explained, or "clarified", before: the Elo rating
is supposed to predict the score a chess player is likely to achieve in a given
event. In this particular case, the Elo of Fritz 5 in the SSDF list does not
achieve this goal. Depending on how you make it play, as I described before, it
will achieve different performances, and therefore the Elo rating is, I think,
distorted.

>Isn't the manual opening book preparation of all other
>programs much more "cheating" in a sense that the program doesn't develop its
>own repertoire by its own playing strength and "understanding" of chess?
>Interpreting a raw database of human games to still get a usable book seems to
>be a greater obstacle than preparing books in decades of work like Sandro Necchi
>(M-Chess) described in his interesting article on CCR (available to all readers
>here at the CCC resource centre).

Again: I am not talking about cheating. You are.

>> But these distortions are
>>real. Learners can be SSDF specific, meaning: much more efficient in the SSDF
>>way to play matches than in any other case, and this influences greatly this
>>rating list.
>>
>>Enrique
>
>Wasn't it you who fiercely advocated using learners on the SSDF list to overcome
>the "killer book" problem and measure engine playing strength? My memory seems
>to be dysfunctional if the Fritz learning mechanism doesn't perfectly fit your
>prescribed solution to the mess.

What I see as dysfunctional is the degree of aggressiveness. Uncalled for and
all of your own. It would be a pleasure, at least for me, to be able to discuss
these kinds of observations without feeling as if at war.

Enrique

>Moritz
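
For anyone who wants to check the arithmetic behind the percentages and the "+ 64 Elo" figure quoted for the SSDF totals, here is a minimal sketch. It assumes the standard logistic Elo expectation formula with a 400-point scale; the post does not say which formula the SSDF actually uses, and the function names are only illustrative.

```python
import math


def expected_score(elo_diff):
    """Expected score (0..1) for a player rated elo_diff points above the opponent.

    Assumption: standard logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score):
    """Rating difference implied by a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# A lead of 30 to 70 points corresponds to an expected score of roughly 54% to 60%.
low = expected_score(30)
high = expected_score(70)

# SSDF totals: 64% in the first halves versus 72% in the second halves.
# The gap between the two implied rating differences is about 64 points.
ssdf_gain = elo_diff_from_score(0.72) - elo_diff_from_score(0.64)

# Tournament totals at 40 moves in 2 hours: 44% versus 59%.
tournament_gain = elo_diff_from_score(0.59) - elo_diff_from_score(0.44)

print(f"expected score at +30: {low:.3f}, at +70: {high:.3f}")
print(f"SSDF second-half gain: {ssdf_gain:.0f} Elo")
print(f"tournament second-half gain: {tournament_gain:.0f} Elo")
```

With samples of 40 to 200 games the error margins on these differences are wide, so the numbers are a plausibility check on the pattern, not a measurement of it.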