Author: Robert Hyatt
Date: 15:37:17 02/28/98
On February 28, 1998 at 17:16:49, Thorsten Czub wrote:

>>I wasn't implying anything wrong at all. Just the huge one-sided wins
>>by fritz look amazing until I noticed the 3X speed handicap. That makes
>>those particular win/lose numbers mean something different than if they were
>>posted for equal hardware matches...
>
>I think Bob talks about the same I do.
>
>The ELO generated out of these results has not the same QUALITY it would
>have had if you would have used SAME machines.
>You cannot turn this around and argue:
>But with hiarcs the things worked too.
>If it works with hiarcs, the same method does not have to work with
>fritz5 too.
>The 2 different programs get their playing strength from 2 different
>things.
>If you change ONE MAIN parameter in your experiment, and the parameter
>advances Fritz, then you don't get EQUIVALENT or comparable results.

I think this shows up a common misconception about Elo's rating system. It produces a rating "spread" (not absolute values) that can be used to directly compute the probability of any two players beating each other, based only on their Elo-computed ratings. It has *nothing* to do with the corresponding FIDE rating a program might earn.

You might play two programs against each other and after 1000 games end up with ratings exactly 200 apart. You might then enter them in human tournaments to play 1000 games each, and when you finish you might find they are only 50 rating points apart. That happens because you are using *two different player pools* to compute those ratings. Elo's statistical analysis depends on significant numbers of the "pool" playing each other, and it doesn't take into account the bizarre way computers do things.

But even worse, the only important thing in the Elo system is the "spread" between two players, not the absolute values of their ratings. That's where we get off into no-man's-land statistically. For example, Fritz 5 is 2585 (or so) on the SSDF list, while Hiarcs is 2535 (or so).
I'd claim that both are 200 rating points too high, if you compare those numbers to human numbers. But the spread might remain constant no matter what, and would continue to predict the same win/loss ratio...
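
To make the "spread" point concrete, here is a minimal sketch in Python. It assumes the common logistic form of the Elo expectation with a 400-point scale, which is an approximation I am supplying for illustration and not something taken from the SSDF list itself. The function name expected_score is made up for this example, and the value it returns is an expected score (draws count as half a point), not a pure win probability.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B.

    Assumption: the standard logistic Elo approximation with a
    400-point scale. Only the difference rating_a - rating_b is used.
    """
    diff = rating_a - rating_b
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# Approximate SSDF numbers quoted above: Fritz 5 vs. Hiarcs.
ssdf = expected_score(2585, 2535)

# The same two programs with both ratings lowered by 200 points.
shifted = expected_score(2385, 2335)

print(f"SSDF ratings:     {ssdf:.4f}")
print(f"Both 200 lower:   {shifted:.4f}")

# A 200-point spread, as in the 1000-game example.
print(f"200-point spread: {expected_score(2200, 2000):.4f}")
```

The first two numbers come out identical, since subtracting 200 from both ratings leaves the 50-point spread untouched. What the formula cannot tell you is whether that 50-point spread would survive a move to a different player pool.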
Last modified: Thu, 15 Apr 21 08:11:13 -0700