Author: Robert Hyatt
Date: 21:55:53 06/19/02
On June 19, 2002 at 07:18:25, Guy Haworth wrote:

>There are ELO rating lists for:
>
>  people (on the basis of human-human games ... FIDE-managed), and
>  computers (on the basis of computer-computer games)
>
>There are apparently some intrinsic problems with rating schemes, maybe
>particularly ELO, which was the first, and I am looking for more
>information on this.
>
>Each list would be equally valid if N ELO points were subtracted from all
>participants ... so the absolute numbers mean nothing. OK, that would be
>easy to fix if there were rated people-computer games. So ...
>
>... is there an ELO list purely on the basis of computer-human games?

There are many problems to overcome. The SSDF has tried, on a couple of
occasions, to normalize their ratings to FIDE based on the results of a few
programs vs FIDE players. Unfortunately, this is statistically invalid: it
just shifts the SSDF numbers, which still say nothing about how the
machines would do against humans. And after a few comp-vs-comp testing
cycles, the human effect is washed out and the ratings are back to their
old inflated status.

You can't take two pools, "freeze" them, pick a few players from each and
play a few games, then use the ratings from one pool (FIDE) to set the
ratings for the few common opponents from the other pool (SSDF), and
finally normalize the rest of the SSDF pool to those computers whose
ratings were normalized against a handful of FIDE players. That is an
average of averages and is beyond useless. The resulting ratings would pass
most tests for a valid random number sequence...

>I have also heard that there is an 'inflation effect' with ELO. What is
>this - and has anyone an 'ELO game simulator' to demonstrate this? I would
>expect that there are more games played in SSDF to rate the engines than
>contribute to the FIDE human ELO ratings: is this correct? If so, I'd
>expect the inflation effect in the SSDF list to be greater.

It is. Just look at the top of the SSDF list. It is unavoidable, because
each year a new and better "Kasparov" is added to the list. In the real
world, you don't get a group of new players each year that is stronger than
everyone else. Yet exactly that happens with computers every year... (A toy
simulator demonstrating this is sketched at the end of this post.)

>Would it be good to get the Kramnik-DeepFritz computer rated in SSDF as
>well as having its match rating against Kramnik? Presumably ChessBase are
>able to rate it against Fritz engines in SSDF.

That will give an estimate of Fritz's rating in a pool of two players, one
being Kramnik, the other being Fritz. Trying to predict how Kramnik would
do against other programs by comparing his result against Fritz with
Fritz's results against those programs is, again, not going to work very
well at all...

Elo defined a good system. So long as a single pool of players is used, the
ratings are amazingly consistent and their predictive power is good. But
_everybody_ finds clever ways to corrupt the process and then claims "the
ratings can be compared." They are wrong.

>Finally, are there better rating schemes than ELO - or are they just
>different?

Elo just formalized traditional sampling and outcome-prediction methods.
Nothing "new" whatsoever, other than how it is applied to chess in
particular. It all traces back to the central limit theorem.

>g
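To make "what Elo formalized" concrete, here is a minimal sketch (in
Python, my own illustration, not code from FIDE or the SSDF) of the
standard expected-score curve and the post-game rating update:

    # Standard Elo machinery: a logistic expected-score curve on a
    # 400-point scale, and the usual K-factor update after one game.

    def expected_score(r_a, r_b):
        """Expected score for A against B (win = 1, draw = 0.5, loss = 0)."""
        return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

    def update(r_a, r_b, score_a, k=16.0):
        """A's new rating after scoring score_a against B."""
        return r_a + k * (score_a - expected_score(r_a, r_b))

Note that the update is zero-sum within a single pool: whatever A gains, B
loses. That is exactly why the numbers only mean something relative to the
pool they were earned in.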
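And since the question asked for an 'ELO game simulator': below is a toy
simulation (my own sketch with made-up numbers, not an SSDF tool) of the
"new and better Kasparov every year" effect. Game outcomes follow the
logistic model on each player's *true* strength; each year a newcomer 60
points stronger than the current best joins the pool and is assigned a
crude performance rating from one game against each existing member:

    import random

    def expected(d):            # expected score at rating difference d
        return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

    def play(true_a, true_b):   # 1 if A beats B, else 0 (draws ignored)
        return 1.0 if random.random() < expected(true_a - true_b) else 0.0

    random.seed(1)
    true = [2400.0 + 40.0 * i for i in range(8)]   # true strengths
    rated = true[:]                                # start perfectly rated
    K = 16.0

    for year in range(1, 11):
        # A stronger newcomer appears and gets a performance rating.
        new_true = max(true) + 60.0
        n = len(true)
        wins = sum(play(new_true, true[i]) for i in range(n))
        perf = sum(rated) / n + 400.0 * (2.0 * wins - n) / n
        true.append(new_true)
        rated.append(perf)

        # One zero-sum round robin with the usual Elo update.
        for i in range(len(true)):
            for j in range(i + 1, len(true)):
                s = play(true[i], true[j])
                delta = K * (s - expected(rated[i] - rated[j]))
                rated[i] += delta
                rated[j] -= delta

        print("year %2d: top %7.1f  pool mean %7.1f"
              % (year, max(rated), sum(rated) / len(rated)))

Run it and both the top rating and the pool mean climb year after year: the
performance-rating entry of each stronger newcomer injects points into an
otherwise zero-sum pool. A human pool doesn't behave this way, because new
entrants are rarely stronger than everyone already in it.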