Author: Peter Fendrich
Date: 11:20:13 06/16/04
On June 15, 2004 at 19:37:15, Dann Corbit wrote:

>On June 15, 2004 at 18:40:13, Peter Fendrich wrote:
>
>>On June 15, 2004 at 17:10:43, Dann Corbit wrote:
>>
>>>On June 15, 2004 at 16:53:16, Peter Fendrich wrote:
>>>
>>>>On June 15, 2004 at 16:39:20, Dann Corbit wrote:
>>>>
>>>>>On June 15, 2004 at 16:20:45, Peter Fendrich wrote:
>>>>>
>>>>- snip -
>>>>
>>>>>>We could in fact invent a much better rating system for chess engines. The ELO
>>>>>>system is designed for humans with a sparse number of games and not for hundreds
>>>>>>and thousands of games in long matches. But it works.
>>>>>>IMHO it's however not very practical with another rating system when the ELO
>>>>>>system is the chess rating standard.
>>>>>
>>>>>There used to be a nice web site by Royal C. Jones on alternative Elo
>>>>>calculation methods. I am no longer able to find it.
>>>>>
>>>>>Here is a C++ program that performs his alternate calculations in a simulation:
>>>>>ftp://cap.connx.com/pub/tournament_software/prog10.cpp
>>>>>
>>>>>Here is the letter where I asked his permission to use the code:
>>>>>ftp://cap.connx.com/pub/tournament_software/Re%20Your%20chess%20rating%20systems.txt
>>>>
>>>>Yes, I think I once got the link from you:
>>>>http://ourworld.cs.com/royjones1999/index.htm
>>>>I think with another system it could be done even better for chess engines:
>>>>- they don't vary their strength over time like humans
>>>>- one can easily play a huge number of games
>>>>
>>>>with use of Bayesian algorithms...
>>>
>>>I think the best thing about computer modelling is that we do not necessarily
>>>need to assume a Gaussian curve. We could fit as many models as we like, and
>>>then choose the one that turns out to be the best predictor.
>>>
>>>It is clear that when Elo figures are drastically different (e.g. 1000 Elo)
>>>the model predicts poorly.
>>>
>>>Even with moderate difference levels (play an engine against a pool of peer
>>>players, play the same engine against a pool of players 100 Elo below, play the
>>>engine with both pools combined) you will see unexplained differences.
>>
>>How do you know that the pools are so different?
>
>Because it is not the first time the programs have played against each other.
>I have a good idea of their Elo beforehand.
>
>>Another thing, two small pools could give strange results. The "A always loses
>>vs B but has a higher rating" problem will give such effects with small pools.
>
>Here is the scenario:
>The tournament is a round-robin with a very large number of opponents. Each
>phase of the round robin starts with one player who plays white and then black
>against all the other opponents. In the first few passes of the programs, there
>were a large number of very strong programs. Now, the strong programs will have
>some sort of provisional rating after 25 sets of gauntlets have been run, since
>they will have played 50 games against 25 different opponents. But the average
>Elo in this first set of programs is much higher than the average for the entire
>pool. What I am seeing is that each new strong program (which had yet to take
>its turn against the entire pool) drops in Elo a bit when it faces all the
>programs. This indicates to me that playing against stronger opposition gives a
>deflated view of the Elo (or conversely, that playing weaker opposition gives an
>inflated one). My notion is confirmed by what I see in other tournaments where
>opposition strength comes in levels. For instance, look at a program like
>SlowChess as it marches through George Lyapko's tournament. Against the early
>opposition (of clearly known strength) it has a very high rating. But as it
>faces stronger and stronger opposition, the Elo rating drops.
>So, it might be that you can inflate your Elo rating by playing a group that is
>100 Elo below your level, as compared to playing a group that is your peer.
>
>Most of this is just heuristic guessing. I have not done a careful study yet.

It's hard for me to believe that this is a pattern but no one is perfect...
Maybe the distance from the worst in group B to the best in group A is huge?
How do you compute the ratings?

>>>As an example, I am running a contest with about 6000 games played so far. When
>>>a large number of games has been played by some engine (e.g. 200 games) then the
>>>rating clearly is changed compared to when it had a smaller number of games
>>>against tougher competition. The effect becomes pronounced when you see players
>>>of very high Elo take on players of very low Elo. The stronger players
>>>basically cannot earn any points, no matter what happens.
>>
>>This is a typical chess engine problem. The Elo system is explicitly designed
>>for people playing in the same class with about 200 points spread. IIRC that
>>problem arises already with 400-point differences and many games. Something
>>that never happens for people.
>
>What about a prodigy on the way up?

Not a big problem as long as they don't come in hordes.
/Peter
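
For readers who want to see why a much stronger player "basically cannot earn any points", here is a minimal sketch. It assumes the common logistic form of the Elo expectation and an incremental update with a K-factor of 16. Both are assumptions for illustration only: the thread does not say how the ratings in these tournaments were actually computed, and the names expected_score and rating_change are hypothetical.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score per game for a player rated `rating_diff` points
    above the opponent, using the logistic Elo curve (an assumption)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_change(rating_diff: float, actual_score: float, k: float = 16.0) -> float:
    """Rating change after one game for the higher-rated player.
    actual_score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k=16 is an arbitrary illustrative K-factor."""
    return k * (actual_score - expected_score(rating_diff))


if __name__ == "__main__":
    print(" diff  expected   win    draw    loss")
    for diff in (0, 100, 200, 400, 800, 1000):
        print(
            f"{diff:5d}  {expected_score(diff):8.4f}"
            f"  {rating_change(diff, 1.0):+6.2f}"
            f"  {rating_change(diff, 0.5):+6.2f}"
            f"  {rating_change(diff, 0.0):+6.2f}"
        )
```

Under these assumptions, a player 400 points ahead is expected to score about 0.91 per game, so a win earns very little while a single draw costs several times as much. At 1000 points the gain from a win is nearly zero.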
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.