Author: Dann Corbit
Date: 18:07:24 06/17/04
On June 17, 2004 at 16:01:33, Peter Fendrich wrote:

>On June 16, 2004 at 16:03:02, Dann Corbit wrote:
>
>>On June 16, 2004 at 14:20:13, Peter Fendrich wrote:
>>
>>>On June 15, 2004 at 19:37:15, Dann Corbit wrote:
>>>
>>>>On June 15, 2004 at 18:40:13, Peter Fendrich wrote:
>>>>
>>>>>On June 15, 2004 at 17:10:43, Dann Corbit wrote:
>>>>>
>>>>>>On June 15, 2004 at 16:53:16, Peter Fendrich wrote:
>>>>>>
>>>>>>>On June 15, 2004 at 16:39:20, Dann Corbit wrote:
>>>>>>>
>>>>>>>>On June 15, 2004 at 16:20:45, Peter Fendrich wrote:
>>>>>>>>
>>>>>>>- snip -
>>>>>>>
>>>>>>>>>We could in fact invent a much better rating system for chess engines. The ELO
>>>>>>>>>system is designed for humans with a sparse number of games and not for hundreds
>>>>>>>>>and thousands of games in long matches. But it works.
>>>>>>>>>IMHO it's however not very practical to use another rating system when the ELO
>>>>>>>>>system is the chess rating standard.
>>>>>>>>
>>>>>>>>There used to be a nice web site by Royal C. Jones on alternative Elo
>>>>>>>>calculation methods. I am no longer able to find it.
>>>>>>>>
>>>>>>>>Here is a C++ program that performs his alternate calculations in a simulation:
>>>>>>>>ftp://cap.connx.com/pub/tournament_software/prog10.cpp
>>>>>>>>
>>>>>>>>Here is the letter where I asked his permission to use the code:
>>>>>>>>ftp://cap.connx.com/pub/tournament_software/Re%20Your%20chess%20rating%20systems.txt
>>>>>>>
>>>>>>>Yes, I think I once got the link from you:
>>>>>>>http://ourworld.cs.com/royjones1999/index.htm
>>>>>>>I think with another system it could be done even better for chess engines:
>>>>>>>- they don't vary their strength over time like humans
>>>>>>>- one can easily play a huge number of games
>>>>>>>
>>>>>>>with use of Bayesian algorithms...
>>>>>>
>>>>>>I think the best thing about computer modelling is that we do not necessarily
>>>>>>need to assume a Gaussian curve. We could fit as many models as we like, and
>>>>>>then choose the one that turns out to be the best predictor.
>>>>>>
>>>>>>It is clear that when Elo figures are drastically different (e.g. 1000 Elo) the
>>>>>>model predicts poorly.
>>>>>>
>>>>>>Even with moderate difference levels (play an engine against a pool of peer
>>>>>>players, play the same engine against a pool of players 100 Elo below, play the
>>>>>>engine with both pools combined) you will see unexplained differences.
>>>>>
>>>>>How do you know that the pools are so different?
>>>>
>>>>Because it is not the first time these programs have played against each other.
>>>>I have a good idea of their Elo beforehand.
>>>>
>>>>>Another thing, two small pools could give strange results. The "A always loses
>>>>>vs B but has a higher rating" problem will give such effects with small pools.
>>>>
>>>>Here is the scenario:
>>>>The tournament is a round-robin with a very large number of opponents. Each
>>>>phase of the round robin starts with one player who plays white and then black
>>>>against all the other opponents. In the first few passes of the programs, there
>>>>were a large number of very strong programs. Now, the strong programs will have
>>>>some sort of provisional rating after 25 sets of gauntlets have been run, since
>>>>they will have played 50 games against 25 different opponents. But the average
>>>>Elo in this first set of programs is much higher than the average for the entire
>>>>pool. What I am seeing is that each new strong program (which had yet to take
>>>>its turn against the entire pool) drops in Elo a bit when it faces all the
>>>>programs. This indicates to me that playing against stronger opposition gives a
>>>>deflated view of the Elo (or conversely, that playing weaker opposition gives an
>>>>inflated one). My notion is confirmed by what I see in other tournaments where
>>>>opposition strength comes in levels. For instance, look at a program like
>>>>SlowChess as it marches through George Lyapko's tournament. Against the early
>>>>opposition (of clearly known strength) it has a very high rating. But as it
>>>>faces stronger and stronger opposition, the Elo rating drops. So, it might be
>>>>that you can inflate your Elo rating by playing a group that is 100 Elo below
>>>>your level, as compared to playing a group that is your peer.
>>>>
>>>>Most of this is just heuristic guessing. I have not done a careful study yet.
>>>
>>>It's hard for me to believe that this is a pattern, but no one is perfect...
>>>Maybe the distance from the worst in group B to the best in group A is huge?
>>
>>The shift in average Elo is only a few Elo. However, there are some very weak
>>programs (1000 Elo below) in the main group.
>>
>>Average Elo of a 200+ game program's opposition:
>>2294
>>Average Elo of a 64 game program's opposition:
>>2351
>>Weakest program:
>>1711
>>Strongest program:
>>2644
>>
>>>How do you compute the ratings?
>>
>>Elostat
>
>I would also look for other kinds of patterns. For instance, where did the
>stronger programs lose their ratings? Is it some random losses to very weak
>programs, or is it to programs 200 points lower, or maybe evenly spread, etc...

It could also be a bug in the Elo calculation, as I have not bothered to verify
it with some other source.
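For anyone who wants a quick sanity check on a rating tool's output, here is a
minimal sketch (this is not Elostat's algorithm, and the ratings and results in
it are invented for illustration) that compares a program's observed score with
the score predicted by the standard logistic Elo formula against its opposition:

// Cross-check sketch: sum the expected score from the logistic Elo formula
// over a list of games and compare it with the actual score.
#include <cmath>
#include <cstdio>
#include <vector>

// Expected score for a player rated ra against an opponent rated rb.
static double expected_score(double ra, double rb)
{
    return 1.0 / (1.0 + std::pow(10.0, (rb - ra) / 400.0));
}

struct Game {
    double opponent_elo;  // assumed rating of the opponent
    double score;         // 1 = win, 0.5 = draw, 0 = loss
};

int main(void)
{
    const double own_elo = 2500.0;  // hypothetical rating to be checked
    const std::vector<Game> games = {
        {2644, 0.5}, {2351, 1.0}, {2294, 0.5}, {1711, 1.0}, {2400, 0.0}
    };

    double expected = 0.0, actual = 0.0;
    for (const Game &g : games) {
        expected += expected_score(own_elo, g.opponent_elo);
        actual   += g.score;
    }

    // A persistent gap between these totals over many games points to either
    // a mis-estimated rating or a bug in the rating calculation.
    std::printf("expected %.2f, actual %.2f over %u games\n",
                expected, actual, (unsigned)games.size());
    return 0;
}

The logistic curve above is only one choice; a Gaussian or any other model can
be substituted inside expected_score to see how much the predictions differ.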