Author: Dann Corbit
Date: 13:03:02 06/16/04
On June 16, 2004 at 14:20:13, Peter Fendrich wrote:

>On June 15, 2004 at 19:37:15, Dann Corbit wrote:
>
>>On June 15, 2004 at 18:40:13, Peter Fendrich wrote:
>>
>>>On June 15, 2004 at 17:10:43, Dann Corbit wrote:
>>>
>>>>On June 15, 2004 at 16:53:16, Peter Fendrich wrote:
>>>>
>>>>>On June 15, 2004 at 16:39:20, Dann Corbit wrote:
>>>>>
>>>>>>On June 15, 2004 at 16:20:45, Peter Fendrich wrote:
>>>>>>
>>>>>- snip -
>>>>>
>>>>>>>We could in fact invent a much better rating system for chess engines. The ELO
>>>>>>>system is designed for humans with a sparse number of games and not for hundreds
>>>>>>>and thousands of games in long matches. But it works.
>>>>>>>IMHO it's however not very practical with another rating system when the ELO
>>>>>>>system is the chess rating standard.
>>>>>>
>>>>>>There used to be a nice web site by Royal C. Jones on alternative Elo
>>>>>>calculation methods. I am no longer able to find it.
>>>>>>
>>>>>>Here is a C++ program that performs his alternate calculations in a simulation:
>>>>>>ftp://cap.connx.com/pub/tournament_software/prog10.cpp
>>>>>>
>>>>>>Here is the letter where I asked his permission to use the code:
>>>>>>ftp://cap.connx.com/pub/tournament_software/Re%20Your%20chess%20rating%20systems.txt
>>>>>
>>>>>Yes, I think I once got the link from you:
>>>>>http://ourworld.cs.com/royjones1999/index.htm
>>>>>I think with another system it could be done even better for chess engines:
>>>>>- they don't vary their strength during time like humans
>>>>>- one can easily play a huge number of games
>>>>>
>>>>>with use of Bayesian alg's...
>>>>
>>>>I think the best thing about computer modelling is that we do not necessarily
>>>>need to assume a Gaussian curve. We could fit as many models as we like, and
>>>>then choose the one that turns out to be the best predictor.
>>>>
>>>>It is clear that when Elo figures are drastically different (e.g. 1000 Elo)
>>>>the model predicts poorly.
>>>>
>>>>Even with moderate difference levels (play an engine against a pool of peer
>>>>players, play the same engine against a pool of players 100 Elo below, play the
>>>>engine with both pools combined) you will see unexplained differences.
>>>
>>>How do you know that the pools are so different?
>>
>>Because it is not the first time the programs have played against each other.
>>I have a good idea of their Elo before-hand.
>>
>>>Another thing, two small pools could give strange results. "The A always loses
>>>vs B but has a higher rating" problem will give such effects with small pools.
>>
>>Here is the scenario:
>>The tournament is a round-robin with a very large number of opponents. Each
>>phase of the round robin starts with one player who plays white and then black
>>against all the other opponents. In the first few passes of the programs, there
>>were a large number of very strong programs. Now, the strong programs will have
>>some sort of provisional rating after 25 sets of gauntlets have been run, since
>>they will have played 50 games against 25 different opponents. But the average
>>Elo in this first set of programs is much higher than the average for the entire
>>pool. What I am seeing is that each new strong program (which had yet to take
>>its turn against the entire pool) drops in Elo a bit when it faces all the
>>programs. This indicates to me that playing against stronger opposition gives a
>>deflated view of the Elo (or conversely, that playing weaker opposition gives an
>>inflated one). My notion is confirmed by what I see in other tournaments where
>>opposition strength comes in levels. For instance, look at a program like
>>SlowChess as it marches through George Lyapko's tournament. Against the early
>>opposition (of clearly known strength) it has a very high rating. But as it
>>faces stronger and stronger opposition, the Elo rating drops. So, it might be
>>that you can inflate your Elo rating by playing a group that is 100 Elo below
>>your level, as compared to playing a group that is your peer.
>>
>>Most of this is just heuristic guessing. I have not done a careful study yet.
>
>It's hard for me to believe that this is a pattern but no one is perfect...
>Maybe the distance from the worst in group B to the best in group A is huge?

The shift in average Elo is only a few Elo. However, there are some very weak
programs (1000 Elo below) in the main group.

Average Elo of a 200+ games program's opposition: 2294
Average Elo of a 64 games program's opposition: 2351
Weakest program: 1711
Strongest program: 2644

>How do you compute the ratings?

Elostat
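
For anyone who wants to experiment with the pool question, here is a minimal Python sketch. It rests on several assumptions that are not from the thread: the usual logistic Elo curve, a performance rating obtained by inverting that curve at the average opponent rating, and an engine that scores exactly its expected result in every game. The function names, the 2500 engine and the pool ratings are made up for illustration (only the 1711 and 2644 endpoints are borrowed from the figures above), and whether Elostat computes ratings this way is not stated here. It shows only one possible contributing effect of a wide pool and is not a diagnosis of the tournament data.

```python
import math


def expected_score(rating, opponent):
    """Logistic Elo expectation for `rating` against `opponent` (assumed model)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def rating_from_score(score, avg_opponent):
    """Invert the logistic curve at the average opponent rating.

    Assumes 0 < score < 1; a 0% or 100% score has no finite rating.
    """
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


def apparent_rating(true_rating, opponents):
    """Rating an averaging method would report if the engine scored exactly
    its expected result against every opponent in the pool (hypothetical)."""
    mean_score = sum(expected_score(true_rating, r) for r in opponents) / len(opponents)
    avg_opponent = sum(opponents) / len(opponents)
    return rating_from_score(mean_score, avg_opponent)


if __name__ == "__main__":
    # Hypothetical pools, not real tournament data.
    peer_pool = [2500] * 10
    wide_pool = [1711, 2100, 2300, 2500, 2644]
    print("peer pool:", round(apparent_rating(2500, peer_pool)))
    print("wide pool:", round(apparent_rating(2500, wide_pool)))
```

Under these assumptions the peer pool returns the engine's own rating, while the wide pool returns a clearly lower figure, simply because the expected-score curve is not a straight line and averaging the opponents' ratings is not the same as averaging the expected scores. Swapping in a different curve in expected_score is an easy way to try the "fit several models and keep the best predictor" idea from earlier in the thread.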
Last modified: Thu, 15 Apr 21 08:11:13 -0700