Computer Chess Club Archives


Subject: Re: This Super Laptop with Fritz 8 would even beat Judith Polgar!

Author: Peter Fendrich

Date: 13:01:33 06/17/04


On June 16, 2004 at 16:03:02, Dann Corbit wrote:

>On June 16, 2004 at 14:20:13, Peter Fendrich wrote:
>
>>On June 15, 2004 at 19:37:15, Dann Corbit wrote:
>>
>>>On June 15, 2004 at 18:40:13, Peter Fendrich wrote:
>>>
>>>>On June 15, 2004 at 17:10:43, Dann Corbit wrote:
>>>>
>>>>>On June 15, 2004 at 16:53:16, Peter Fendrich wrote:
>>>>>
>>>>>>On June 15, 2004 at 16:39:20, Dann Corbit wrote:
>>>>>>
>>>>>>>On June 15, 2004 at 16:20:45, Peter Fendrich wrote:
>>>>>>>
>>>>>>- snip -
>>>>>>
>>>>>>>>We could in fact invent a much better rating system for chess engines. The Elo
>>>>>>>>system was designed for humans, who play a sparse number of games, and not for
>>>>>>>>the hundreds or thousands of games in long engine matches. But it works.
>>>>>>>>IMHO, however, adopting another rating system is not very practical when the
>>>>>>>>Elo system is the chess rating standard.
>>>>>>>
>>>>>>>There used to be a nice web site by Royal C. Jones on alternative Elo
>>>>>>>calculation methods.  I am no longer able to find it.
>>>>>>>
>>>>>>>Here is a C++ program that performs his alternate calculations in a simulation:
>>>>>>>ftp://cap.connx.com/pub/tournament_software/prog10.cpp
>>>>>>>
>>>>>>>Here is the letter where I asked his permission to use the code:
>>>>>>>ftp://cap.connx.com/pub/tournament_software/Re%20Your%20chess%20rating%20systems.txt
>>>>>>
>>>>>>Yes, I think I once got the link from you:
>>>>>>http://ourworld.cs.com/royjones1999/index.htm
>>>>>>I think with another system it could be done even better for chess engines:
>>>>>>- they don't vary in strength over time the way humans do
>>>>>>- one can easily play a huge number of games
>>>>>>
>>>>>>with the use of Bayesian algorithms...
>>>>>
>>>>>I think the best thing about computer modelling is that we do not necessarily
>>>>>need to assume a gaussian curve.  We could fit as many models as we like, and
>>>>>then choose the one that turns out to be the best predictor.
>>>>>
>>>>>It is clear that when Elo figures are drastically different (e.g. 1000 Elo
>>>>>apart), the model predicts poorly.
>>>>>
>>>>>Even with moderate differences (play an engine against a pool of peer
>>>>>players, play the same engine against a pool of players 100 Elo below, then
>>>>>play the engine against both pools combined) you will see unexplained
>>>>>differences.
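
For reference, the two expected-score curves usually associated with Elo ratings can be sketched as below. This is an illustration only, not code from the thread or from prog10.cpp. The logistic form with a 400-point scale and the normal form with a per-player standard deviation of 200 are the conventional textbook choices and are assumed here; the function names are made up for the example.

```python
import math

def expected_logistic(diff):
    """Expected score of a player rated `diff` points above the opponent,
    under the logistic curve with the conventional 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def expected_normal(diff, sigma=200.0):
    """Expected score under a normal (gaussian) model. Each player's
    performance is assumed to have standard deviation `sigma`, so the
    difference of two performances has standard deviation sigma * sqrt(2)."""
    z = diff / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Compare the two curves at small, moderate and drastic rating gaps.
for diff in (0, 100, 200, 400, 1000):
    print(diff, round(expected_logistic(diff), 4), round(expected_normal(diff), 4))
```

The two curves nearly agree at small gaps and diverge in the tails, which is where fitting alternative models against real results would matter most.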
>>>>
>>>>How do you know that the pools are so different?
>>>
>>>Because it is not the first time the programs have played against each other.
>>>I have a good idea of their Elo before-hand.
>>>
>>>>Another thing: two small pools could give strange results. The "A always loses
>>>>vs B but has a higher rating" problem will give such effects with small pools.
>>>
>>>Here is the scenario:
>>>The tournament is a round-robin with a very large number of opponents.  Each
>>>phase of the round robin starts with one player who plays white and then black
>>>against all the other opponents.  In the first few passes of the programs, there
>>>were a large number of very strong programs.  Now, the strong programs will have
>>>some sort of provisional rating after 25 sets of gauntlets have been run, since
>>>they will have played 50 games against 25 different opponents.  But the average
>>>Elo in this first set of programs is much higher than the average for the entire
>>>pool.  What I am seeing is that each new strong program (which had yet to take
>>>its turn against the entire pool) drops in Elo a bit when it faces all the
>>>programs.  This indicates to me that playing against stronger opposition gives a
>>>deflated view of the Elo (or conversely, that playing weaker opposition gives an
>>>inflated one).  My notion is confirmed by what I see in other tournaments where
>>>opposition strength comes in levels.  For instance, look at a program like
>>>SlowChess as it marches through George Lyapko's tournament.  Against the early
>>>opposition (of clearly known strength) it has a very high rating.  But as it
>>>faces stronger and stronger opposition, the Elo rating drops.  So, it might be
>>>that you can inflate your Elo rating by playing a group that is 100 Elo below
>>>your level, as compared to playing a group that is your peer.
>>>
>>>Most of this is just heuristic guessing.  I have not done a careful study yet.
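
One mechanism that could produce a shift like this can be sketched numerically. The sketch assumes that a performance rating is computed from the average opponent rating plus the rating gap implied by the overall score, and that results follow the logistic curve exactly. Whether the rating tool used in the thread works this way is not stated, so treat that as an assumption. The pool ratings are hypothetical, apart from 1711, which is the weakest program quoted in the thread.

```python
import math

def expected(diff):
    """Logistic expected score for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def performance_from_average(true_rating, opponents):
    """Performance rating derived from the AVERAGE opponent rating
    (assumed method, see the text above). The player is taken to score
    exactly the model's expectation against every opponent."""
    mean_score = sum(expected(true_rating - r) for r in opponents) / len(opponents)
    avg_opp = sum(opponents) / len(opponents)
    # Invert the logistic curve: gap = 400 * log10(p / (1 - p)).
    return avg_opp + 400.0 * math.log10(mean_score / (1.0 - mean_score))

peers = [2550, 2600, 2650]             # hypothetical peer pool
mixed = peers + [2300, 2000, 1711]     # same pool plus much weaker programs

print(round(performance_from_average(2600, peers), 1))
print(round(performance_from_average(2600, mixed), 1))
```

Under these assumptions the program gets its true rating back from the peer pool but a lower figure from the mixed pool, because the expected-score curve is not linear and averaging the opponents first loses that. It is a possible explanation to test against real data, not a diagnosis.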
>>
>>It's hard for me to believe that this is a pattern, but no one is perfect...
>>Maybe the distance from the worst in group B to the best in group A is huge?
>
>The shift in average Elo is only a few Elo.  However, there are some very weak
>programs (1000 Elo below) in the main group.
>
>Average Elo of a 200+ games program's opposition: 2294
>Average Elo of a 64 games program's opposition:   2351
>Weakest program:                                  1711
>Strongest program:                                2644
>
>>How do you compute the ratings?
>
>Elostat

I would also look for other kinds of patterns. For instance, where did the
stronger programs lose their rating points? Was it in random losses to very weak
programs, to programs about 200 points lower, or maybe evenly spread, etc.?
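
A minimal sketch of that kind of breakdown, assuming the games are available as (own rating, opponent rating, score) tuples and that the logistic Elo curve is the baseline. The function names, the 200-point bucket width and the sample games are made up for illustration.

```python
from collections import defaultdict

def expected(diff):
    """Logistic expected score for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def breakdown_by_gap(games, bucket=200):
    """Group games by rating gap (own - opponent), floored to `bucket` points.
    `games` is an iterable of (own_rating, opp_rating, score), score in {0, 0.5, 1}.
    Returns {bucket_start: (games_played, actual_points, expected_points)}."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for own, opp, score in games:
        gap = own - opp
        key = (gap // bucket) * bucket   # floor division also handles negative gaps
        entry = stats[key]
        entry[0] += 1
        entry[1] += score
        entry[2] += expected(gap)
    return {k: tuple(v) for k, v in sorted(stats.items())}

# Hypothetical results for one strong program.
sample = [(2600, 2590, 1), (2600, 2640, 0.5), (2600, 2350, 0.5), (2600, 1711, 1)]
for start, (n, actual, exp) in breakdown_by_gap(sample).items():
    print(f"gap {start:+5d}..{start + 199:+5d}: {n} games, "
          f"scored {actual:.1f}, expected {exp:.2f}")
```

Buckets where the actual points fall well short of the expected points show where the rating was lost.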

/Peter





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.