Computer Chess Club Archives


Subject: Re: This Super Laptop with Fritz 8 would even beat Judith Polgar!

Author: Peter Fendrich

Date: 11:20:13 06/16/04


On June 15, 2004 at 19:37:15, Dann Corbit wrote:

>On June 15, 2004 at 18:40:13, Peter Fendrich wrote:
>
>>On June 15, 2004 at 17:10:43, Dann Corbit wrote:
>>
>>>On June 15, 2004 at 16:53:16, Peter Fendrich wrote:
>>>
>>>>On June 15, 2004 at 16:39:20, Dann Corbit wrote:
>>>>
>>>>>On June 15, 2004 at 16:20:45, Peter Fendrich wrote:
>>>>>
>>>>- snip -
>>>>
>>>>>>We could in fact invent a much better rating system for chess engines. The ELO
>>>>>>system is designed for humans with a sparse number of games and not for hundreds
>>>>>>and thousands of games in long matches. But it works.
>>>>>>IMHO, however, another rating system is not very practical when the ELO
>>>>>>system is the chess rating standard.
>>>>>
>>>>>There used to be a nice web site by Royal C. Jones on alternative Elo
>>>>>calculation methods.  I am no longer able to find it.
>>>>>
>>>>>Here is a C++ program that performs his alternate calculations in a simulation:
>>>>>ftp://cap.connx.com/pub/tournament_software/prog10.cpp
>>>>>
>>>>>Here is the letter where I asked his permission to use the code:
>>>>>ftp://cap.connx.com/pub/tournament_software/Re%20Your%20chess%20rating%20systems.txt
>>>>
>>>>Yes, I think I once got the link from you:
>>>>http://ourworld.cs.com/royjones1999/index.htm
>>>>I think with another system it could be done even better for chess engines:
>>>>- they don't vary in strength over time like humans
>>>>- one can easily play a huge number of games
>>>>
>>>>using Bayesian algorithms...
>>>
>>>I think the best thing about computer modelling is that we do not necessarily
>>>need to assume a gaussian curve.  We could fit as many models as we like, and
>>>then choose the one that turns out to be the best predictor.
>>>
>>>It is clear that when Elo figures are drastically different (e.g. 1000 Elo)
>>>the model predicts poorly.
>>>
>>>Even with moderate difference levels (play an engine against a pool of peer
>>>players, play the same engine against a pool of players 100 Elo below, play the
>>>engine with both pools combined) you will see unexplained differences.
>>
>>How do you know that the pools are so different?
>
>Because it is not the first time the programs have played against each other.
>I have a good idea of their Elo beforehand.
>
>>Another thing: two small pools could give strange results. The "A always loses
>>to B but has a higher rating" problem will give such effects with small pools.
>
>Here is the scenario:
>The tournament is a round-robin with a very large number of opponents.  Each
>phase of the round robin starts with one player who plays white and then black
>against all the other opponents.  In the first few passes of the programs, there
>were a large number of very strong programs.  Now, the strong programs will have
>some sort of provisional rating after 25 sets of gauntlets have been run, since
>they will have played 50 games against 25 different opponents.  But the average
>Elo in this first set of programs is much higher than the average for the entire
>pool.  What I am seeing is that each new strong program (which had yet to take
>its turn against the entire pool) drops in Elo a bit when it faces all the
>programs.  This indicates to me that playing against stronger opposition gives a
>deflated view of the Elo (or conversely, that playing weaker opposition gives an
>inflated one).  My notion is confirmed by what I see in other tournaments where
>opposition strength comes in levels.  For instance, look at a program like
>SlowChess as it marches through George Lyapko's tournament.  Against the early
>opposition (of clearly known strength) it has a very high rating.  But as it
>faces stronger and stronger opposition, the Elo rating drops.  So, it might be
>that you can inflate your Elo rating by playing a group that is 100 Elo below
>your level, as compared to playing a group that is your peer.
>
>Most of this is just heuristic guessing.  I have not done a careful study yet.

It's hard for me to believe that this is a pattern, but no one is perfect...
Maybe the distance from the worst in group B to the best in group A is huge?
How do you compute the ratings?
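
A minimal sketch of why the computation method can matter here. It assumes the common logistic form of the Elo expected-score curve and compares two hypothetical ways of turning a score into a performance rating: a linear approximation and an exact inversion of the logistic curve. The function names and the 2500/2400 ratings are made up for illustration and are not taken from any program mentioned in this thread.

```python
import math

def expected_score(rating, opponent):
    """Expected score under the logistic Elo model (an assumption of this sketch)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def performance_linear(avg_opponent, score_fraction):
    """Linear approximation: average opponent rating + 400 * (wins - losses) / games."""
    return avg_opponent + 400.0 * (2.0 * score_fraction - 1.0)

def performance_logistic(avg_opponent, score_fraction):
    """Exact inversion of the logistic curve (requires 0 < score_fraction < 1)."""
    return avg_opponent + 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

# Hypothetical engine of true strength 2500 that scores exactly as the model predicts.
true_rating = 2500.0
for pool_rating in (2500.0, 2400.0):
    s = expected_score(true_rating, pool_rating)
    print(pool_rating,
          round(performance_linear(pool_rating, s), 1),
          round(performance_logistic(pool_rating, s), 1))
```

Under these assumptions both methods agree against a peer pool, while the linear approximation rates the same engine about 12 points higher against a pool 100 Elo below it. That is only one possible source of an apparent inflation, not a diagnosis of the tournament discussed above.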

>>>As an example, I am running a contest with about 6000 games played so far.  When
>>>a large number of games has been played by some engine (e.g. 200 games) then the
>>>rating clearly is changed compared to when it had a smaller number of games
>>>against tougher competition.  The effect becomes pronounced when you see players
>>>of very high Elo take on players of very low Elo.  The stronger players
>>>basically cannot earn any points, no matter what happens.
>>This is a typical chess engine problem. The Elo system is explicitly designed
>>for people playing in the same class, with about a 200-point spread. IIRC that
>>problem already arises with 400-point differences and many games, something
>>that never happens for people.
>
>What about a prodigy on the way up?

Not a big problem as long as they don't come in hordes.
/Peter
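
The point in the quoted text about strong players being unable to earn points against much weaker ones can be illustrated with a short sketch. It assumes the common logistic expected-score formula and an incremental update with a K-factor of 16. Both are assumptions for illustration, not the method used in the tournament discussed in this thread.

```python
def expected_score(rating, opponent):
    """Expected score under the logistic Elo model (an assumption of this sketch)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def gain_for_win(rating, opponent, k=16.0):
    """Rating points gained for a win; k=16 is a hypothetical K-factor."""
    return k * (1.0 - expected_score(rating, opponent))

# How much a win is worth as the rating gap to the opponent grows.
base = 2500.0
for gap in (0, 100, 200, 400, 1000):
    e = expected_score(base + gap, base)
    g = gain_for_win(base + gap, base)
    print(f"gap {gap:4d}: expected score {e:.4f}, gain for a win {g:.3f}")
```

With a 1000-point gap the expected score is already above 0.996, so a win is worth a small fraction of a point while a single draw or loss costs many times more.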



Last modified: Thu, 15 Apr 21 08:11:13 -0700
