Computer Chess Club Archives


Subject: Re: This Super Laptop with Fritz 8 would even beat Judith Polgar!

Author: Dann Corbit

Date: 18:07:24 06/17/04


On June 17, 2004 at 16:01:33, Peter Fendrich wrote:

>On June 16, 2004 at 16:03:02, Dann Corbit wrote:
>
>>On June 16, 2004 at 14:20:13, Peter Fendrich wrote:
>>
>>>On June 15, 2004 at 19:37:15, Dann Corbit wrote:
>>>
>>>>On June 15, 2004 at 18:40:13, Peter Fendrich wrote:
>>>>
>>>>>On June 15, 2004 at 17:10:43, Dann Corbit wrote:
>>>>>
>>>>>>On June 15, 2004 at 16:53:16, Peter Fendrich wrote:
>>>>>>
>>>>>>>On June 15, 2004 at 16:39:20, Dann Corbit wrote:
>>>>>>>
>>>>>>>>On June 15, 2004 at 16:20:45, Peter Fendrich wrote:
>>>>>>>>
>>>>>>>- snip -
>>>>>>>
>>>>>>>>>We could in fact invent a much better rating system for chess engines. The Elo
>>>>>>>>>system is designed for humans who play relatively few games, not for hundreds
>>>>>>>>>or thousands of games in long matches. But it works.
>>>>>>>>>IMHO, however, it is not very practical to use another rating system when the
>>>>>>>>>Elo system is the chess rating standard.
>>>>>>>>
>>>>>>>>There used to be a nice web site by Royal C. Jones on alternative Elo
>>>>>>>>calculation methods.  I am no longer able to find it.
>>>>>>>>
>>>>>>>>Here is a C++ program that performs his alternate calculations in a simulation:
>>>>>>>>ftp://cap.connx.com/pub/tournament_software/prog10.cpp
>>>>>>>>
>>>>>>>>Here is the letter where I asked his permission to use the code:
>>>>>>>>ftp://cap.connx.com/pub/tournament_software/Re%20Your%20chess%20rating%20systems.txt
>>>>>>>
>>>>>>>Yes, I think I once got the link from you:
>>>>>>>http://ourworld.cs.com/royjones1999/index.htm
>>>>>>>I think with another system it could be done even better for chess engines:
>>>>>>>- they don't vary their strength over time like humans
>>>>>>>- one can easily play a huge number of games
>>>>>>>
>>>>>>>with the use of Bayesian algorithms...
>>>>>>
>>>>>>I think the best thing about computer modelling is that we do not necessarily
>>>>>>need to assume a Gaussian curve.  We could fit as many models as we like, and
>>>>>>then choose the one that turns out to be the best predictor.
>>>>>>
>>>>>>It is clear that when Elo figures are drastically different (e.g. 1000 Elo),
>>>>>>the model predicts poorly.
>>>>>>
>>>>>>Even with moderate differences (play an engine against a pool of peer players,
>>>>>>play the same engine against a pool of players 100 Elo below, then play it
>>>>>>against both pools combined) you will see unexplained differences.
>>>>>
>>>>>How do you know that the pools are so different?
>>>>
>>>>Because this is not the first time the programs have played against each other.
>>>>I have a good idea of their Elo before-hand.
>>>>
>>>>>Another thing: two small pools could give strange results. The "A always loses
>>>>>to B but has a higher rating" problem will produce such effects with small pools.
>>>>
>>>>Here is the scenario:
>>>>The tournament is a round-robin with a very large number of opponents.  Each
>>>>phase of the round robin starts with one player who plays white and then black
>>>>against all the other opponents.  In the first few passes of the programs, there
>>>>were a large number of very strong programs.  Now, the strong programs will have
>>>>some sort of provisional rating after 25 sets of gauntlets have been run, since
>>>>they will have played 50 games against 25 different opponents.  But the average
>>>>Elo in this first set of programs is much higher than the average for the entire
>>>>pool.  What I am seeing is that each new strong program (which had yet to take
>>>>its turn against the entire pool) drops in Elo a bit when it faces all the
>>>>programs.  This indicates to me that playing against stronger opposition gives a
>>>>deflated view of the Elo (or conversely, that playing weaker opposition gives an
>>>>inflated one).  My notion is confirmed by what I see in other tournaments where
>>>>opposition strength comes in levels.  For instance, look at a program like
>>>>SlowChess as it marches through George Lyapko's tournament.  Against the early
>>>>opposition (of clearly known strength) it has a very high rating.  But as it
>>>>faces stronger and stronger opposition, the Elo rating drops.  So, it might be
>>>>that you can inflate your Elo rating by playing a group that is 100 Elo below
>>>>your level, as compared to playing a group that is your peer.
>>>>
>>>>Most of this is just heuristic guessing.  I have not done a careful study yet.
>>>
>>>It's hard for me to believe that this is a pattern, but no one is perfect...
>>>Maybe the distance from the worst in group B to the best in group A is huge?
>>
>>The shift in average Elo is only a few Elo.  However, there are some very weak
>>programs (1000 Elo below) in the main group.
>>
>>Average Elo of the opposition faced by a program with 200+ games: 2294
>>Average Elo of the opposition faced by a program with 64 games: 2351
>>Weakest program: 1711
>>Strongest program: 2644
>>
>>>How do you compute the ratings?
>>
>>Elostat
>
>I would also look for other kinds of patterns. For instance, where did the
>stronger programs lose rating points? Was it through random losses to very weak
>programs, to programs 200 points lower, or was it evenly spread?
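
Peter's suggestion of checking where the stronger programs lost their points can be tabulated directly from the game records. A minimal sketch, assuming each program's games are available as (own rating, opponent rating, score) tuples; the function names, the 100-point bucket width and the sample games are hypothetical. For each rating-gap bucket it compares the points actually scored with the points the standard logistic Elo formula expects.

```python
from collections import defaultdict


def elo_expected(own, opponent):
    """Standard logistic Elo expectation for one game."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - own) / 400.0))


def score_by_rating_gap(games, bucket_width=100):
    """Group one program's games by (opponent - own) rating gap.

    games: iterable of (own_rating, opponent_rating, score), score in {0, 0.5, 1}.
    bucket_width is an arbitrary (hypothetical) choice.
    Returns {bucket_lower_edge: (games, actual_points, expected_points)}.
    """
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for own, opp, score in games:
        bucket = ((opp - own) // bucket_width) * bucket_width  # floor to bucket edge
        entry = totals[bucket]
        entry[0] += 1
        entry[1] += score
        entry[2] += elo_expected(own, opp)
    return {b: tuple(v) for b, v in sorted(totals.items())}


if __name__ == "__main__":
    # Hypothetical games for one strong program rated 2600.
    games = [(2600, 1711, 1), (2600, 1750, 0), (2600, 2400, 1),
             (2600, 2450, 0.5), (2600, 2644, 0)]
    for gap, (n, actual, expected) in score_by_rating_gap(games).items():
        print(f"gap {gap:+5d}: {n} games, scored {actual:.1f}, expected {expected:.2f}")
```

A bucket where the actual points fall well below the expected points is where the rating was lost; with only a handful of games per bucket, such gaps are mostly noise.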

It could also be a bug in the Elo calculation, as I have not bothered to verify
it with some other source.
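
One way to check the Elostat numbers against an independent calculation is to fit ratings by maximum likelihood under the Bradley-Terry model, which uses the same logistic curve as Elo. The sketch below uses the standard minorization-maximization (MM) update for Bradley-Terry. It is not Elostat's algorithm; it counts a draw as half a win for each side (an approximation), and `fit_ratings`, its defaults and the 2300 anchor are hypothetical. Every program must have scored some points and conceded some points, and the pool must be connected through games, otherwise the maximum-likelihood ratings are not finite or not unique.

```python
import math
from collections import defaultdict


def fit_ratings(games, iterations=500, mean_rating=2300.0):
    """Maximum-likelihood Bradley-Terry ratings on the Elo scale.

    games: list of (player_a, player_b, score_for_a), score in {0, 0.5, 1}.
    Assumption: a draw counts as half a win for each side.
    Ratings are shifted so their average equals mean_rating (arbitrary anchor).
    """
    points = defaultdict(float)                       # total points scored
    played = defaultdict(lambda: defaultdict(float))  # games per opponent
    for a, b, s in games:
        points[a] += s
        points[b] += 1.0 - s
        played[a][b] += 1.0
        played[b][a] += 1.0

    for p, opps in played.items():
        if not 0.0 < points[p] < sum(opps.values()):
            raise ValueError(f"{p}: all wins or all losses, rating is unbounded")

    strength = {p: 1.0 for p in played}  # strength = 10 ** (rating / 400)
    for _ in range(iterations):
        # MM update: pi_p <- W_p / sum_q n_pq / (pi_p + pi_q)
        new = {p: points[p] / sum(n / (strength[p] + strength[q])
                                  for q, n in played[p].items())
               for p in played}
        # Fix the scale: geometric mean of strengths = 1 (average rating 0).
        log_mean = sum(math.log(v) for v in new.values()) / len(new)
        strength = {p: v / math.exp(log_mean) for p, v in new.items()}

    return {p: mean_rating + 400.0 * math.log10(v) for p, v in strength.items()}
```

Run on the same game file, large disagreements between this fit and the Elostat list would point at one of the two calculations; increase `iterations` if the ratings are still moving between runs.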

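One effect that may contribute to the pool-dependence described above (a strong program's rating dropping once it meets the whole pool, SlowChess's rating falling as the opposition gets stronger) is that the Elo curve is not linear: scoring 70% against a spread-out field does not mean the same as scoring 70% against a single opponent of the field's average rating. The sketch below compares the two calculations of a performance rating. It is not a description of what Elostat does internally; the function names, the 0 to 4000 search bounds and the sample fields are assumptions for illustration.

```python
import math


def expected(rating, opponent):
    """Logistic Elo expectation for one game."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_exact(opponents, score, lo=0.0, hi=4000.0):
    """Rating R with sum(expected(R, o)) == score, found by bisection.

    Assumption: the answer lies between lo and hi.
    """
    if not 0 < score < len(opponents):
        raise ValueError("score must be strictly between 0 and the number of games")
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(expected(mid, o) for o in opponents) < score:
            lo = mid  # expected total too low: the rating must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


def performance_from_average(opponents, score):
    """Shortcut: treat every game as played against the average opponent."""
    n = len(opponents)
    p = score / n
    average = sum(opponents) / n
    return average + 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    mixed = [1800] * 5 + [2600] * 5   # hypothetical spread-out field, average 2200
    uniform = [2200] * 10             # hypothetical field of equal strength
    for name, field in (("mixed", mixed), ("uniform", uniform)):
        print(name, round(performance_exact(field, 7.0)),
              round(performance_from_average(field, 7.0)))
```

For the uniform field both calculations agree; for the mixed field the exact calculation comes out close to 200 points higher. Which way the two diverge depends on how the field is spread and how the program scores, so this only shows that the calculation can matter for a pool running from 1711 to 2644, not that it explains the drop observed above.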

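The earlier point that a computer model need not assume a Gaussian curve (fit several models and keep whichever predicts best) can be sketched as scoring candidate expectation curves by log-loss on games that were not used to estimate the ratings. The names below are hypothetical; the Gaussian spread of 200·√2 is an assumption (a per-player spread of 200 points), and in a real comparison that spread and the 400 scale of the logistic curve would themselves be fitted. Draws are again counted as half a win and half a loss.

```python
import math


def logistic_curve(diff):
    """Logistic expectation used by most Elo calculators (diff = own - opponent)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def gaussian_curve(diff, sigma=200.0 * math.sqrt(2.0)):
    """Normal-CDF expectation; sigma is an assumed spread of the rating difference."""
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


def mean_log_loss(curve, games):
    """Mean negative log-likelihood; a draw counts as half a win and half a loss."""
    total = 0.0
    for diff, score in games:
        p = min(max(curve(diff), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= score * math.log(p) + (1.0 - score) * math.log(1.0 - p)
    return total / len(games)


def best_curve(curves, held_out_games):
    """Name of the curve with the lowest log-loss on held-out games."""
    return min(curves, key=lambda name: mean_log_loss(curves[name], held_out_games))


if __name__ == "__main__":
    # Hypothetical held-out games: (rating difference, score for the first program).
    games = [(400, 1), (-400, 0), (800, 1), (100, 0.5), (-150, 1)]
    curves = {"logistic": logistic_curve, "gaussian": gaussian_curve}
    for name in curves:
        print(name, round(mean_log_loss(curves[name], games), 4))
    print("best:", best_curve(curves, games))
```

Any other candidate curve can be added to the `curves` dictionary as a function of the rating difference.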

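The mention of Bayesian algorithms can be illustrated with the simplest case, a long match between two engines: put a prior on their Elo difference and update it with the result. A minimal sketch, assuming a flat prior on a grid from -800 to +800, the logistic Elo curve, and draws treated as half a win and half a loss (a fuller model would treat draws as a separate outcome); `elo_posterior` and the match result are hypothetical.

```python
import math


def elo_posterior(wins, draws, losses, lo=-800.0, hi=800.0, steps=1600):
    """Grid posterior of the Elo difference D under a flat prior on [lo, hi].

    Likelihood (assumption: draw = half win, half loss):
    p(D) ** (wins + draws/2) * (1 - p(D)) ** (losses + draws/2),
    with p(D) the logistic Elo expectation. Returns (posterior mean, 95% interval).
    """
    scored = wins + 0.5 * draws
    conceded = losses + 0.5 * draws
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    log_post = []
    for d in grid:
        p = 1.0 / (1.0 + 10.0 ** (-d / 400.0))
        log_post.append(scored * math.log(p) + conceded * math.log(1.0 - p))
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]  # subtract peak to avoid underflow
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(d * q for d, q in zip(grid, probs))
    # Equal-tailed 95% credible interval from the cumulative posterior.
    lower = upper = None
    cumulative = 0.0
    for d, q in zip(grid, probs):
        cumulative += q
        if lower is None and cumulative >= 0.025:
            lower = d
        if upper is None and cumulative >= 0.975:
            upper = d
    return mean, (lower, upper)


if __name__ == "__main__":
    # Hypothetical 100-game match: 60 wins, 20 draws, 20 losses (70%).
    mean, (low, high) = elo_posterior(60, 20, 20)
    print(f"Elo difference about {mean:.0f}, 95% interval {low:.0f} to {high:.0f}")
```

The width of the interval shows how much of an observed rating shift a single match of this length can explain by chance alone.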
Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.