Computer Chess Club Archives


Subject: Re: Calculating Computer Ratings????

Author: Robert Hyatt

Date: 06:49:34 08/02/98


On August 01, 1998 at 01:14:57, Shaun Graham wrote:

>
>>Well if you would assign a rating of 2400 to all of your opponents, why wouldn't
>>you do this to your program?
>
>In fact you do first assign your program a rating of 2400, but then you see how
>your program, as a 2400, has performed against all the other 2400s to get the
>new rating.
>
>
> I admit that when the rating system was first
>>started, the ratings had to be assigned for at least 1 program to start it off,
>>but now that we have established ratings (SSDF for example) why would we need to
>>assign ratings?
>
>You need to assign ratings only because the attempt here is to make the rating
>calculation more accurate.  For instance, you will have a hard time convincing
>almost any of the computer aficionados here that Fritz is 2580+ Elo.  In fact
>the SSDF makes it quite clear that the SSDF rating doesn't necessarily
>correspond to human Elo.  So reassigning the rating is simply a first step in
>normalizing the ratings against human Elos, so that computer ratings and human
>ratings are comparable (because currently they are not).


Your point is valid, but the 2400 seems wrong.  I.e., what convinces you that you
should start there?  For example, take an 1800 program, start it at 2400, and
notice what that does to the other programs' ratings.  They go up, but they
should not.  So you end up with what is commonly called "rating inflation."
That's why most rating systems have a provisional period: it provides a better
estimate of the rating, based on results against those with "known" ratings.
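
A rough sketch of that inflation effect, in Python.  The assumptions here are
mine, not SSDF's method: the standard logistic Elo expectation, a K-factor of
16, a single opponent standing in for the whole pool, and every game scored at
its long-run average result (so the run is deterministic).  The function names
are made up for the illustration.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score for a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def seed_and_play(true_strength, seeded_rating,
                  opponent_rating=2400.0, games=20, k=16.0):
    """Play `games` games between a newcomer and one established opponent.

    The newcomer really plays at `true_strength` but enters the list at
    `seeded_rating`.  Each game is scored at its average result, which is
    an assumption made to keep the sketch deterministic.
    """
    newcomer = seeded_rating
    opponent = opponent_rating
    # What the newcomer actually scores per game, on average.
    actual = expected_score(true_strength, opponent_rating)
    for _ in range(games):
        # What the rating list predicts from the current numbers.
        predicted = expected_score(newcomer, opponent)
        delta = k * (actual - predicted)
        newcomer += delta   # the newcomer drifts toward its real level
        opponent -= delta   # the opponent collects the same points
    return newcomer, opponent


# An 1800-strength program seeded at 2400 hands points to its opponent.
inflated_newcomer, inflated_opponent = seed_and_play(1800.0, 2400.0)

# The same program seeded at 1800 changes nothing.
fair_newcomer, fair_opponent = seed_and_play(1800.0, 1800.0)
```

In the first run the opponent finishes above 2400 without having played any
better; in the second its rating does not move.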

The only way to get reasonable Elo ratings for programs is to play them against
humans, and not against each other.  Computer vs Computer is a vastly different
game than computer vs human.  You can take any program, make a fairly serious
change to the eval, or to the search extensions, and not see much difference
against a pool of humans.  But a strong computer opponent will quite quickly
"home in" on such a problem and make you lose game after game.

E.g., in the famous first paper on singular extensions (SE), Hsu and company
reported a really significant rating change when comparing Deep Thought (DT)
with SE to DT without.  They later noticed that the difference was greatly
exaggerated, because the only difference between the two programs was SE.
Their last paper suggested that SE was a much more modest improvement.

If I simply took Crafty as 2,000 Elo for version 1, then played each successive
version against the previous one and used the traditional Elo rating
calculation, I would now be somewhere around 5500+.  That is because minor
changes make major differences in the results between A and B, yet do very
little in A vs H, where H is a human.
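
To see how fast such a chain runs away, here is a small Python sketch.  It
assumes the usual logistic Elo model for turning a match score into a rating
gap.  The 50 versions and the 60% score per match are invented numbers for
illustration only, not actual Crafty results, and the function names are
hypothetical.

```python
import math


def elo_gap_from_score(score):
    """Rating gap implied by a score fraction (0 < score < 1), Elo logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


def chained_rating(start, scores):
    """Rate each version only against its predecessor and add up the gaps."""
    rating = start
    for s in scores:
        rating += elo_gap_from_score(s)
    return rating


# Hypothetical: 50 successive versions, each scoring 60% against the last one.
per_version_gap = elo_gap_from_score(0.60)       # roughly 70 points each
final_rating = chained_rating(2000.0, [0.60] * 50)
```

Under those made-up numbers the chain ends up well past 5500, even though
nobody would expect any of those versions to score that way against humans.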

Very odd world, this is...




>
> Because programs only learn to avoid certain lines, they really
>>don't learn like humans anyway so no rating system will make their ratings like
>>human ratings. Besides the SSDF list is only good for comparative purposes.
>
>That's the problem: it's not good for comparative purposes.  I wish it was.
>I'm sure you have seen my discussions on here demonstrating how Fritz is GM
>strength (which it is).  However, apparently it's difficult to show that using
>the current SSDF system, because OBVIOUSLY many people don't accept it.  If
>they did, when I said Fritz is GM strength because its Elo is 2589, there would
>be no disagreement.
>


The problem is that SSDF has too much inbreeding in the ratings.  And no one
has ever taken the time, nor gone to the expense, to enter a computer into FIDE
tournaments (FIDE membership is possible, but finding a tournament that would
allow computers might be much more difficult).  So it is quite possible that
Fritz, at 2580, is 400 points better than the Fidelity Mach III at 2180.  But
would that hold up in human events?  I doubt it.  I suspect Fritz would lose
more than expected, and the Mach III would win more than expected, for the
reasons I gave above.
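
For reference, this is what a 400-point gap predicts under the standard
logistic Elo formula.  A minimal Python sketch; the function name is made up,
and whether SSDF uses exactly this curve is an assumption.

```python
def expected_score_from_gap(rating_diff):
    """Expected score for the higher-rated side, given the rating gap in points."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


# A 400-point gap predicts 10/11 of the points, i.e. about 91%.
gap_400 = expected_score_from_gap(400.0)
```

That prediction only means something when both ratings come out of the same
pool of opponents.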




>
>You
>>are attaching too much importance to the isolated rating number.
>
>No, I'm not.  Ratings are all important; they are the only way to show the
>relative strength of computers to human strength.  Thus it is very important to
>isolate a VALID rating for a program, firstly so that you can know how
>computers really compare to humans, and secondly so that we can gauge exactly
>how far along the evolutionary track programs are.
>



Correct, but not easily doable.  I.e., computer vs computer has *nothing* to do
with computers in the human-chess-tournament world.  It is all about
statistics: given two different "pools" of players, the absolute ratings
would vary significantly, and the spread would vary as well, because the
expected outcome of computer vs computer is different from computer vs human.

Fritz is ideally suited to play other computers.  Very fast, very deep.  But
I'd expect it to do worse than some other programs against a group of GM
players.  Anand was an example: he shredded Fritz trivially, but had to work to
beat Rebel in the two slow games.  Yet I'm sure that Fritz will beat Rebel in a
match, as has been seen on SSDF.

I'm more interested in the computer vs human games, but I do pay attention to
computer vs computer when possible...


> Ratings abhor a
>>vacuum. You need lots of competitors to have a good system and the SSDF is a
>>closed shop.
>
>No, they are not a closed shop, as the data is readily available to be examined
>and calculated by anyone with the inclination.  They have no stranglehold on the
>knowledge of how to calculate ratings, and if you look at another of the
>follow-ups to this post, you will find that the SSDF is in fact instituting a
>plan similar to the one that I have suggested (recalculating from scratch, not
>incrementally).
>

But it has the same problem, because someone will still assume that Elo 2500
is the same as SSDF 2500.  And it still won't be so until the games come from
players in a *common population*...




>
>Shaun



