Computer Chess Club Archives


Subject: Re: SSDF Rating Irregularities

Author: Robert Hyatt

Date: 18:08:12 12/11/99


On December 11, 1999 at 12:15:59, Len Eisner wrote:

>On December 11, 1999 at 01:07:46, Robert Hyatt wrote:
>
>>On December 10, 1999 at 22:53:41, Len Eisner wrote:
>>
>>>On December 10, 1999 at 22:26:43, Chuck wrote:
>>>
>>>>On December 10, 1999 at 20:16:13, Bertil Eklund wrote:
>>>>
>>>>>On December 10, 1999 at 20:01:28, Chuck wrote:
>>>>>
>>>>[snip]
>>>>
>>>>>>
>>>>>>I think it is very easy to prove that Robert is right here. It is very
>>>>>>noticeable. I once played out a 40/2 match between a Mach IV and a Mephisto
>>>>>>Polgar, and the Mach IV won 10-2. The two machines are very close in strength,
>>>>>>probably within 100-150 points. But playing many games and watching these two
>>>>>>computers evaluate positions, it was evident that the speed of the Mach IV
>>>>>>(compared to the Polgar's 5 MHz) gave it a big tactical advantage. Head-to-head
>>>>>>this seems to be magnified. I played an old MChess against the Mach IV and it
>>>>>>won 11-1. Wow, what results. The point is, when a program plays another which is
>>>>>>on significantly slower hardware, the faster program is going to win big and
>>>>>>its rating will be inflated. Two years later, when it becomes the one with slow
>>>>>>hardware, it will be the one getting pounded, and its rating will go down. I
>>>>>>think at one time the Mach IV was on the SSDF list at around 2200, but late in
>>>>>>its life it dropped to below 2100.
>>>>>>
>>>>>>Chuck
>>>>>
>>>>>Hi!
>>>>>
>>>>>As checked hundreds of times, this is completely wrong.
>>>>>
>>>>>Play two hundred games and the level should probably be accurate.
>>>>>
>>>>>Bertil SSDF
>>>>
>>>>Then explain to me how the Mach IV had an SSDF rating of 2282 in January 1993 but
>>>>now has an SSDF rating of 2074!?!
>>>>
>>>>Chuck
>>>
>>>Yes, that is the question, and it applies to the other old programs too.  If
>>>computers are anything, they are consistent, so the Mach IV would play exactly
>>>the same today as it did in 1993, yet its rating is over 200 points lower.
>>>
>>>Len
>>
>>
>>You are making a fatal mistake.  Elo ratings predict game outcomes.  The important
>>number is the rating difference between two programs, not the absolute value of
>>each program's rating.  If you play A vs B and get A=2200 and B=2400, that says
>>B should score roughly 3 points in every 4 games.  But if B learns, and A doesn't,
>>then you could expect this to widen over time, as A keeps playing the same bad
>>opening lines, while B learns to avoid lines it loses with.  Eventually their
>>ratings will be over 500 points apart, because A never varies, and B does.
>>
>>You might prefer that A's rating remain constant and B's rating continues to
>>climb... but Elo is a sort of Newtonian physics model, with equal and opposite
>>movement based on game outcomes...
>>
>>When you think about it, it makes sense...
>
>Does this imply that the differences between the old and new programs are
>magnified over time because of the peculiar nature of comp. vs. comp testing?
>And if it does, are the relative rating differences between the programs on the
>SSDF list accurate, or are they magnified in some way?
>
>What I really want to know is this.  If program A plays 200 points stronger than
>program B in comp vs. comp testing, will program A play 200 points stronger than
>program B against humans?
>
>I use the SSDF ratings to determine how a program will perform against me, not
>other programs.  Relative ratings are fine as long as they can be applied to
>performance against people.
>
>Len


Comp vs comp results say nothing about how comp vs human will go.  For example,
Tiger 12 looks _very_ strong vs computers, but so-so against humans.  I have
not yet studied its games very carefully, although I now have a couple of dozen
games vs Crafty on ICC and FICS.  It seems to be perfectly tuned to beat
computers... it seems very materialistic and ready to accept any gambit offered,
and then try to make the opponent justify it accurately.  How it is going to do
once it is out 'en masse' will be very interesting to watch.  But it clearly
isn't doing _nearly_ as well vs humans (even with anti-human on) as it is doing
against other programs...

Which is not at all surprising.  I said several years ago that trying to
write a program to blast to the top of the SSDF list is a _totally_ different
thing from trying to write a program to blast to the top of the FIDE rating list.

The games are too different...
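
For anyone who wants to check the Elo arithmetic quoted earlier (a 200-point gap meaning roughly 3 points out of 4, and ratings moving in "equal and opposite" steps), here is a minimal sketch using the textbook Elo logistic formula.  The function names and the K-factor of 16 are assumptions chosen for illustration.  This thread does not describe how the SSDF actually computes its list, so treat this as the generic model only.

```python
def expected_score(rating_a, rating_b):
    """Expected score for A against B (win = 1, draw = 0.5, loss = 0),
    using the standard Elo logistic curve with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=16):
    """Return both ratings after one game.

    k is a hypothetical K-factor. With the same k on both sides, whatever
    A gains B loses, which is the "equal and opposite" movement.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # B at 2400 against A at 2200: expected score is about 0.76.
    print(round(expected_score(2400, 2200), 2))

    # A loses one game to B: A drops by exactly what B gains.
    new_a, new_b = update(2200, 2400, score_a=0.0)
    print(new_a, new_b)
```

Note that the formula only ever sees the rating difference.  If B keeps beating A by a wider margin than the current gap predicts, the gap grows, with A sliding down as B climbs, even though A's play has not changed.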


