Computer Chess Club Archives



Subject: Re: SSDF Rating Irregularities

Author: Len Eisner

Date: 09:15:59 12/11/99


On December 11, 1999 at 01:07:46, Robert Hyatt wrote:

>On December 10, 1999 at 22:53:41, Len Eisner wrote:
>
>>On December 10, 1999 at 22:26:43, Chuck wrote:
>>
>>>On December 10, 1999 at 20:16:13, Bertil Eklund wrote:
>>>
>>>>On December 10, 1999 at 20:01:28, Chuck wrote:
>>>>
>>>[snip]
>>>
>>>>>
>>>>>I think it is very easy to prove that Robert is right here. It is very
>>>>>noticeable. I once played out a 40/2 match between a Mach IV and a Mephisto
>>>>>Polgar, and the Mach IV won 10-2. The two machines are very close in strength,
>>>>>probably within 100-150 points. But playing many games and watching these two
>>>>>computers evaluate positions, it was evident that the speed of the Mach IV
>>>>>(compared to the Polgar's 5 MHz) gave it a big tactical advantage. Head-to-head
>>>>>this seems to be magnified. I played an old MChess against the Mach IV and it
>>>>>won 11-1. Wow, what results. The point is, when a program plays another which is
>>>>>on significantly slower hardware, the faster program is going to win big and
>>>>>its rating will be inflated. Two years later, when it becomes the one with slow
>>>>>hardware, it will be the one getting pounded, and its rating will go down. I
>>>>>think at one time the Mach IV was on the SSDF list at around 2200, but late in
>>>>>its life it dropped to below 2100.
>>>>>
>>>>>Chuck
>>>>
>>>>Hi!
>>>>
>>>>As checked hundreds of times, this is completely wrong.
>>>>
>>>>Play two hundred games and the level should probably be accurate.
>>>>
>>>>Bertil SSDF
>>>
>>>Then explain to me how the Mach IV had an SSDF rating of 2282 in January 1993 but
>>>now has an SSDF rating of 2074!?!
>>>
>>>Chuck
>>
>>Yes, that is the question, and it applies to the other old programs too.  If
>>computers are anything, they are consistent, so the Mach IV would play exactly
>>the same today as it did in 1993, yet its rating is over 200 points lower.
>>
>>Len
>
>
>You are making a fatal error.  Elo ratings predict game outcomes.  The important
>number is the rating difference between two programs, not the absolute value of
>each program's rating.  If you play A vs. B and get A=2200 and B=2400, that says
>B should win 3 of every 4 games, roughly.  But if B learns and A doesn't, then
>you could expect this gap to widen over time, as A keeps playing the same bad
>opening lines while B learns to avoid lines it loses with.  Eventually their
>ratings will be over 500 points apart, because A never varies and B does.
>
>You might prefer that A's rating remain constant and B's rating continue to
>climb... but Elo is a sort of Newtonian physics model, with equal and opposite
>movement based on game outcomes...
>
>When you think about it, it makes sense...
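
For concreteness, the expected-score formula behind the "3 of every 4 games"
figure above can be sketched in a few lines of Python.  This is only an
illustration of the standard Elo expectation, plugging in the numbers quoted
in this thread; it is not taken from the SSDF's own methodology or tools.

    from math import log10

    def expected_score(rating_a: float, rating_b: float) -> float:
        """Fraction of points A is expected to score against B under the Elo model."""
        return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

    # The 200-point gap quoted above (A=2200, B=2400): B is expected to
    # score about 0.76, i.e. roughly 3 points out of every 4.
    print(round(expected_score(2400, 2200), 2))

    # Chuck's 10-2 match, taken at face value, is a score fraction of
    # 10/12 = 0.83, which implies a rating gap of roughly 280 points --
    # much more than the 100-150 points he estimates separate the machines.
    score = 10 / 12
    implied_gap = -400 * log10(1 / score - 1)
    print(round(implied_gap))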

Does this imply that the rating differences between the old and new programs are
magnified over time because of the peculiar nature of computer-vs-computer
testing?  And if so, are the relative rating differences between the programs on
the SSDF list accurate, or are they magnified in some way?

What I really want to know is this: if program A plays 200 points stronger than
program B in computer-vs-computer testing, will program A also play 200 points
stronger than program B against humans?

I use the SSDF ratings to determine how a program will perform against me, not
other programs.  Relative ratings are fine as long as they can be applied to
performance against people.

Len


