Computer Chess Club Archives

Subject: Re: To Mel: you misquoted Dr. Hyatt.

Author: Robert Hyatt
Date: 12:12:41 06/02/99

On June 02, 1999 at 13:51:47, Melvin S. Schwartz wrote:

>
>On June 01, 1999 at 14:02:10, Dann Corbit wrote:
>
>>On June 01, 1999 at 13:40:26, Melvin S. Schwartz wrote:
>>[snip]
>>>Hello Dann!
>>>
>>>I probably should just set up Hiarcs, Fritz, or Rebel and take away their Queen
>>>before the first move and have some fun instead of prolonging this topic;
>>>however, I am a person who when having strong opinions feels he must express
>>>them even when he knows nothing about what he's talking about. :-)
>>There is nothing wrong with feeling passionate about things.  It is just that
>>you may not necessarily be correct, despite the strong feelings.
>>
>>>If the SSDF could test all the programs against each other on the identical
>>>computer, don't you think they would consider that a more accurate way to get
>>>ratings than what they are doing now?
>>They would be less accurate, and possibly useless.  For instance, if I get two
>>1GHz CPUs and put Hiarcs7.32 and Fritz 5.32 on them and let them rage against
>>each other, believe it or not, I will have no mathematical results at all from
>>the contest!  That is because neither one of them has a measured strength. The
>>objective of the SSDF is to provide a true ability rating that is mathematically
>>sound. While there are always great difficulties associated with a thing like
>>this and there are going to be problems, it is essential that the tests
>>conducted be performed with opponents of known strength.  The better and more
>>accurately you know the strength of an opponent, the better and more accurately
>>you will know the strength of the new configuration.  The way to determine the
>>strength of an opponent (human or computer) is a mathematical formula that uses
>>the strength of the opponent as one of its arguments.  If this number is "iffy"
>>(+/- one standard deviation is a large number) then the quality of the
>>mathematical answer to that equation is also bad.  Therefore, the more tests you
>>have with a particular system and program combination, the more valuable it
>>becomes for determining the strength of the opponents.
>>
>>Does it become more clear now?
>
>Well, if I am wrong, then blame the people responsible for writing in both the
>manuals for Hiarcs 7 and Fritz 5.32 that the programs' strength would be better
>by running them on a faster PC!
>
>Mel


You are overlooking the rating (Elo) issue here.  If everyone on the SSDF
suddenly upgraded from P5/200/mmx to K6/450's, and continued testing as they
are right now, what would you do about ratings?  I.e., say program X has an SSDF
rating of 2500 on the P5/200/mmx, what do you do to calculate its rating on the
new hardware?

The answer is easy.  Play it against programs on the old hardware (where you
have known ratings).  This has a flaw in that (IMHO) small differences in two
programs are often exaggerated in computer vs computer games.  But they have
always had that problem (which has led to significantly inflated ratings IMHO,
when you look at the actual "number" being published).

But in reality, the only other way to get ratings would be to play in human
events and those are rare opportunities nowadays.  So K6/450's vs P5/200/mmx
games _must_ be played to let the K6 establish ratings against opponents of
known strength.  _Then_ you can start the computer vs computer games on equal
hardware to find out which is the 'absolute strongest' program (based on the
SSDF rating numbers)...
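
To make the "opponents of known strength" point concrete, here is a minimal sketch of how a rating for the new hardware could be estimated from games against already-rated opponents. It assumes the standard logistic Elo expected-score formula; the post does not say how the SSDF actually computes its list, and the function names and sample numbers below are hypothetical.

```python
def expected_score(rating, opponent_rating):
    """Standard logistic Elo expectation for one game (1 = win, 0.5 = draw)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def performance_rating(opponent_ratings, total_score, lo=0.0, hi=4000.0, tol=0.01):
    """Find the rating whose expected total score matches the actual score.

    opponent_ratings: one known rating per game played (hypothetical data).
    total_score: points actually scored against those opponents.
    Uses bisection, since the expected total rises monotonically with rating.
    """
    games = len(opponent_ratings)
    if not 0 < total_score < games:
        # A 0% or 100% score has no finite solution under this formula.
        raise ValueError("score must be strictly between 0 and the number of games")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, r) for r in opponent_ratings)
        if expected < total_score:
            lo = mid  # candidate rating too low to explain the result
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical example: the new-hardware entry scores 30/40 against
# opponents that are all rated 2500 on the old hardware.
estimate = performance_rating([2500] * 40, 30.0)
print(round(estimate))
```

Note that the estimate depends entirely on the opponents' ratings being trustworthy: if both sides are unrated, as in a match between two brand-new configurations, there is nothing to pass in as opponent_ratings and the calculation cannot start.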
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.