Computer Chess Club Archives


Subject: Re: an idea to evaluate rating of programs based on pgn file of their games

Author: Robert Hyatt

Date: 18:52:13 08/09/01


On August 09, 2001 at 17:58:17, Bruce Moreland wrote:

>On August 09, 2001 at 16:28:19, Robert Hyatt wrote:
>
>>On August 09, 2001 at 12:56:59, Bruce Moreland wrote:
>>
>>>On August 09, 2001 at 09:58:01, Robert Hyatt wrote:
>>>
>>>>Very likely correct.  This is not an easy thing to do...  and trying to use
>>>>program X to predict the rating of program Y, based only on how many moves they
>>>>"match" looks statistically dangerous.
>>>>
>>>>This is essentially saying that DF played all the right moves, and any
>>>>divergence by another program necessarily makes that program weaker.  I.e., in
>>>>Uri's example, DB might appear worse than DF, because they didn't agree.  I'm
>>>>certain it is not weaker even if they disagree 100% of the time.  Based on past
>>>>games...
>>>
>>>This is not quite true, I think.
>>>
>>>He's saying that Fritz has X units of strength when it moves in Y seconds.  He
>>>is concluding that at the point where another program has maximum agreement with
>>>Fritz, it also has X units of strength.  If it gets more time than that, it will
>>>start to disagree with Fritz, because it is making better moves.  With less
>>>time, it will disagree because it is making worse moves.  Let's call the time of
>>>maximum agreement Y'.  If Y' is less than Y, the new program is stronger, and it
>>>may be possible to compute by how much, and vice versa.
>>
>>What happens if we put Crafty on a super-whammo 400 GHz processor, and it never
>>agrees with Fritz?  Is it weaker?  That was my point.  Just because a program
>>disagrees with another program that is known to play at level X at that
>>particular search time limit, doesn't mean that the program is either weaker or
>>stronger.  It just means it might be weaker, it might be stronger, or it might
>>be equal but different.
>
>If you never reach a point of maximum agreement, that would indicate that this
>method doesn't work.
>
>Note that I doubt that this would work for what the original poster suggests.
>
>If you played a game with Crafty, on an X-MHz processor, with a given time
>control, there's a chance you could use this to figure out what X was.  Once you
>know that you could take a guess as to how strong it was on that processor.

That is possible.  But using Crafty to figure out how strong Crafty is on a
particular machine only has one degree of freedom...  the hardware being used.

If you use two different programs and two different machines, that is four
degrees of freedom.  Which is a bunch.
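
A minimal sketch of the "time of maximum agreement" idea discussed in this
thread, assuming you already have, for one fixed set of positions, the moves a
reference program chose at its time Y and the moves a second program chose at
several think times.  The function names and the move lists are hypothetical
and the data is made up for illustration.  It only locates Y'; whether Y'
says anything about strength is exactly what is in dispute here.

```python
def agreement_rate(reference_moves, candidate_moves):
    """Fraction of positions where both programs chose the same move."""
    if not reference_moves or len(reference_moves) != len(candidate_moves):
        raise ValueError("need two non-empty move lists of equal length")
    matches = sum(r == c for r, c in zip(reference_moves, candidate_moves))
    return matches / len(reference_moves)


def time_of_max_agreement(reference_moves, candidate_moves_by_time):
    """Return (think_time, rate) for the think time with the highest agreement.

    Ties go to the shortest think time.
    """
    best_time, best_rate = None, -1.0
    for think_time in sorted(candidate_moves_by_time):
        rate = agreement_rate(reference_moves, candidate_moves_by_time[think_time])
        if rate > best_rate:
            best_time, best_rate = think_time, rate
    return best_time, best_rate


# Made-up data: the reference program's moves at its time Y, and a second
# program's moves on the same five positions at 1, 5 and 30 seconds.
reference = ["e4", "Nf3", "Bb5", "O-O", "Re1"]
candidate = {
    1: ["d4", "Nf3", "Bc4", "d3", "Re1"],
    5: ["e4", "Nf3", "Bb5", "O-O", "Re1"],
    30: ["e4", "Nf3", "Bb5", "d3", "c3"],
}

y_prime, rate = time_of_max_agreement(reference, candidate)
print(y_prime, rate)
```

If the agreement curve is flat, or never peaks, the sketch still returns some
think time, so a real experiment would have to check that a clear maximum
exists before reading anything into Y'.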




>
>All the original poster suggests is that you can do this comparison between
>different programs *if* it's true that programs of a given strength tend to want
>to play the same moves after the same amount of think time.
>
>We do this thing quite directly with tactical test suites already, but the
>poster was trying to generalize it onto a big pile of positions.


I know.  Tactics make some sense in this context.  But not positional things,
since those are not only a function of time, they are also a function of
incorporated knowledge.



>
>bruce
>
>>>I doubt this will work, as long as there are lots of positions where there is
>>>more than one playable move.  You'd be looking for information in a big blob of
>>>random soup.
>>>
>>>bruce
>>
>>
>>I agree.  And what you find is written backward in Chinese, with Russian
>>footnotes with explanations in Egyptian.  :)


