Computer Chess Club Archives



Subject: Re: an idea to evaluate rating of programs based on pgn file of their games

Author: Bruce Moreland

Date: 14:58:17 08/09/01



On August 09, 2001 at 16:28:19, Robert Hyatt wrote:

>On August 09, 2001 at 12:56:59, Bruce Moreland wrote:
>
>>On August 09, 2001 at 09:58:01, Robert Hyatt wrote:
>>
>>>Very likely correct.  This is not an easy thing to do...  and trying to use
>>>program X to predict the rating of program Y, based only on how many moves they
>>>"match" looks statistically dangerous.
>>>
>>>This is essentially saying that DF played all the right moves, and any
>>>divergence by another program necessarily makes that program weaker.  IE in
>>>Uri's example, DB might appear worse than DF, because they didn't agree.  I'm
>>>certain it is not weaker even if they disagree 100% of the time.  Based on past
>>>games...
>>
>>This is not quite true, I think.
>>
>>He's saying that Fritz has X units of strength when it moves in Y seconds.  He
>>is concluding that at the point where another program has maximum agreement with
>>Fritz, it also has X units of strength.  If it gets more time than that, it will
>>start to disagree with Fritz, because it is making better moves.  With less
>>time, it will disagree because it is making worse moves.  Let's call the time of
>>maximum agreement Y'.  If Y' is less than Y, the new program is stronger, and it
>>may be possible to compute by how much, and vice versa.
>
>What happens if we put Crafty on a super-whammo 400 GHz processor, and it never
>agrees with Fritz?  Is it weaker?  That was my point.  Just because a program
>disagrees with another program that is known to play at level X at that
>particular search time limit, doesn't mean that the program is either weaker or
>stronger.  It just means it might be weaker, it might be stronger, or it might
>be equal but different.

If you never reach a point of maximum agreement, that would indicate that this
method doesn't work.

Note that I doubt that this would work for what the original poster suggests.

If you played a game with Crafty, on an X-MHz processor, with a given time
control, there's a chance you could use this to figure out what X was.  Once you
know that you could take a guess as to how strong it was on that processor.

All the original poster suggests is that you can do this comparison between
different programs *if* it's true that programs of a given strength tend to want
to play the same moves after the same amount of think time.
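
A minimal sketch of that comparison in Python, assuming you already have the reference program's moves at its fixed think time Y and the other program's moves on the same positions at several think times. The function names and data layout are hypothetical, chosen only for illustration. It reports the time of maximum agreement (Y'), or nothing when agreement peaks at the shortest or longest time tested, which is the case where the method tells you nothing.

```python
def agreement_rate(reference_moves, candidate_moves):
    """Fraction of positions where both programs chose the same move."""
    if len(reference_moves) != len(candidate_moves):
        raise ValueError("move lists must cover the same positions")
    if not reference_moves:
        raise ValueError("need at least one position")
    matches = sum(r == c for r, c in zip(reference_moves, candidate_moves))
    return matches / len(reference_moves)


def time_of_max_agreement(reference_moves, candidate_moves_by_time):
    """Return (think_time, rate) at the agreement peak.

    candidate_moves_by_time maps a think time in seconds to the list of
    moves the candidate chose at that time (hypothetical layout).
    Returns None when the peak sits at the shortest or longest tested
    time, i.e. no interior maximum was observed.
    """
    times = sorted(candidate_moves_by_time)
    rates = [agreement_rate(reference_moves, candidate_moves_by_time[t])
             for t in times]
    best = max(range(len(times)), key=lambda i: rates[i])
    if best == 0 or best == len(times) - 1:
        return None
    return times[best], rates[best]


# Made-up data: four positions, candidate tested at 1, 5 and 30 seconds.
reference = ["e4", "Nf3", "Bb5", "O-O"]
candidate = {
    1: ["d4", "Nf3", "Bc4", "d3"],
    5: ["e4", "Nf3", "Bb5", "O-O"],
    30: ["e4", "Nc3", "Bb5", "c3"],
}
peak = time_of_max_agreement(reference, candidate)
```

If the reference moved in Y = 10 seconds and the peak comes out at 5 seconds, the idea would read that as the candidate being the stronger program. Whether the peak means anything on positions with several playable moves is exactly what is in doubt here.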

We do this thing quite directly with tactical test suites already, but the
poster was trying to generalize it onto a big pile of positions.

bruce

>>I doubt this will work, as long as there are lots of positions where there is
>>more than one playable move.  You'd be looking for information in a big blob of
>>random soup.
>>
>>bruce
>
>
>I agree.  And what you find is written backward in Chinese, with Russian
>footnotes with explanations in Egyptian.  :)




Last modified: Thu, 15 Apr 21 08:11:13 -0700
