Computer Chess Club Archives


Subject: Re: an idea to evaluate rating of programs based on pgn file of their games

Author: Uri Blass

Date: 08:09:48 08/09/01


On August 09, 2001 at 09:58:01, Robert Hyatt wrote:

>On August 09, 2001 at 09:13:18, Graham Laight wrote:
>
>>On August 09, 2001 at 08:54:51, Uri Blass wrote:
>>
>>>My idea is the following:
>>>
>>>1) Download a PGN of 6 games of a program at 2 hours/40 moves (for example, some
>>>of the SSDF games of Deep Fritz).
>>>
>>>2) Choose a program that you want to use to evaluate the rating of chess
>>>programs (I am going to call it program X).
>>>Here is how to use it to evaluate the rating of Deep Fritz.
>>>
>>>3) Let X calculate for 1 hour on every position in which Deep Fritz had to move.
>>>4) Build a table with 2 columns, where the first column is the time in seconds and
>>>the second column is the number of solutions (the number of positions in which X
>>>suggests the same move as Deep Fritz).
>>>
>>>It should be something like the following:
>>>time           number of solutions
>>>0-1 second           347 solutions
>>>1-2 seconds          372 solutions
>>>2-3 seconds          374 solutions
>>>...
>>>60-61 seconds        431 solutions
>>>...
>>>500-501 seconds      440 solutions
>>>...
>>>3599-3600 seconds    411 solutions
>>>
>>>If 500-501 seconds gives the biggest number of solutions, then it seems that
>>>500-501 seconds of X is equivalent to the tournament time control of Deep Fritz.
>>>
>>>It is possible to translate 500-501 seconds to a rating number and find a rating
>>>for Deep Fritz (Athlon 1200).
>>>Bigger numbers are better, and it is possible to assume a difference of 70 Elo
>>>for each doubling of the number.
>>>
>>>It is also possible to use X's searches to evaluate the rating of other programs,
>>>including X itself, in the same way.
>>>
>>>I have some interesting questions:
>>>
>>>1) Do you expect a rating list based on this test, and not on results, to
>>>be biased for X or against X?
>>>
>>>2) What is the estimated rating of programs including Deeper Blue, Deep Blue, Cray
>>>Blitz, and Deep Thought based on this experiment?
>>>
>>>3) What is the estimated error that you expect to get in evaluating the rating of
>>>programs this way?
>>
>>At the risk of being negative, I think that, unfortunately, this experiment is
>>likely to fail.
>>
>>Unless you can see all the way to the end of the game, you cannot say whether
>>the move program X chose is better than the one DF chose.
>>
>>It might be just a matter of taste.
>>
>>It might be that both choices of move would win.
>>
>>It might be that Deep Fritz chose a poor move.
>>
>>DF might be better than X in some situations, but worse in others.
>>
>>I fear that, at the end of this experiment, the only result that you will obtain
>>is the name of the program which is most similar in playing style to DF.
>>
>>-g
>
>
>Very likely correct.  This is not an easy thing to do...  and trying to use
>program X to predict the rating of program Y, based only on how many moves they
>"match" looks statistically dangerous.

Under my idea, a similar style does not always mean that a program is rated as stronger.
I will give an example.
Case A:
X and Y agreed on less than 20% of the moves after 1,2,3...3599 seconds of
search
X and Y agreed on 20% of the moves after 3600 seconds of search

X is going to evaluate Y as a very strong program because the maximal number of
matches was achieved after 3600 seconds.

Case B:
X and Z agreed on 100% of the moves after 1 second
X and Z agreed on 80% of the moves after 3600 seconds

X is going to evaluate Z as a weak program because the maximal number of matches
was achieved after 1 second.
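
The rule used in both cases can be written as a short Python sketch. It is only an illustration under stated assumptions: the function names peak_time and elo_offset are hypothetical, each time bin is keyed by its upper bound in seconds, ties go to the earliest time, and the intermediate agreement values in cases A and B are made up to fit the descriptions above. The 70 Elo per doubling figure is the assumption from the original proposal, not a measured constant.

```python
import math


def peak_time(matches_by_time):
    """Return the search time (seconds) at which X matched the most moves.

    matches_by_time maps a search time to a match count or match fraction.
    Ties are resolved in favour of the earliest time (an assumption).
    """
    return max(sorted(matches_by_time), key=lambda t: matches_by_time[t])


def elo_offset(peak_seconds, reference_seconds, elo_per_doubling=70.0):
    """Elo difference implied by two peak times, assuming a fixed gain
    per doubling of search time (70 Elo in the original proposal)."""
    return elo_per_doubling * math.log2(peak_seconds / reference_seconds)


# Table from the original proposal, keyed by the upper bound of each bin.
deep_fritz = {1: 347, 2: 372, 3: 374, 61: 431, 501: 440, 3600: 411}

# Case A: agreement stays below 20% until 3600 seconds.
# Values before 3600 are hypothetical placeholders.
case_a = {1: 0.10, 60: 0.15, 3599: 0.19, 3600: 0.20}

# Case B: 100% agreement after 1 second, 80% after 3600 seconds.
# The value at 60 seconds is a hypothetical placeholder.
case_b = {1: 1.00, 60: 0.90, 3600: 0.80}

print(peak_time(deep_fritz))  # 501
print(peak_time(case_a))      # 3600, so Y is rated as very strong
print(peak_time(case_b))      # 1, so Z is rated as weak
print(elo_offset(peak_time(case_a), peak_time(case_b)))
```

Note that Z agrees with X far more often than Y does, yet Z gets the lower rating, because only the location of the peak matters and not its height.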

Uri



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.