Computer Chess Club Archives

Subject: Re: an idea to evaluate rating of programs based on pgn file of their games

Author: Heiner Marxen

Date: 09:46:36 08/09/01

On August 09, 2001 at 08:54:51, Uri Blass wrote:

>My idea is the following:
>
>1)download a pgn of 6 games of a program at 2 hours/40 moves(for example some
>of the ssdf games of Deep Fritz)
>
>2)choose a program that you want to use to evaluate the rating of chess
>programs(I am going to call it program X)
>Here is how to use it to evaluate the rating of Deep Fritz.
>
>3)let X calculate for 1 hour on every position where Deep Fritz had to move
>4)build a table with 2 columns, where the first column is the time in seconds
>and the second column is the number of solutions(number of positions where X
>suggests the same move as Deep Fritz)
>
>It should be something like the following:
>time           number of solutions
>0-1 second           347 solutions
>1-2 seconds          372 solutions
>2-3 seconds          374 solutions
>...
>60-61 seconds        431 solutions
>...
>500-501 seconds      440 solutions
>...
>3599-3600 seconds    411 solutions
>
>if 500-501 seconds gives the biggest number of solutions, then it seems that
>500-501 seconds of X is equivalent to the tournament time control of Deep Fritz.
>
>It is possible to translate 500-501 seconds into a rating number and so find a
>rating for Deep Fritz(Athlon1200)
>Bigger numbers are better, and it is possible to assume a difference of 70 elo
>for each doubling of the time.
>
>It is also possible to use X's searches to evaluate the rating of other
>programs, including X, in the same way
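
A rough sketch of the tabulation in steps 3 and 4 of the quoted proposal, plus the
conversion of the peak time into a rating difference. Everything named here is
hypothetical: the input layout (for each position, the move actually played and
the list of times at which X changed its preferred move), the function names,
and the reference time of 180 seconds per move, which is simply 2 hours divided
by 40 moves. The 70 Elo per doubling is the figure assumed in the quoted post,
not a measured value.

```python
import math

def match_counts(positions, bucket_ends):
    """Count, for each time limit, the positions where X agrees with the game.

    positions: list of (played_move, history) pairs, where history is a list of
    (seconds, best_move) entries sorted by time, one entry per change of X's
    preferred move (hypothetical layout).
    bucket_ends: time limits in seconds at which X's choice is sampled.
    """
    counts = {t: 0 for t in bucket_ends}
    for played, history in positions:
        for t in bucket_ends:
            current = None
            for when, move in history:
                if when > t:
                    break
                current = move  # last move X preferred at or before time t
            if current == played:
                counts[t] += 1
    return counts

def peak_time(counts):
    """Time limit with the most matches; ties go to the shorter time."""
    return max(counts, key=lambda t: (counts[t], -t))

def elo_offset(peak_seconds, reference_seconds=180.0, elo_per_doubling=70.0):
    """Rating difference relative to X at reference_seconds per move,
    assuming a fixed Elo gain per doubling of thinking time."""
    return elo_per_doubling * math.log2(peak_seconds / reference_seconds)
```

Under these assumptions, a peak at 500 seconds would put the measured program
about 70 * log2(500/180), or roughly 103 Elo, above X playing at 180 seconds
per move. Whether the peak is sharp enough to be meaningful is exactly what
remains to be tested.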

Such a method does involve a lot of computing time with X.
Normally this time is best invested in a direct match between X and Deep Fritz.
For me the interesting aspect of your proposed method is that the program
to be measured (in this example Deep Fritz) need not be executed.  This is
of particular interest if that program is not available for experiments
but some of its games are (as with Deep Blue).

Whether your method works at all, and how accurate it is, is difficult to say
(as others have already pointed out).  But you could perform some experiments
with programs that have a well established rating (e.g. via SSDF) and
compare the results of your method to the known ratings.  Maybe they match
well, maybe they match badly.
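
A minimal sketch of how such a comparison could be scored, assuming you already
have one estimated rating per program from the method and one established
rating for the same program. The function name and the dict layout are
hypothetical. The mean signed difference gives a hint of systematic bias, and
the root-mean-square difference gives a rough size for the error.

```python
import math

def calibration_error(estimated, known):
    """Compare estimated ratings with established ones.

    estimated, known: dicts mapping program name -> rating (hypothetical layout).
    Only programs present in both dicts are compared.
    Returns (bias, rms): mean signed difference and root-mean-square difference.
    """
    common = sorted(set(estimated) & set(known))
    if not common:
        raise ValueError("no programs in common")
    diffs = [estimated[name] - known[name] for name in common]
    bias = sum(diffs) / len(diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rms
```

With only a handful of programs both numbers will be noisy, so they are a first
indication rather than a verdict on the method.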

>I have some interesting questions:
>
>1)Do you expect the rating list based on this test and not based on results to
>be biased for X or against X?
>
>2)What is the estimated rating of programs including Deeper Blue, Deep Blue,
>Cray Blitz, Deep Thought based on this experiment?
>
>3)What is the estimated error that you expect to get in evaluating the rating of
>programs this way?

I don't know the answer to any of these questions.  IMO you have to do the
above-mentioned experiment to find out.

Regards,
Heiner

Last modified: Thu, 15 Apr 21 08:11:13 -0700