Computer Chess Club Archives


Subject: Re: an idea to evaluate rating of programs based on pgn file of their games

Author: Graham Laight

Date: 06:13:18 08/09/01


On August 09, 2001 at 08:54:51, Uri Blass wrote:

>My idea is the following:
>
>1) Download a PGN of 6 games of a program played at 40 moves in 2 hours (for
>example, some of the SSDF games of Deep Fritz).
>
>2) Choose a program that you want to use to evaluate the rating of chess
>programs (I am going to call it program X).
>Here is the explanation of how to use it to evaluate the rating of Deep Fritz.
>
>3) Let X analyze for 1 hour every position in which Deep Fritz had to move.
>4) Build a table with 2 columns, where the first column is the time in seconds
>and the second column is the number of solutions (the number of positions in
>which X suggests the same move as Deep Fritz at that time).
>
>It should be something like the following:
>time           number of solutions
>0-1 second           347 solutions
>1-2 seconds          372 solutions
>2-3 seconds          374 solutions
>...
>60-61 seconds        431 solutions
>...
>500-501 seconds      440 solutions
>...
>3599-3600 seconds    411 solutions
>
>If 500-501 seconds gives the biggest number of solutions, then it seems that
>500-501 seconds of X is equivalent to the tournament time control of Deep Fritz.
>
>It is possible to translate 500-501 seconds into a rating number and so find a
>rating for Deep Fritz (Athlon 1200).
>Bigger numbers are better, and it is possible to assume a difference of 70 Elo
>when the number is twice as big.
>
>It is also possible to use X's searches to evaluate the rating of other
>programs, including X itself, in the same way.
>
>I have some interesting questions:
>
>1) Do you expect a rating list based on this test, and not on results, to
>be biased for X or against X?
>
>2) What is the estimated rating of programs including Deeper Blue, Deep Blue,
>Cray Blitz, and Deep Thought based on this experiment?
>
>3) What is the estimated error that you expect to get when evaluating the
>rating of programs in this way?

At the risk of being negative, I think that, unfortunately, this experiment is
likely to fail.

Unless you can see all the way to the end of the game, you cannot say whether
the move program X chose is better than the one DF chose.

It might be just a matter of taste.

It might be that both choices of move would win.

It might be that Deep Fritz chose a poor move.

DF might be better than X in some situations, but worse in others.

I fear that, at the end of this experiment, the only result that you will obtain
is the name of the program which is most similar in playing style to DF.

-g
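
For readers who want to try the quoted procedure, here is a minimal sketch in Python. It is not code from either poster. It assumes X's analysis of each position has already been logged as a list of (seconds, move) pairs, one pair for each time X changed its preferred move; that log format and all the function names below are hypothetical. The sketch counts, for each one-second bucket, how many positions X agrees with the move actually played, picks the bucket with the most agreement, and converts a ratio of times into a rating difference using the 70 Elo per doubling assumed in the quoted post.

```python
from bisect import bisect_right
import math


def move_at(history, t):
    """Return the move X preferred after t seconds, or None if it had none yet.

    history is a time-sorted list of (seconds, move) pairs, one per change of
    X's preferred move (a hypothetical log format, assumed for this sketch).
    """
    times = [seconds for seconds, _ in history]
    i = bisect_right(times, t)
    return history[i - 1][1] if i else None


def agreement_table(positions, horizon):
    """Count agreements per one-second bucket.

    positions is a list of (history, played_move) pairs. Entry k of the result
    is the number of positions where X's preferred move at the end of the
    bucket k to k+1 seconds equals the move that was actually played.
    """
    return [
        sum(1 for history, played in positions if move_at(history, k + 1) == played)
        for k in range(horizon)
    ]


def best_bucket(counts):
    """Index of the bucket with the most agreements (earliest one on ties)."""
    return max(range(len(counts)), key=counts.__getitem__)


def elo_offset(seconds, reference_seconds, elo_per_doubling=70.0):
    """Rating difference implied by a ratio of times.

    Uses the quoted post's assumption of 70 Elo per doubling of time.
    """
    return elo_per_doubling * math.log2(seconds / reference_seconds)


# Toy data, invented purely for illustration.
positions = [
    ([(0.5, "e4"), (2.5, "d4")], "d4"),
    ([(0.2, "Nf3")], "Nf3"),
    ([(0.1, "c4"), (1.5, "e4")], "c4"),
]
counts = agreement_table(positions, horizon=4)
peak = best_bucket(counts)
```

Note that the table only records how often X picks the same move as the other program. As the reply above argues, a high count may reflect similarity of style rather than strength, so the rating produced by elo_offset should be read with that caveat.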



Last modified: Thu, 15 Apr 21 08:11:13 -0700
