Author: Graham Laight
Date: 06:13:18 08/09/01
On August 09, 2001 at 08:54:51, Uri Blass wrote:

>My idea is the following idea
>
>1)download a pgn of 6 games of a program at 2 hours/40 moves(for example some
>of the ssdf games of Deep Fritz)
>
>2)choose a program that you want to use to evaluate the rating of chess
>programs(I am going to call it program X)
>Here is the explanation how to use it to evaluate the rating of deep fritz.
>
>3)give X to calculate for 1 hour every position when Deep Fritz had to move
>4)build a table with 2 columns when the first column is the time in seconds and
>the second column is the number of solutions(number of positions when X suggests
>the same move as Deep Fritz)
>
>It should be something like the following:
>time                 number of solutions
>0-1 second           347 solutions
>1-2 seconds          372 solutions
>2-3 seconds          374 solutions
>...
>60-61 seconds        431 solutions
>...
>500-501 seconds      440 solutions
>...
>3599-3600 seconds    411 solutions
>
>if 500-501 seconds give the biggest number of solutions then it seems that
>500-501 seconds of X is equivalent to tournament time control of Deep Fritz.
>
>It is possible to translate 500-501 seconds to a rating number and find rating
>for Deep Fritz(Athlon1200)
>Bigger numbers are better and it is possible to assume difference of 70 elo if
>the number is twice bigger.
>
>It is also possible to use X's searches to evaluate rating of other programs
>including X by the same way
>
>I have some interesting questions:
>
>1)Do you expect the rating list based on this test and not based on results to
>be biased for X or against X?
>
>2)What is the estimated rating of programs including Deeper blue, Deep blue, Cray
>blitz, Deep thought based on this experiment?
>
>3)What is the estimated error that you expect to get in evaluating the rating of
>programs by this way.

At the risk of being negative, I think that, unfortunately, this experiment is likely to fail.
Unless you can see all the way to the end of the game, you cannot say whether the move program X chose is better than the one DF chose. It might be just a matter of taste. It might be that both choices of move would win. It might be that Deep Fritz chose a poor move. DF might be better than X in some situations, but worse in others.

I fear that, at the end of this experiment, the only result that you will obtain is the name of the program which is most similar in playing style to DF.

-g
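
For anyone who wants to try the tabulation described in the quoted post, here is a rough Python sketch. It assumes you have already logged, for each position, the move DF actually played and the times at which X changed its preferred move; that log format and all function names are made up for illustration. The 70 Elo per doubling figure is the one assumed in the quoted post, and the reference time is something you would have to choose yourself. Note that the table only counts agreement with DF, so it is subject to the objection above: it measures similarity, not move quality.

```python
import math
from bisect import bisect_right


def move_at(history, t):
    """Return X's preferred move after t seconds of search.

    history is a list of (seconds, move) pairs sorted by time, one entry
    each time X changed its mind. Returns None if X had no move yet.
    """
    times = [s for s, _ in history]
    i = bisect_right(times, t)
    return history[i - 1][1] if i else None


def agreement_table(positions, checkpoints):
    """Count, for each checkpoint time, the positions where X agrees with DF.

    positions is a list of (played_move, history) pairs.
    """
    return {
        t: sum(1 for played, hist in positions if move_at(hist, t) == played)
        for t in checkpoints
    }


def best_checkpoint(table):
    """Checkpoint with the most agreements (earliest one wins a tie)."""
    return max(table, key=table.get)


def elo_offset(t, t_ref, elo_per_doubling=70.0):
    """Rating offset of time t relative to t_ref, assuming a fixed gain per doubling."""
    return elo_per_doubling * math.log2(t / t_ref)


# Toy data: three positions, invented purely to exercise the functions.
sample = [
    ("e4", [(0.5, "d4"), (3.0, "e4")]),
    ("Nf3", [(0.2, "Nf3"), (10.0, "c4")]),
    ("Bb5", [(1.0, "Bb5")]),
]
table = agreement_table(sample, [1, 5, 20])
print(table, best_checkpoint(table))
```

With real data the checkpoints would run from 1 to 3600 seconds, and the peak of the table would be fed into `elo_offset` against whatever reference time you pick.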
Last modified: Thu, 15 Apr 21 08:11:13 -0700