Author: Uri Blass
Date: 08:09:48 08/09/01
On August 09, 2001 at 09:58:01, Robert Hyatt wrote:

>On August 09, 2001 at 09:13:18, Graham Laight wrote:
>
>>On August 09, 2001 at 08:54:51, Uri Blass wrote:
>>
>>>My idea is the following idea
>>>
>>>1)download a pgn of 6 games of a program at 2 hours/40 moves(for example some
>>>of the ssdf games of Deep Fritz)
>>>
>>>2)choose a program that you want to use to evaluate the rating of chess
>>>programs(I am going to call it program X)
>>>Here is the explanation how to use it to evaluate the rating of deep fritz.
>>>
>>>3)give X to calculate for 1 hour every position when Deep Fritz had to move
>>>4)build a table with 2 columns when the first column is the time in seconds and
>>>the second column is the number of solutions(number of positions when X suggests
>>>the same move as Deep Fritz)
>>>
>>>It should be something like the following:
>>>time                number of solutions
>>>0-1 second          347 solutions
>>>1-2 seconds         372 solutions
>>>2-3 seconds         374 solutions
>>>...
>>>60-61 seconds       431 solutions
>>>...
>>>500-501 seconds     440 solutions
>>>...
>>>3599-3600 seconds   411 solutions
>>>
>>>if 500-501 seconds give the biggest number of solutions then it seems that
>>>500-501 seconds of X is equivalent to tournament time control of Deep Fritz.
>>>
>>>It is possible to translate 500-501 seconds to a rating number and find rating
>>>for Deep Fritz(Athlon1200)
>>>Bigger numbers are better and it is possible to assume difference of 70 elo if
>>>the number is twice bigger.
>>>
>>>It is also possible to use X's searches to evaluate rating of other programs
>>>including X by the same way
>>>
>>>I have some interesting questions:
>>>
>>>1)Do you expect the rating list based on this test and not based on results to
>>>be biased for X or against X?
>>>
>>>2)What is the estimated rating of programs including Deeper blue, Deep blue, Cray
>>>blitz, Deep thought based on this experiment?
>>>
>>>3)What is the estimated error that you expect to get in evaluating the rating of
>>>programs by this way.
>>
>>At the risk of being negative, I think that, unfortunately, this experiment is
>>likely to fail.
>>
>>Unless you can see all the way to the end of the game, you cannot say whether
>>the move program X chose is better than the one DF chose.
>>
>>It might be just a matter of taste.
>>
>>It might be that both choices of move would win.
>>
>>It might be that Deep Fritz chose a poor move.
>>
>>DF might be better than X in some situations, but worse in others.
>>
>>I fear that, at the end of this experiment, the only result that you will obtain
>>is the name of the program which is most similar in playing style to DF.
>>
>>-g
>
>
>Very likely correct. This is not an easy thing to do... and trying to use
>program X to predict the rating of program Y, based only on how many moves they
>"match" looks statistically dangerous.

Similar styles do not always mean stronger by my idea.

I will give an example.

Case A:
X and Y agreed on less than 20% of the moves after 1, 2, 3, ..., 3599 seconds of search.
X and Y agreed on 20% of the moves after 3600 seconds of search.

X is going to evaluate Y as a very strong program because the maximal number of
matches was achieved after 3600 seconds.

Case B:
X and Z agreed on 100% of the moves after 1 second.
X and Z agreed on 80% of the moves after 3600 seconds.

X is going to evaluate Z as a weak program because the maximal number of
matches was achieved after 1 second.

Uri
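The scoring rule in the two cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the reference time of 180 seconds per move (2 hours / 40 moves) is my own choice of baseline, and the match counts for cases A and B are made-up numbers that merely respect the percentages given. The 70 Elo per doubling figure is the one assumed in the quoted proposal.

```python
import math

ELO_PER_DOUBLING = 70  # assumption taken from the quoted proposal


def peak_time(matches_by_time):
    """Return the search time (seconds) at which X matched the most moves.

    matches_by_time maps a search time in seconds to the number of
    positions where X's move agreed with the tested program's move.
    On a tie, the first time listed wins.
    """
    return max(matches_by_time, key=matches_by_time.get)


def elo_offset(t_peak, t_ref=180.0):
    """Elo offset of the tested program relative to X searching t_ref seconds.

    t_ref = 180 is a hypothetical baseline (2 hours / 40 moves); each
    doubling of the peak time is worth ELO_PER_DOUBLING points.
    """
    return ELO_PER_DOUBLING * math.log2(t_peak / t_ref)


# Example table from the quoted post, keyed by the upper edge of each bucket.
deep_fritz = {1: 347, 2: 372, 3: 374, 61: 431, 501: 440, 3600: 411}

# Illustrative counts out of 100 positions (not measured data).
case_a = {1: 10, 1800: 15, 3600: 20}   # agreement peaks at 20% after 3600 s
case_b = {1: 100, 3600: 80}            # agreement peaks at 100% after 1 s

for name, table in [("Deep Fritz", deep_fritz), ("Y", case_a), ("Z", case_b)]:
    t = peak_time(table)
    print(f"{name}: peak at {t} s, offset {elo_offset(t):+.0f} Elo")
```

Note that Z agrees with X far more often than Y does at every search time, yet Z gets the lower estimate, because only the position of the peak matters and not the height of it.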
Last modified: Thu, 15 Apr 21 08:11:13 -0700