Author: Miguel A. Ballicora
Date: 10:23:49 08/09/01
On August 09, 2001 at 08:54:51, Uri Blass wrote:

>My idea is the following:
>
>1) Download a PGN of 6 games of a program at 2 hours/40 moves (for example, some
>of the SSDF games of Deep Fritz).
>
>2) Choose a program that you want to use to evaluate the rating of chess
>programs (I am going to call it program X).
>Here is the explanation of how to use it to evaluate the rating of Deep Fritz.
>
>3) Give X 1 hour to calculate every position in which Deep Fritz had to move.
>
>4) Build a table with 2 columns, where the first column is the time in seconds and
>the second column is the number of solutions (the number of positions in which X
>suggests the same move as Deep Fritz).
>
>It should be something like the following:
>
>time                 number of solutions
>0-1 second           347 solutions
>1-2 seconds          372 solutions
>2-3 seconds          374 solutions
>...
>60-61 seconds        431 solutions
>...
>500-501 seconds      440 solutions
>...
>3599-3600 seconds    411 solutions
>
>If 500-501 seconds gives the biggest number of solutions, then it seems that
>500-501 seconds of X is equivalent to the tournament time control of Deep Fritz.
>
>It is possible to translate 500-501 seconds to a rating number and find a rating
>for Deep Fritz (Athlon 1200).
>Bigger numbers are better, and it is possible to assume a difference of 70 Elo if
>the number is twice as big.
>
>It is also possible to use X's searches to evaluate the rating of other programs,
>including X, in the same way.
>
>I have some interesting questions:
>
>1) Do you expect the rating list based on this test, and not based on results, to
>be biased for X or against X?
>
>2) What is the estimated rating of programs including Deeper Blue, Deep Blue, Cray
>Blitz, and Deep Thought based on this experiment?
>
>3) What is the estimated error that you expect to get in evaluating the rating of
>programs in this way?

Based on intuition, I like the spirit of your idea, but I don't like some of the implementation details. I am sure it will be useful for testing weak engines, but I do not know whether it will work as well for stronger ones.
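The tabulation in the quoted proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(time_found, move)` history format is a hypothetical stand-in for whatever X's search log provides, each table entry is read at the end of its one-second interval, and all function names are made up here. The 70 Elo per doubling figure is the one assumed in the quoted post.

```python
import math
from bisect import bisect_right


def move_at(history, t):
    """Return the move X preferred at time t (seconds), or None if it had none yet.

    history is a list of (time_found, move) pairs sorted by time_found.
    This format is an assumption, not the output of any particular engine.
    """
    times = [time_found for time_found, _ in history]
    i = bisect_right(times, t)
    return history[i - 1][1] if i > 0 else None


def solutions_table(positions, max_seconds):
    """For each whole second 1..max_seconds, count positions where X's
    preferred move at that moment equals the move actually played."""
    return [
        sum(1 for played, history in positions if move_at(history, t) == played)
        for t in range(1, max_seconds + 1)
    ]


def equivalent_time(table):
    """Second (1-based) with the most matches; ties go to the earliest second."""
    return table.index(max(table)) + 1


def elo_difference(t_a, t_b, elo_per_doubling=70.0):
    """Rating gap implied by two equivalent times, assuming a fixed Elo gain
    per doubling of X's time (70 is the figure from the quoted post)."""
    return elo_per_doubling * math.log2(t_a / t_b)


# Tiny made-up example: three positions, X searched for 4 seconds each.
positions = [
    ("e4", [(0.1, "d4"), (2.5, "e4")]),
    ("Nf3", [(0.2, "Nf3"), (3.5, "c4")]),
    ("O-O", [(0.5, "O-O")]),
]
table = solutions_table(positions, 4)
print(table, equivalent_time(table))
```

Note that X can move away from the played move as it searches longer (the second position above), which is why the count can fall again after its peak.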
At the least, what I would do is run those games through several strong programs (say, at 1 hour a move) and make a collection of "good alternatives" for each move. Then I would run my program (the one to be rated) and see how long it takes to find at least one of the alternatives.

It might also be useful for tuning the evaluation parameters of a program: adjust them so that it finds one of the good alternatives as fast as possible. The disadvantage is that it takes a lot of calculation time.

Regards,
Miguel
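A minimal sketch of the time-to-solution measurement described in the reply, assuming the "good alternatives" for each position have already been collected from the strong programs. The data layout and the names `time_to_solution` and `summarize` are hypothetical, and the sketch takes the first moment an alternative becomes the preferred move, ignoring whether the program later abandons it.

```python
from statistics import median


def time_to_solution(alternatives, history):
    """Earliest time (seconds) at which the tested program's preferred move
    is one of the good alternatives, or None if that never happens.

    history is a list of (time_found, move) pairs sorted by time_found;
    this format is an assumption for illustration.
    """
    for time_found, move in history:
        if move in alternatives:
            return time_found
    return None


def summarize(positions):
    """Aggregate over (alternatives, history) pairs: how many positions were
    solved, and the median time taken on the solved ones."""
    times = [time_to_solution(alts, hist) for alts, hist in positions]
    solved = [t for t in times if t is not None]
    return {
        "solved": len(solved),
        "total": len(times),
        "median_time": median(solved) if solved else None,
    }


# Made-up example: the third position is never solved.
positions = [
    ({"e4", "d4"}, [(0.1, "c4"), (1.5, "d4")]),
    ({"Nf3"}, [(0.2, "Nf3")]),
    ({"O-O"}, [(0.3, "Kf1")]),
]
result = summarize(positions)
print(result)
```

For parameter tuning, the same summary could serve as the quantity to improve (more positions solved, lower median time), at the cost of rerunning the searches for every parameter change.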
Last modified: Thu, 15 Apr 21 08:11:13 -0700