Computer Chess Club Archives




Subject: Re: PB-ON vs PB-OFF (final results)

Author: Jeremiah Penery

Date: 05:19:50 10/16/99

On October 15, 1999 at 22:05:55, Ratko V Tomic wrote:

>>>If the guess rate is 50% (G=0.5) the Cn needs twice as much time allocated.
>>>But with two Rebels, G is probably 80% or greater,
>> If you have any evidence to support this, please show it.  Otherwise,
>> it is simply a guess.
>When I wrote the above I judged/guessed that the pondering Rebel would predict
>at least 80% of the moves of the non-pondering Rebel. But the results Ed gave
>for the two matches are enough to compute this figure (within statistical error).
>First we compute the rating difference for the two matches using the formula:
>  D = 400*log10(W/L)  where W=wins, L=losses
>For matches 1 and 2 we get D1=78 and D2=49. If we denote the rating gain for
>doubling the speed of the program (or doubling the thinking time) as C, then the
>rating difference corresponding to the thinking times ratio R will be:
>  D = C * (R-1)   (eq-1)
>If you set the time ratio R=2 (i.e. doubling the thinking time), the rating
>difference is C (agreeing with the definition of C). If the 2 program instances
>have equal time, i.e. R=1 the rating difference is 0, as we would expect.
>Now we define Tp and Tn as times per move (on average) allotted (in the time
>control settings) to the pondering Rebel (Cp) and to non-pondering Rebel (Cn)
>and denote as G the guess rate, so that G=1 means Cp has guessed 100% of Cn's
>moves. The total time/move Cp got for useful thinking is then:
>   UTp=Tp+G*Tn   (eq-2)
>since it will get all the time spent by Cn whenever it guesses the Cn's move.
>The Cn has only the time allotted to it by the time control, hence UTn=Tn. The
>time ratio R in (eq-1) is UTp/UTn. Applying this to the two matches and using
>the fact that in the 1st match Tn=Tp and in the 2nd match Tn=2*Tp, we get the
>following 4 equations:
>  1st match:   R1 = 1+G1     78 = C * (R1-1)
>  2nd match:   R2 = 0.5+G2   49 = C * (R2-1)
>for the 5 unknowns, C, G1, G2, R1 and R2, where R1 and R2 are thinking time
>ratios for the match 1 and match 2, and G1 and G2 guess rates for the two
>matches. Note that due to different time control ratios in match 1 and 2 we
>don't assume that the guess rates are the same in the two matches. If we take that
>doubling the Rebel speed gains it 100 rating points (this is Newborn's figure)
>against the slower Rebel, i.e. C=100, we get G1=78/C=0.78 which is quite close
>to my guess of 0.8.
>The 2nd match, as Ed noted, produced an unexpectedly large performance
>difference. Using the above equations for the 2nd match gives G2=0.99, i.e. as
>if pondering Rebel predicted 99% of the moves of the non-pondering (giving it
>thus the large performance gain). If, on the other hand, we assume G2=G1=0.8
>then the expected rating difference would be 30 ELO points (instead of the 49
>ELO points in the match), i.e. the expected match result should have been 54 to
>46 for the pondering Rebel. Given that the expected statistical uncertainty is
>+/-7 points (i.e. sqrt(50)), the actual result (57:43) is well within the
>statistical uncertainty for the given number of games, and therefore fairly
>consistent with the 80 percent guess rate by the pondering Rebel.
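For anyone who wants to check the arithmetic in the quoted post, here is a minimal Python sketch. It assumes the quoted linear model D = C*(R-1) with C=100, reads R as UTp/UTn (the reading that gives R1 = 1+G1 and R2 = 0.5+G2), and uses the standard logistic Elo expectation for the expected score. The function names are my own and purely illustrative.

```python
import math

def elo_diff(wins, losses):
    """Rating difference from a win/loss count: D = 400*log10(W/L)."""
    return 400 * math.log10(wins / losses)

def guess_rate(d, c, tn_over_tp):
    """Solve D = C*(R-1) with R = UTp/UTn = Tp/Tn + G for the guess rate G."""
    r = 1 + d / c
    return r - 1 / tn_over_tp

def expected_diff(g, c, tn_over_tp):
    """Rating difference the model predicts for a given guess rate G."""
    r = 1 / tn_over_tp + g
    return c * (r - 1)

def expected_score(d):
    """Standard Elo expected score for a rating advantage of d points."""
    return 1 / (1 + 10 ** (-d / 400))

C = 100  # assumed gain per doubling, as in the quoted post

print(round(elo_diff(57, 43)))              # match 2 result, 57:43
print(round(guess_rate(78, C, 1), 2))       # G1, match 1 (Tn = Tp)
print(round(guess_rate(49, C, 2), 2))       # G2, match 2 (Tn = 2*Tp)
d2 = expected_diff(0.8, C, 2)               # match 2 if G2 were 0.8
print(round(d2), round(100 * expected_score(d2)))
```

Under those assumptions the sketch reproduces the quoted figures: 49 points for 57:43, G1=0.78, G2=0.99, and about 30 points (roughly 54 to 46) if G2 were 0.8.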

Even after seeing your equations, I'm still not sure how you can determine the
pondering percentage from the winning percentage.  If pondering Rebel had won 100%
of the games, would you say it pondered with 100% accuracy?  (And were you not
the one saying that a shallower search can produce ultimately better moves in
many cases? In these cases, a correct pondering would _worsen_ the result. :)
And if the win percentage was 0, would you say it pondered with 0% accuracy?
Furthermore, we still aren't sure Rebel wouldn't predict the same percentage
(80%?) against other strong engines.  Until someone tests it over a large number
of games, we won't know for sure.
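To make the objection concrete, here is a small Python sketch that feeds some made-up match results into the quoted model (equal time controls, C=100 assumed, R taken as UTp/UTn). The results below are hypothetical, not Ed's data, and the function name is mine.

```python
import math

C = 100.0  # assumed rating gain per doubling, as in the quoted post

def implied_guess_rate(wins, losses, tn_over_tp=1.0, c=C):
    """Guess rate G implied by the quoted model for a given match result."""
    d = 400 * math.log10(wins / losses)   # D = 400*log10(W/L)
    return (1 + d / c) - 1 / tn_over_tp   # from D = C*(R-1), R = Tp/Tn + G

# Hypothetical results, for illustration only.
for wins, losses in [(50, 50), (75, 25), (25, 75)]:
    g = implied_guess_rate(wins, losses)
    print(f"{wins}:{losses} -> implied G = {g:.2f}")
```

A 75:25 result implies a guess rate well above 1, and a 25:75 result implies a negative one, neither of which is possible for a real guess rate. A 100:0 result cannot be evaluated at all, since W/L divides by zero. So the model can only be inverted for results inside a fairly narrow band.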

This brings me to another thing:  Crafty already counts the number of correct
ponderings (after exiting book, I think).  It'd be quite interesting for someone
to run this experiment with Crafty vs. Crafty, then Crafty vs. EngineX <insert
favourite engine here> to see how much the pondering rate really differs.  These
results can be pretty accurately applied to the Rebel data, and so it can be
seen whether Ed's experiment really is a bit skewed because he played Rebel vs.
Rebel.

The problem with EngineX vs. EngineY is that playing strength/style/etc. already
differ.  This will skew the results in its own way, which is why Ed did Rebel
vs. Rebel.  He wanted to eliminate as much uncertainty as possible, which he
did.  The only real uncertainties in his test that I can think of are possibly
the opening book and whether Rebel vs. Rebel really has a greater correct
pondering percentage than Rebel vs. EngineX.
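Measuring the rate directly would be simple once the predicted and played moves are logged. A rough Python sketch, assuming a hypothetical list of (predicted move, move actually played) pairs collected after leaving book; this is not Crafty's actual log format, and the sample moves are invented:

```python
import math

def ponder_hit_rate(records):
    """Fraction of (predicted, played) pairs where the prediction was right."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum(1 for predicted, played in records if predicted == played)
    return hits / len(records)

def standard_error(rate, n):
    """Binomial standard error of a hit rate measured over n moves."""
    return math.sqrt(rate * (1 - rate) / n) if n else 0.0

# Hypothetical sample: three correct predictions out of four.
sample = [("e7e5", "e7e5"), ("g8f6", "b8c6"), ("f8c5", "f8c5"), ("d7d6", "d7d6")]
rate = ponder_hit_rate(sample)
print(rate)  # 0.75
print(standard_error(rate, len(sample)))
```

The standard error shrinks only with the square root of the number of moves, which is why a large number of games is needed before two pondering rates can be compared with any confidence.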

Again, I thank Ed for running this experiment, and hope that he or others can
continue to make similar experiments to determine The Truth. :)



Last modified: Thu, 07 Jul 11 08:48:38 -0700
