Computer Chess Club Archives



Subject: Some General Factors in CC Testing on 1-Processor Machines

Author: Rolf Tueschen

Date: 05:26:49 12/25/05



Albert,
ok fine, let's start a new chapter then. But then please don't add this 'perhaps
you didn't know' stuff. I know that there are many things I don't know, but when
you add such remarks it is about things I know quite well, like the meaning of
NUNN and so on.

I thank you for the good descriptions; however, you left out two main topics I
mentioned.

(2) on the basis of 160 games each - what could we maximally conclude?

(3) an observed result of a 50 point difference - is it significant?
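
To put rough numbers on questions (2) and (3), here is a minimal sketch of a confidence interval for an Elo difference measured over a fixed number of games. It assumes the standard logistic Elo model, independent games, and a normal approximation; the function names and the `draw_ratio` parameter are hypothetical choices for this illustration, and the paired NUNN-position design is not modelled.

```python
import math

def expected_score(elo_diff):
    """Expected score per game under the logistic Elo model (assumption)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score):
    """Inverse of expected_score; score must lie strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(games, score, draw_ratio=0.0, z=1.96):
    """Approximate Elo interval for an observed score fraction.

    draw_ratio is an assumed fraction of drawn games (hypothetical input);
    z=1.96 corresponds to a two-sided 95% level under a normal approximation.
    """
    wins = score - draw_ratio / 2.0
    losses = 1.0 - wins - draw_ratio
    if wins < 0.0 or losses < 0.0:
        raise ValueError("draw_ratio is inconsistent with score")
    # Per-game variance with win = 1, draw = 0.5, loss = 0.
    variance = wins + 0.25 * draw_ratio - score ** 2
    margin = z * math.sqrt(variance / games)
    return elo_from_score(score - margin), elo_from_score(score + margin)

# A 50-point lead observed over 160 games, assuming no draws at all.
score_50 = expected_score(50.0)
low, high = elo_interval(160, score_50)
print(f"score {score_50:.3f}: Elo interval {low:+.0f} to {high:+.0f}")
```

Under these assumptions the interval for 160 games is on the order of a hundred Elo points wide, and with no draws it reaches just below zero. A higher draw ratio narrows it, so the conclusion depends on how many games are drawn.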

But let me come back to the much more important question of the procedure:

(1) because most people only have a single PC, they test two programs on a single
machine, and this necessarily means that they test in PONDER=OFF mode. You state
that "theory" would say that the results wouldn't be influenced, but perhaps we
could agree that the "strength" of a chess program is seriously crippled by such
a practice. How people could invent such strange test designs is beyond me.

Let me draw a surprising conclusion. As long as you don't test more than 160
games, I don't believe in a strength difference of 50 Elo points, and likewise I
don't believe in the validity of such tests with crippled programs as such.

Please don't get me wrong. I see well the practical necessities on the tester's
side, but if things are as they are we must expect a general bias in our results.
Of course it is highly interesting to watch how a crippled J9 will do against the
tested Rybka version in comparison to the test between Rybka and a crippled F9,
within the limits of what is allowed, which is defined by test theory.

Of course you already expected the verdict that until now - on the basis of your
results - a significant difference between the tested engines could NOT be
proven - all on the basis of the given NUNN positions of course. (NB: if you
were to find zero difference for the crippled J9 you would still have the same
uncertainty.)
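
The "not proven" verdict can also be read the other way round: how many games would a match need before a given Elo lead stands clear of zero? The following is only a sketch, assuming the logistic Elo model, independent games, and a two-sided 95% normal approximation; `games_needed` and `draw_ratio` are hypothetical names for this illustration, and statistical power is not considered.

```python
import math

def expected_score(elo_diff):
    """Expected score per game under the logistic Elo model (assumption)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, draw_ratio=0.0, z=1.96):
    """Smallest game count at which z standard errors fit inside the score edge.

    draw_ratio is an assumed fraction of drawn games (hypothetical input).
    """
    score = expected_score(elo_diff)
    edge = score - 0.5
    if edge == 0.0:
        raise ValueError("a zero Elo difference can never be separated from zero")
    wins = score - draw_ratio / 2.0
    losses = 1.0 - wins - draw_ratio
    if wins < 0.0 or losses < 0.0:
        raise ValueError("draw_ratio is inconsistent with the expected score")
    # Per-game variance with win = 1, draw = 0.5, loss = 0.
    variance = wins + 0.25 * draw_ratio - score ** 2
    return math.ceil(z ** 2 * variance / edge ** 2)

for diff in (20, 50, 100):
    print(diff, "Elo:", games_needed(diff), "games (no draws assumed)")
```

Because the required count grows roughly with the inverse square of the score edge, small differences need far more games than large ones, and an observed zero difference carries the same width of uncertainty as any other result from the same number of games.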




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.