Author: Albert Silver
Date: 05:57:29 12/25/05
On December 25, 2005 at 08:26:49, Rolf Tueschen wrote:

>Albert,
>ok fine, let's start a new chapter then. But then please dont add this 'perhaps
>you didnt know' stuff. I know that I dont know many things but when you are
>making these adds I know it quite well just like the meaning of NUNN so and so.

You have to realize that I found your question fairly basic, and as in any explanation, it's usually better to err on the side of explaining a little too much than too little. You can always skip the things you already know, but if you don't know them and I say nothing, you'll still be lost.

>I thank you for the good descriptions, however you let out two main topics I
>mentioned.
>
>(2) on the base of 160 games each - what could we maximally conclude?
>
>(3) a found result of a 50 point difference - significance?

As Vas said, you can't. But statistics aren't everything, even if they are important. Let me give you an example. Let's take Topalov, the current best player in the world, since Kasparov is retired. He has grown of course, and maybe 2 years ago he was not so strong. IMHO, his advantage over the other elite players is not more than 50-100 points. How many games has he played since he grew to be clearly in the top 2, between Kasparov and himself? I can't say for certain, but I think fewer than 160 tournament games. Still, I have no doubt of Topalov's superiority.

>But let me come back to the much more important question of the procedure:
>
>(1) because most people only have a single PC they test two programs on a single
>machine and forcedly this means that they test in PONDER=OFF mode. You state
>that "theory" would say that the results wouldnt be influenced, but perhaps we
>could agree that the "strength" of a chessprogram is seriously crippled by such
>a practice. How people could invent such strange test designs is beyond myself.

If by crippled you mean it is not playing under its best conditions, then sure, you are correct. But why stop there? It also isn't playing with EGTBs, nor with its opening book.

>Let me make a surprising conclusion. As long as you dont test more than 160
>games, I dont believe in a strength difference of Elo 50 points and likewise I
>dont believe in the validity of such tests with crippled programs as such.

If the point is to draw scientific results from statistics, then there is no question. I was merely presenting the data I had. As to the validity of the tests, I think you are wrong. The question is whether the relative performances would be any different if the games were played with Ponder=On. If Engine A improved its results against Engine B with both playing with Ponder=On, compared to its results with both playing with Ponder=Off, then you'd have a point, but the results I've seen published here say they are much the same. Don't ask me to link you to the posts; I don't have such links. It was enough for me to have read them.

Albert
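
For the 160-game question quoted above, the arithmetic can be sketched roughly as follows. This is an illustration only and not something posted in the thread: it assumes the standard logistic Elo curve and a normal approximation to the match score, and the function names and the draw_rate parameter are hypothetical.

```python
import math


def expected_score(elo_diff):
    """Expected score (0..1) for a player rated elo_diff points higher,
    using the standard logistic Elo curve (an assumption of this sketch)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Inverse of expected_score: Elo difference implied by a score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_interval(score, games, draw_rate=0.0, z=1.96):
    """Approximate confidence interval for the true score after `games` games.

    Per-game variance is score*(1-score) - draw_rate/4, which follows from
    counting a win as 1, a draw as 1/2 and a loss as 0. draw_rate is a
    hypothetical input; z=1.96 corresponds to roughly 95% confidence.
    """
    variance = score * (1.0 - score) - draw_rate / 4.0
    margin = z * math.sqrt(variance / games)
    return score - margin, score + margin


if __name__ == "__main__":
    p = expected_score(50)                 # score implied by a 50-point edge
    low, high = score_interval(p, 160)     # 160 games, no draws assumed
    print(f"expected score: {p:.3f}")
    print(f"interval: {low:.3f} .. {high:.3f}")
    print(f"Elo range: {elo_from_score(low):+.0f} .. {elo_from_score(high):+.0f}")
```

Under these assumptions, and with no draws, a 50-point result over 160 games gives an interval whose lower end falls just below a 50% score, which is in line with the reply that 160 games cannot settle the matter statistically. A higher draw rate narrows the interval.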
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.