Author: bob o
Date: 00:43:10 05/26/02
If I understand correctly, two of the main problems some people have with the SSDF are the appearance (fairly or unfairly) of arbitrary decisions, such as when to publish, and the statistical meaning of their results. From what I can tell, there is no list of guidelines on their website about certain things they do. Here is what I propose: the SSDF publishes a set of guidelines stating specifically what they do in terms of experimental design. Specifically:

1. Which programs are used, including a statement of how to handle upgrades, such as Fritz 7.0.0.7 vs. 7.0.0.8, for example.
2. The conditions for each match, including the number of games and which opponents to play. The time control seems to be an example of a well-defined criterion they already have: 40/2, if memory serves me.
3. When the list is published. It seems to come out fairly regularly, every two months or so, but stating this in writing would quiet some critics.
4. How hardware upgrades are handled, including how frequently the basic hardware will be upgraded.

They may already have such a set of rules; I do not know. If so, please show me; I would be interested in reading it. But if no such code exists, here is an example of guidelines I would use.

______________________________________________________________________________

1. Define a standard level of hardware, which we will call "Current Hardware", and another called "Recent Hardware". For example, Current Hardware might be an Athlon 1200 with 256 MB RAM, and Recent Hardware a K6-2 450 MHz with 128 MB RAM. Every 18 months, Current Hardware becomes Recent Hardware, and new hardware becomes Current Hardware.
2. Only test programs on Current and Recent Hardware.
3. Matches should be exactly 40 games between each pair of programs to be tested, on both Current and Recent Hardware.

______________________________________________________________________________

And so on.
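The 18-month rotation rule could even be written down as an explicit schedule so there is no ambiguity about which machines a result was produced on. Here is a minimal sketch of one way to do that; the hardware names and dates are purely illustrative assumptions on my part, not anything the SSDF actually uses.

```python
from datetime import date

ROTATION_MONTHS = 18  # guideline 1: rotate hardware tiers every 18 months

def months_between(start: date, now: date) -> int:
    """Whole calendar months elapsed from start to now."""
    return (now.year - start.year) * 12 + (now.month - start.month)

def hardware_tier(introduced: date, now: date) -> str:
    """Classify a hardware generation under the proposed rotation:
    'Current' for its first 18 months, 'Recent' for the next 18,
    and 'Retired' (no longer tested) after that."""
    m = months_between(introduced, now)
    if m < ROTATION_MONTHS:
        return "Current"
    if m < 2 * ROTATION_MONTHS:
        return "Recent"
    return "Retired"

# Illustrative only: an Athlon 1200 generation introduced January 2002.
print(hardware_tier(date(2002, 1, 1), date(2002, 6, 1)))  # Current
print(hardware_tier(date(2002, 1, 1), date(2004, 1, 1)))  # Recent
print(hardware_tier(date(2002, 1, 1), date(2005, 7, 1)))  # Retired
```

With a published table of introduction dates, anyone could verify which tier applied to any result on the list.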
I know the SSDF is not obligated to accept my ideas, but I feel that if they did, their work would be better appreciated by the CC community. Publishing such guidelines would also reduce the number of people nitpicking certain parts of their work.

Also, they should get a person trained in statistics to express the SSDF's viewpoint on what is statistically significant, what is within the error margins, and so on. If they published a FAQ page answering questions such as "What does the 8-point difference between Fritz and CT really mean?", then there would not be the endless debate here and elsewhere about the statistical meaning of their results; rather, we would hear from them exactly how they interpret their tests.

I appreciate what the SSDF does, and hope that my suggestions would improve the group's effectiveness in handling CC tourneys.

Bob
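To show why the 8-point question matters, here is a rough back-of-the-envelope sketch of the error margin on an Elo difference estimated from a match of n games. This assumes the standard Elo model and treats each game as a coin flip, which slightly overstates the variance when there are many draws; it is my own illustration, not the SSDF's actual methodology.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Standard Elo expected score for the higher-rated side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_error_margin(n_games: int, z: float = 1.96) -> float:
    """Approximate 95% error margin (in Elo points) on a rating
    difference near 0, estimated from n_games games."""
    se_score = math.sqrt(0.25 / n_games)       # std. error of the match score
    slope = math.log(10) / 400 * 0.25          # d(score)/d(Elo) at diff = 0
    return z * se_score / slope

# An 8-Elo edge means an expected score of only about 51.2%.
print(round(expected_score(8), 3))

# With only 40 games the 95% margin is over 100 Elo points,
# so an 8-point gap between two programs is statistical noise;
# even after 1000 games the margin is still around 22 points.
print(round(elo_error_margin(40)))    # about 108
print(round(elo_error_margin(1000)))  # about 22
```

Numbers like these are exactly what a statistician's FAQ entry could spell out once, instead of the same argument being rehashed in every thread.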