Computer Chess Club Archives



Subject: Re: Comments of latest SSDF list Part 1

Author: bob o

Date: 00:43:10 05/26/02



If I understand correctly, two of the main problems some people have with the
SSDF are the appearance (fair or not) of arbitrary decisions, such as when to
publish, and the unclear statistical meaning of their results. As far as I can
tell, their website has no list of guidelines explaining how they handle these
things.

Here is what I propose: the SSDF should publish a set of guidelines describing
their experimental design. In particular, the guidelines should cover:

1. Which programs are tested, including how minor upgrades are handled (for
example, Fritz 7.0.0.7 vs. 7.0.0.8).

2. The match conditions, including the number of games and which opponents each
program plays. The time control is an example of a well-defined criterion they
already have: 40/2, if memory serves.

3. When the list is published. It seems to come out fairly regularly, every two
months or so, but putting the schedule in writing would quiet some critics.

4. How hardware upgrades are handled, including how often the base hardware
will be upgraded.

They may already have such a set of rules; I do not know. If so, please show
me, because I would be interested in reading it. If no such code exists, here is
an example of the guidelines I would use:

______________________________________________________________________________

1. Define a standard hardware level, which we will call "Current Hardware", and
a second one called "Recent Hardware". For example, Current Hardware might be an
Athlon 1200 MHz with 256 MB RAM, and Recent Hardware a K6-2 450 MHz with 128 MB
RAM. Every 18 months, Current Hardware becomes Recent Hardware, and newly chosen
hardware becomes Current Hardware.

2. Test programs only on Current and Recent Hardware.

3. Each match between two tested programs should consist of exactly 40 games,
played on both Current and Recent Hardware.

______________________________________________________________________________
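To give a feel for what a 40-game match can and cannot show, here is a rough
sketch, assuming the standard logistic Elo model and a normal approximation on
the per-game score. The function names are hypothetical, and this is not
necessarily how the SSDF computes its own error bars.

```python
import math

def _clamp(p):
    # keep the score strictly inside (0, 1) so the Elo formula stays finite
    return min(max(p, 1e-6), 1.0 - 1e-6)

def elo_from_score(p):
    """Logistic Elo model: rating difference implied by an expected score p."""
    p = _clamp(p)
    return -400.0 * math.log10(1.0 / p - 1.0)

def match_interval(wins, draws, losses, z=1.96):
    """Approximate 95% interval (low, estimate, high) for the Elo difference
    implied by a single match, using a normal approximation on the score."""
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # per-game variance of the score (1, 0.5 or 0), including draws
    var = max((wins * 1.0 + draws * 0.25) / n - p * p, 0.0)
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(p - z * se),
            elo_from_score(p),
            elo_from_score(p + z * se))

lo, mid, hi = match_interval(14, 12, 14)  # a drawn 40-game match
print(f"{mid:+.0f} Elo, 95% interval {lo:+.0f} to {hi:+.0f}")
```

With 14 wins, 12 draws and 14 losses, the interval comes out at roughly ±90
Elo, so a single 40-game match cannot by itself separate programs that are
within a few dozen points of each other.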

And so on.

I know the SSDF is not obligated to accept my ideas, but I feel that if they
did, their work would be better appreciated by the CC community. Publishing
such guidelines would also reduce the number of people nitpicking certain parts
of their work.

I also think they should have someone trained in statistics state the SSDF's
position on what is statistically significant, what falls within the error
margins, and so on. If they published a FAQ page answering questions such as
"What does the 8-point difference between Fritz and CT really mean?", there
would be no more endless debate here and elsewhere about the statistical meaning
of their results; instead, we would hear directly from them how they interpret
their tests.
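As a back-of-the-envelope answer to the "8-point" question, the sketch below
estimates how many games a single head-to-head match would need before its
95% margin shrinks to ±8 Elo. It assumes the logistic Elo model, ratings close
together (expected score near 0.5) and independent games; games_needed and
sigma_game are hypothetical names, not anything the SSDF publishes.

```python
import math

def games_needed(margin_elo, sigma_game=0.5, z=1.96):
    """Games needed so the z-level interval on an Elo difference near 0
    is about +/- margin_elo.

    sigma_game: per-game standard deviation of the score; 0.5 is the
    worst case (no draws), and draws make it smaller.
    """
    # slope of the logistic Elo curve at p = 0.5, in Elo per unit of score
    slope = 400.0 / (math.log(10) * 0.25)
    return math.ceil((z * sigma_game * slope / margin_elo) ** 2)

print(games_needed(8))                  # worst case, no draws
print(games_needed(8, sigma_game=0.4))  # assuming a typical share of draws
```

Under these assumptions, games_needed(8) comes out a little over 7,000 games,
or around 4,600 if draws bring the per-game standard deviation down to 0.4. A
rating difference on the list depends on all the games each program played, so
this is only an order-of-magnitude guide.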

I appreciate what the SSDF does, and hope that my suggestions will improve the
group's effectiveness in running CC tourneys.

Bob




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.