Computer Chess Club Archives



Subject: Re: This test is not scientific!

Author: Dann Corbit

Date: 12:46:20 01/26/99


On January 26, 1999 at 15:11:15, Bruce Moreland wrote:
[snip]
>Sometimes you have a hypothesis, you test it, and your hypothesis is correct.
>Other times it is incorrect.
>
>You can be doing science in either case, and in either case the result can be
>interesting.
>
>I have a few hypotheses, but I haven't defined them before the experiment, so I
>am not being as rigorous as I could be.
You have an interesting point.  I would like to propose a pair of hypotheses
that can be easily tested.
1.  Given enough time (possibly infinite), all excellent chess programs will
decide upon the same best move.
2.  The probability of agreement on a move will therefore be largely a function
of time.  If I let two programs think for an hour per move, the programs will
agree most of the time.  If I let them think for one week per move, they will
almost always agree (unless the fundamental algorithms are different -- but for
most programs they are about the same).  If we let the programs think for only
1/100 of a second, the agreement will be poor -- even if it is the same program.
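
To make hypothesis 2 concrete, here is a rough sketch in Python of how the
agreement rate could be measured at each time control.  The move lists are
invented for illustration; a real test would collect each engine's actual
choices.

def agreement_rate(moves_a, moves_b):
    """Fraction of positions on which two programs chose the same move."""
    matches = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return matches / len(moves_a)

# Hypothetical move choices for the same five positions at three time controls.
trials = {
    "1/100 sec": (["e4", "d4", "Nf3", "c4", "g3"],
                  ["d4", "e4", "Nf3", "Nc3", "b3"]),
    "1 hour":    (["e4", "d4", "Nf3", "c4", "g3"],
                  ["e4", "d4", "Nf3", "c4", "b3"]),
    "1 week":    (["e4", "d4", "Nf3", "c4", "g3"],
                  ["e4", "d4", "Nf3", "c4", "g3"]),
}

for tc, (a, b) in trials.items():
    print(f"{tc}: {agreement_rate(a, b):.0%} agreement")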

In other words, I believe that agreement will be a function of time.  Given
enough time, I even think that we could classify programs.  For example:
programs x, y, and z agree consistently on positions of this nature over
thousands of one-hour trials; therefore, king safety has a weight between a and
b for these programs.  Or we might conclude that a class of programs considers
a bishop slightly more valuable than a knight, etc.
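
A hedged sketch of what that classification might look like: group programs
whose pairwise agreement over many long trials exceeds some cutoff.  The
agreement matrix and the threshold below are invented placeholders for real
trial data.

programs = ["x", "y", "z", "w"]
agreement = {                       # hypothetical pairwise agreement rates
    ("x", "y"): 0.92, ("x", "z"): 0.90, ("y", "z"): 0.94,
    ("w", "x"): 0.55, ("w", "y"): 0.58, ("w", "z"): 0.60,
}

THRESHOLD = 0.85                    # assumed cutoff for "consistent agreement"

# Greedy grouping: put each program into the first class whose members it
# agrees with above the threshold; otherwise start a new class.
classes = []
for p in programs:
    for cls in classes:
        if all(agreement[tuple(sorted((p, q)))] >= THRESHOLD for q in cls):
            cls.append(p)
            break
    else:
        classes.append([p])

print(classes)                      # -> [['x', 'y', 'z'], ['w']]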

Are we trying to prove that B.I. is based on Crafty?  We already know that.  So
the only question is, "How much is original?"

I do not believe that this sort of testing will prove decisive.  But I do agree
that it is an interesting experiment.  I think this is a better idea: have
someone run the Nunn test positions at one-hour time controls using five
different programs, including Crafty, on known hardware.  Label the programs
a, b, c, d, and e.  Repeat the experiment 30 times to produce a table.  Using
that table, decide not only which program is Crafty, but also try to identify
the other programs.

That is 300 hours to create the data set.  I use 30 trials because many of the
usual statistical approximations (the normal approximation, for instance)
become reasonable around that observation count.
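
Here is a rough sketch of the identification step, assuming the trials give
us, for each labelled program, the move it chose in each (trial, position)
cell, plus a reference run of Crafty itself on the same hardware.  Every table
below is an invented placeholder; a real run would fill them from the 30
trials.

import random
random.seed(1)

TRIALS, POSITIONS = 30, 10          # 30 repetitions of the test positions
MOVES = ["e4", "d4", "c4", "Nf3"]

def fake_table(bias):
    """Invent a TRIALS x POSITIONS table of chosen moves (placeholder data)."""
    return [[("Nf3" if random.random() < bias else random.choice(MOVES))
             for _ in range(POSITIONS)] for _ in range(TRIALS)]

crafty_reference = fake_table(0.90)             # a known run of Crafty itself
blind = {lbl: fake_table(b)                     # the five anonymized programs
         for lbl, b in zip("abcde", (0.30, 0.88, 0.45, 0.50, 0.35))}

def agreement_with_reference(table):
    hits = sum(table[t][p] == crafty_reference[t][p]
               for t in range(TRIALS) for p in range(POSITIONS))
    return hits / (TRIALS * POSITIONS)

# The label with the highest agreement is the best candidate for Crafty.
for lbl in sorted(blind, key=lambda l: -agreement_with_reference(blind[l])):
    print(f"program {lbl}: {agreement_with_reference(blind[lbl]):.0%}")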

You could do the same for 300 minutes or even 300 seconds if you like.  I
believe that the programs will agree more and more closely as the time controls
increase.

Consider EPD test sets.  As you let your computer run longer, it tends to get
more and more answers correct.  Given enough time, a good program should get
most of them correct.  Since we already know that a correct answer exists and
that, given enough time, it will be found, the hypothesis seems to have some
merit.
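
For example, here is a minimal EPD scorer, assuming each record carries a
"bm" (best move) opcode and that we have the engine's chosen move for each
position.  The two records and the engine answers are illustrative only.

def best_move(epd_line):
    """Extract the first move given by the 'bm' opcode of one EPD record."""
    ops = epd_line.split(None, 4)[4]            # text after the 4 FEN fields
    for op in ops.split(";"):
        op = op.strip()
        if op.startswith("bm "):
            return op[3:].split()[0]
    return None

epd_records = [
    "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - bm Bb5;",
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4;",
]
engine_moves = ["Bb5", "d4"]                    # invented engine output

correct = sum(best_move(epd) == mv for epd, mv in zip(epd_records, engine_moves))
print(f"{correct}/{len(epd_records)} correct")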


