Computer Chess Club Archives

Subject: Re: This test is not scientific!

Author: Albrecht Heeffer

Date: 23:39:57 01/26/99

On January 26, 1999 at 14:58:19, Dan Homan wrote:

>
>Hi Don,
>
>I agree completely with your remarks; however, I don't think Bruce
>intended to compare these results directly to Bob's 98% match statistic.
>Bruce intended for Crafty to be re-tested with this same match criteria.
>(I assumed that Bruce meant a match at 60s when he suggested 60s test
>times).
>
>I think that there are many wrong ways to go about this - In fact, I am
>not sure if there is a right way, but Bruce's suggestion has the benefit
>of being pretty well defined (i.e. what would most programs on reasonable
>hardware play after 1 minute of searching).  It also has the benefit of
>being short, so we can run a set of tests overnight.
>

I think it is best to compare chess programs at shorter thinking times.
In the end, if you go up to 20 ply, most programs will converge in their
opinion about the position. If a chess program selects the right move
in an early phase, this is clearly better. That's what chess test sets
like BT2630 are all about. So why not compare at a fixed depth of,
let's say, 8 ply? The log files of the Dutch Open we published on the
web site give better insight into why and when a particular move was
chosen. Comparing the evaluations given by the chess programs is of course
also very important. A different, and hopefully more correct, evaluation
of the position by Bionic was the added value we wanted to achieve.
That's what every chess programmer wants to achieve.
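
To make the fixed-depth idea concrete, here is a rough sketch in Python of how a match percentage at, say, 8 ply could be computed from per-depth logs. The log format (one dictionary per position, mapping search depth to the best move at that depth) and the names move_at_depth and match_rate are assumptions made for illustration only; they are not the format of the Dutch Open log files, and the sample moves are invented.

```python
from typing import Dict, List, Optional

# Hypothetical log format (an assumption, not any engine's real output):
# one dict per position, mapping search depth in ply to the best move
# reported at that depth.
DepthLog = Dict[int, str]


def move_at_depth(log: DepthLog, depth: int) -> Optional[str]:
    """Best move from the deepest completed iteration not exceeding `depth`."""
    reached = [d for d in log if d <= depth]
    return log[max(reached)] if reached else None


def match_rate(logs_a: List[DepthLog], logs_b: List[DepthLog], depth: int) -> float:
    """Fraction of positions where both programs choose the same move at `depth`.

    Positions where either program has no move at or below `depth` are skipped.
    """
    if len(logs_a) != len(logs_b):
        raise ValueError("both programs must be logged on the same positions")
    compared = 0
    matched = 0
    for log_a, log_b in zip(logs_a, logs_b):
        move_a = move_at_depth(log_a, depth)
        move_b = move_at_depth(log_b, depth)
        if move_a is None or move_b is None:
            continue
        compared += 1
        if move_a == move_b:
            matched += 1
    return matched / compared if compared else 0.0


# Invented sample data for four positions.
engine_a = [{6: "e2e4", 8: "d2d4"}, {6: "g1f3", 8: "g1f3"}, {8: "c2c4"}, {8: "e1g1"}]
engine_b = [{6: "e2e4", 8: "e2e4"}, {7: "g1f3"}, {8: "c2c4"}, {8: "a2a3"}]

print(f"match at 8 ply: {match_rate(engine_a, engine_b, 8):.0%}")
```

Comparing at a fixed depth rather than a fixed time takes hardware speed out of the picture, although two programs do not necessarily count plies in the same way, so the numbers should be read with that in mind.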

>Do we learn anything specific about Bionic under these circumstances?  I
>don't think so... after all, Bionic played under very different conditions.
>But what we do learn is something about the agreement in move choice
>between programs.  Already the match %'s range from 30% to 80% with the
>very specific requirement that the move match after one minute of search.
>Looser match requirements will certainly lead to higher percentages.  How
>much higher is an interesting question....  I'll run the first game under
>your parameters (20 minutes/move on my 400 MHz Celeron, counting any match
>for any length of time after ply=2).
>
>I don't think this is about Bionic anymore (after all the best test is
>to have a look at the executable they posted) but about the general
>principle of drawing conclusions about program similarity from this
>kind of test.
>
> - Dan
>
>P.S.


Last modified: Thu, 15 Apr 21 08:11:13 -0700
