Computer Chess Club Archives


Subject: Re: This test is not scientific!

Author: Bruce Moreland

Date: 12:35:49 01/26/99



On January 26, 1999 at 12:55:19, Don Dailey wrote:

>Hi Dan,
>
>We appreciate the results you posted.   But none of us are being
>very consistent about the way we test so we cannot draw any solid
>conclusions from your test.   For instance you call it a match
>only if the move matches at the 60 second point.

This is nothing if not consistent.  You have to determine a match somehow, and
this method leaves nothing to interpretation.  You miss some near-matches, but
we are always going to have that problem unless we control this to what I feel
is an impractical degree.

I am trying to balance accuracy with the reasonable possibility of wide-scale
participation.

>Bruce wants to do a faster version of the test too, but this
>is more or less meaningless.  You cannot run a short version
>of the test and then say, "see I only got 40% match rate but
>Bob's Crafty at 10 minutes on a dual matches 98%"   Yours
>is not only too short, but you are using a very strict matching
>rule.  I would guess that even Bionic itself (or whatever ran
>at the tournament) would get a poor match rate under these
>conditions.

You can say it, sure.  Nothing wrong with saying it.  It's the conclusions you
draw that will get you into trouble.  The data itself will be neutral unless we
lose all restraint regarding interpreting it.

Other experiments may suggest themselves.  The experiment where Crafty went to
14 plies on a lot of positions has relevance, for instance.  We may wish to run
other programs at 10 minutes on the dual if we see something like you suggest we
may see.

>So let's do this test correctly.  If time is an issue (it is
>certainly a time consuming test as Bruce said) then we should
>start with 1 game and go from there.
>
>I propose we use Bruce's EPD data from the first game,  run the
>test to a very deep level (at least equivalent to a 1000 MHz
>running at 10 minutes per position) and consider a match at ANY
>POINT AFTER the first 2 ply.   If we don't do this, we cannot
>make any claims about the results and all error would be on the
>side of hanging Bionic, not fair in my opinion.   If the methodology
>we use has errors, it should be in favor of Bionic, not the other
>way around.

I disagree that match after 2 ply is in any way better or more scientific,
and I think that running for this long per position will take too long to
collect data.  Also, I for one would have to write new code in order to collect
results in this manner.  The "hold until end of test" metric is a common method
of scoring test suites, so I figure that this is a practical way of doing it.
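
To make the difference between the two scoring rules concrete, here is a minimal sketch in Python. It is an illustration only, not the code anyone in this thread actually ran. The trace format (a list of (depth, best move) pairs per position), the function names, and the sample moves are all assumptions made up for the example, and "after the first 2 ply" is read here as "at any depth greater than 2".

```python
from typing import Callable, List, Tuple

# Hypothetical trace format: the engine's best move at each completed
# depth, in the order reported, e.g. [(1, "e2e4"), (2, "d2d4"), ...].
Trace = List[Tuple[int, str]]


def match_at_end(trace: Trace, played: str) -> bool:
    """'Hold until end of test': only the final best move counts."""
    return bool(trace) and trace[-1][1] == played


def match_any_after(trace: Trace, played: str, min_ply: int = 2) -> bool:
    """Lenient rule: the move counts if it was best at any depth > min_ply."""
    return any(move == played for depth, move in trace if depth > min_ply)


def match_rate(traces: List[Trace], played_moves: List[str],
               rule: Callable[[Trace, str], bool]) -> float:
    """Fraction of positions where the rule reports a match."""
    hits = sum(rule(t, m) for t, m in zip(traces, played_moves))
    return hits / len(played_moves)


# Made-up sample data for three positions.
traces = [
    [(1, "e2e4"), (2, "d2d4"), (3, "d2d4"), (4, "g1f3")],  # switches away late
    [(1, "e2e4"), (2, "e2e4"), (3, "e2e4")],               # stable throughout
    [(1, "a2a3"), (2, "b2b3"), (3, "c2c4")],               # matches only at depth 1
]
played_moves = ["d2d4", "e2e4", "a2a3"]

strict_rate = match_rate(traces, played_moves, match_at_end)
lenient_rate = match_rate(traces, played_moves, match_any_after)
```

On this sample the strict rule matches one position in three and the lenient rule matches two in three, which is the kind of near-match the end-of-test rule misses. The lenient rule also needs the whole search trace saved, where the strict rule needs only the final move.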

If this test produces results that show a high Crafty-Bionic similarity ratio,
that is a matter for further discussion; there is no intent on my part to move
directly to the lynching phase if this turns out to be the case.

We disagree about how to do this experiment, but I've already exhibited
constructive intent -- I've posted a test suite, have promised to compile
results, and I've produced results myself and posted them.  If you want to do
your own experiment, I wish you the best of luck, and I will help out as I have
machine time, but please don't attempt to wreck mine while it is in progress,
just because you think you have a better way of doing it, especially since my
way won't waste much time.

bruce



Last modified: Thu, 15 Apr 21 08:11:13 -0700
