Computer Chess Club Archives


Subject: Re: This test is not scientific!

Author: Don Dailey

Date: 11:58:41 01/27/99


On January 26, 1999 at 15:35:49, Bruce Moreland wrote:

>
>On January 26, 1999 at 12:55:19, Don Dailey wrote:
>
>>Hi Dan,
>>
>>We appreciate the results you posted.   But none of us are being
>>very consistent about the way we test, so we cannot draw any solid
>>conclusions from your test.   For instance you call it a match
>>only if the move matches at the 60 second point.
>
>This is nothing if not consistent.  You have to determine a match somehow, and
>this method leaves nothing to interpretation.  You miss some near-matches, but
>we are always going to have that problem unless we control this to what I feel
>is an impractical degree.
>
>I am trying to balance accuracy with the reasonable possibility of wide-scale
>participation.
>
>>Bruce wants to do a faster version of the test too, but this
>>is more or less meaningless.  You cannot run a short version
>>of the test and then say, "see I only got 40% match rate but
>>Bob's Crafty at 10 minutes on a dual matches 98%"   Yours
>>is not only too short, but you are using a very strict matching
>>rule.  I would guess that even Bionic itself (or whatever ran
>>at the tournament) would get a poor match rate under these
>>conditions.
>
>You can say it, sure.  Nothing wrong with saying it.  It's the conclusions you
>draw that will get you into trouble.  The data itself will be neutral unless we
>lose all restraint regarding interpreting it.
>
>Other experiments may suggest themselves.  The experiment where Crafty went to
>14 plies on a lot of positions has relevance, for instance.  We may wish to run
>other programs at 10 minutes on the dual if we see something like you suggest we
>may see.
>
>>So let's do this test correctly.  If time is an issue (it is
>>certainly a time consuming test as Bruce said) then we should
>>start with 1 game and go from there.
>>
>>I propose we use Bruce's EPD data from the first game,  run the
>>test to a very deep level (at least equivalent to a 1000 MHz machine
>>running at 10 minutes per position) and consider a match at ANY
>>POINT AFTER the first 2 ply.   If we don't do this, we cannot
>>make any claims about the results and all error would be on the
>>side of hanging Bionic, not fair in my opinion.   If the methodology
>>we use has errors, it should be in favor of Bionic, not the other
>>way around.
>
>I disagree that match after 2 ply is in any way better or more scientific,
>and I think that running for this long per position will take too long to
>collect data.  Also, I for one would have to write new code in order to collect
>results in this manner.  The "hold until end of test" metric is a common method
>of scoring test suites, so I figure that this is a practical way of doing it.
>
>If this test produces results that show a high Crafty-Bionic similarity ratio,
>that is a matter for further discussion; there is no intent on my part to move
>directly to the lynching phase if this turns out to be the case.
>
>We disagree about how to do this experiment, but I've already exhibited
>constructive intent -- I've posted a test suite, have promised to compile
>results, and I've produced results myself and posted them.  If you want to do
>your own experiment, I wish you the best of luck, and I will help out as I have
>machine time, but please don't attempt to wreck mine while it is in progress,
>just because you think you have a better way of doing it, especially since my
>way won't waste much time.
>
>bruce

Ok,  you make good points here.  I'll cooperate with your test
methodology.

- Don
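
For anyone comparing the two scoring rules discussed in this thread, here is a
rough sketch of how they differ. It is only an illustration, not code from
either test: the function names, the (depth, move) history format, and the
sample data are all made up, and it uses search depth in place of the 60 second
cutoff, which is an assumption.

```python
# Hypothetical sketch: each position is (history, played), where history is a
# list of (depth_in_plies, best_move_at_that_depth) pairs from one search.

def holds_at_end(history, played):
    """Strict rule: count a match only if the final best move is the played move."""
    return bool(history) and history[-1][1] == played

def matches_any_depth(history, played, skip_plies=2):
    """Lenient rule: count a match if the played move was best at any depth
    after the first skip_plies plies."""
    return any(move == played for depth, move in history if depth > skip_plies)

def match_rate(positions, rule):
    """Fraction of positions counted as a match under the given rule."""
    hits = sum(1 for history, played in positions if rule(history, played))
    return hits / len(positions)

# Made-up sample data, for illustration only.
SAMPLE = [
    ([(1, "e2e4"), (2, "d2d4"), (3, "g1f3"), (4, "g1f3")], "g1f3"),  # both rules
    ([(1, "e2e4"), (2, "e2e4"), (3, "c2c4"), (4, "d2d4")], "c2c4"),  # lenient only
    ([(1, "b1c3"), (2, "b1c3"), (3, "e2e4"), (4, "e2e4")], "b1c3"),  # neither
    ([(1, "a2a3"), (2, "a2a3"), (3, "h2h3"), (4, "d2d4")], "d2d4"),  # both rules
]

strict_rate = match_rate(SAMPLE, holds_at_end)
lenient_rate = match_rate(SAMPLE, matches_any_depth)
```

On this sample the strict rule scores 2 of 4 positions and the lenient rule 3
of 4, which shows why match rates collected under different rules should not
be compared directly.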


