Computer Chess Club Archives


Subject: Re: A fix for the clone detection problem

Author: Bob Durrett

Date: 07:09:34 12/01/03



On December 01, 2003 at 01:03:10, Steven Edwards wrote:

>The recent fiasco regarding a suspected clone has shown that the process used,
>an anonymous accusation followed by a coercive source demand, is an unacceptably
>poor method for handling potential source plagiarism.
>
>The clear need here is for a method that does not depend on subjective human
>evaluation of similarity of play or upon the random accusation of a non-biased
>party.  My proposal is instead to use a test suite to provide a performance
>fingerprint of all the entrants in a competition.
>
>This fingerprint is constructed by running the same EPD test suite for each
>program immediately prior to the start of an event and then automatically
>checking the resulting EPD output files with some predetermined similarity
>metric.  The same suite can be fed to non-competing programs if necessary.  The
>similarity test would look at both the PV and the evaluation scores of each
>record generated and this should be enough for clone detection.
>
>The test suite has to be the same for each program, but it does not have to be
>the same suite for each event; neither does it have to be disclosed beforehand.
>It would be best to automatically generate the suite by taking a hundred or so
>high level decisive game scores and selecting a near terminal position from each
>one.  The selected position would be for the winning side a few moves prior to
>the end of the game.
>
>Advantages:
>
>1. Does not depend on random accusations.
>
>2. Source code is kept private.
>
>3. Equal application for all entrants.
>
>4. No subjectivity, except for deciding the cutoff point for too much
>similarity.
>
>5. Mostly automated process.
>
>6. Done prior to the event, so no surprises during the event.
>
>7. Should discourage cloners from entering an undisclosed clone in the first
>place.
>
>Disadvantages:
>
>1. Requires an hour or so of computing for each program per event.
>
>2. Someone has to write the similarity metric generator.

(a)  I feel that the ideas you express here are good in principle and hope that
any weaknesses can be ironed out by appropriate changes.  Especially important
would be selection of the right kinds of test positions as Russell has pointed
out.

(b)  Perhaps, however, it would be good to make sure that we are all trying to
solve the right problem!

At WCCC an accusation was made and the tournament organizers "knee-jerk
reacted."  The real issue should be whether or not the accusation was worthy of
any action at all.  The accuser used the word "plagiarism" and that,
presumably, is what caused the knee-jerk reaction.  "Plagiarism" is an
emotionally loaded word that prompts strong reactions.

What, exactly, was the accused supposed to have done?  Even if the accusation
was correct, there is still the issue as to whether or not a crime [i.e. an
infraction of the tournament rules] had been committed.  Before you devise a
test to determine whether or not something was done, it is prudent to first
determine whether or not the alleged action was improper. If it was not
improper, no test need be devised.

As best I can tell from the bulletins posted here, someone convinced the
tournament organizers that "much" or "some" of the code in the program was
similar to that found in the open-source code of Crafty.  How they convinced the
tournament organizers is truly a mystery, since the accused source code was not
available for inspection.  Presumably, some sort of "decompiler" was used
[assuming that such a thing exists and was available] or, failing that, some
measure of performance-data similarity was used.

The "ICC Cheater Cops" have developed and/or been using various techniques for
detecting "cheating" and maybe some of those equally mysterious secret
techniques were used.  : )

It is absolutely essential that everyone agree on the purpose of the proposed
testing.  How else could anybody decide whether or not punitive action were
indicated?  We must quantify "good" and "bad" in the context of computer
tournaments.  Only once these are well defined and quantified will it be
possible to devise pass and fail criteria for the test software.
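
One way to make that concrete: once each pair of fingerprints has been scored,
the pass/fail decision reduces to a single cutoff, as in the hedged sketch
below.  The similarity() function is the illustrative one sketched above, and
the 0.9 cutoff is a placeholder; choosing its value is precisely the
quantification step being argued for here.

# Sketch: apply a pass/fail cutoff to every pair of entrants.
# Uses the illustrative similarity() above; 0.9 is a placeholder value.
from itertools import combinations

def flag_suspect_pairs(all_results, cutoff=0.9):
    """all_results: engine name -> parsed suite results."""
    flagged = []
    for name_a, name_b in combinations(sorted(all_results), 2):
        score = similarity(all_results[name_a], all_results[name_b])
        if score >= cutoff:
            flagged.append((name_a, name_b, score))
    return flagged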

Bob D.


