Computer Chess Club Archives


Subject: Re: A fix for the clone detection problem

Author: Russell Reagan

Date: 22:44:17 11/30/03



On December 01, 2003 at 01:03:10, Steven Edwards wrote:

>The recent fiasco regarding a suspected clone has shown that the process used,
>an anonymous accusation followed by a coercive source demand, is an unacceptably
>poor method for handling potential source plagiarism.
>
>The clear need here is for a method that does not depend on subjective human
>evaluation of similarity of play or upon the random accusation of a non-biased
>party.  My proposal is instead to use a test suite to provide a performance
>fingerprint of all the entrants in a competition.
>
>This fingerprint is constructed by running the same EPD test suite for each
>program immediately prior to the start of an event and then automatically
>checking the resulting EPD output files with some predetermined similarity
>metric.  The same suite can be fed to non-competing programs if necessary.  The
>similarity test would look at both the PV and the evaluation scores of each
>record generated and this should be enough for clone detection.
>
>The test suite has to be the same for each program, but it does not have to be
>the same suite for each event; neither does it have to be disclosed beforehand.
>It would be best to automatically generate the suite by taking a hundred or so
>high level decisive game scores and selecting a near terminal position from each
>one.  The selected position would be for the winning side a few moves prior to
>the end of the game.
>
>Advantages:
>
>1. Does not depend on random accusations.
>
>2. Source code is kept private.
>
>3. Equal application for all entrants.
>
>4. No subjectivity, except for deciding the cutoff point for too much
>similarity.
>
>5. Mostly automated process.
>
>6. Done prior to the event, so no surprises during the event.
>
>7. Should discourage cloners from entering an undisclosed clone in the first
>place.
>
>Disadvantages:
>
>1. Requires an hour or so of computing for each program per event.
>
>2. Someone has to write the similarity metric generator.


How refreshing to read something that makes sense :-)

One comment I have is that using a test suite might not be the best idea. We are
trying to determine whether any two engines are the same, or very similar. Test
suites usually have positions where there is one clearly best move. I think most
of the computer programs would agree on a great majority of the moves in test
suites, and we wouldn't learn very much about their similarities.
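
Whichever positions end up being used, the comparison itself could stay fairly
simple. Something along these lines is roughly what I imagine Steven means,
assuming each program's output has already been boiled down to a best move and
a centipawn score per position (the data layout, tolerance, and cutoff below
are only illustrative):

# Rough sketch of a similarity metric over two engines' results.
# Each result list holds one (best_move, score_in_centipawns) pair per
# test position, in the same order for both engines.

def similarity(results_a, results_b, score_tolerance=25):
    """Fraction of positions where the two engines agree: same best
    move, and evaluations within score_tolerance centipawns."""
    assert len(results_a) == len(results_b)
    agree = 0
    for (move_a, score_a), (move_b, score_b) in zip(results_a, results_b):
        if move_a == move_b and abs(score_a - score_b) <= score_tolerance:
            agree += 1
    return agree / len(results_a)

# Example with made-up data: two engines over five positions.
engine_x = [("e2e4", 30), ("g1f3", 12), ("d4d5", 85), ("f1c4", -10), ("e1g1", 5)]
engine_y = [("e2e4", 28), ("g1f3", 15), ("c2c4", 40), ("f1c4", -12), ("e1g1", 6)]

if similarity(engine_x, engine_y) > 0.9:   # cutoff chosen by the organizers
    print("suspiciously similar")
else:
    print("looks independent enough")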

My suggestion would be to take games played by humans, perhaps ranging from
modern games to very old ones, and have the engines process each position of
several such games. This would give a good variety of positions, and positions
better suited to our purpose here. In other words, we don't want positions
where one or two moves are clearly best. We want positions with many playable
moves, so the engines will hopefully differ more in their preferences.
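
As a rough sketch of how the games could be driven through the engines (this
happens to use the python-chess library for the bookkeeping; the engine paths,
PGN file name, and search depth are only placeholders):

# Sketch: walk through the positions of a game and record each engine's
# preferred move.  Uses the python-chess library; paths and the depth
# limit are placeholders.
import chess.engine
import chess.pgn

def preferred_moves(engine_path, pgn_path, depth=12):
    """Return the engine's chosen move (as UCI text) for every position
    of the first game in the PGN file."""
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    with open(pgn_path) as f:
        game = chess.pgn.read_game(f)
    board = game.board()
    moves = []
    for played in game.mainline_moves():
        info = engine.analyse(board, chess.engine.Limit(depth=depth))
        pv = info.get("pv")
        moves.append(pv[0].uci() if pv else None)
        board.push(played)   # advance to the next position of the game
    engine.quit()
    return moves

moves_x = preferred_moves("./engine_x", "old_game.pgn")
moves_y = preferred_moves("./engine_y", "old_game.pgn")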

Another potentially useful metric is consecutive "hits". If two engines agree
on the best move for 5 positions in a row, the consecutive hit number is 5.
You could establish what a normal consecutive hit number is for the chosen
games by testing a variety of engines before the participants' engines are
tested. If a pair of engines produces runs significantly longer than that
norm, you have a potential clone. You could do three tests: consecutive white
positions only, consecutive black positions only, and all positions
consecutively.
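
A minimal sketch of the consecutive-hit count, with made-up move lists just to
show the idea (it could equally be run on the moves_x and moves_y lists from
the previous sketch):

# Sketch: longest run of consecutive positions on which two engines
# pick the same move.  A run much longer than the baseline established
# with known-independent engines would flag a potential clone.

def longest_agreement_run(moves_a, moves_b):
    best = current = 0
    for a, b in zip(moves_a, moves_b):
        if a == b:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best

# Example: agreement on positions 2-5 gives a consecutive hit number of 4.
x = ["e2e4", "g1f3", "f1c4", "e1g1", "d2d3", "c1g5"]
y = ["d2d4", "g1f3", "f1c4", "e1g1", "d2d3", "b1c3"]
print(longest_agreement_run(x, y))   # -> 4

The white-only and black-only variants are just the same count applied to
every other position in the lists.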

Another idea might be to have several levels of tests. If an engine fails the
"test position" test, then maybe we run the source code through a more
advanced 'diff'. I know some universities have software that will analyze two
source files and tell you how similar they are regardless of the coding style
used, i.e. it compares what the programs do rather than how they are written.
I think there are some tools that will do this even for a binary. The main
advantage would be that no one has to "share" their source code (no human
ever gets their hands on it).

Some measures could be taken to ensure the computer used for the testing
doesn't keep a copy of the source. For instance, the testing machine might be
very bare bones, with only the components necessary for this test (in
particular, no networking). Each programmer could inspect the machine if he
liked, then load his code into a specified directory on it under supervision.
The test is run, the results are displayed, and the computer's hard drive is
wiped clean. All of this could be done with the participants present, and no
one ever has to see anyone else's code. This should be a secondary test, so
hopefully very few participants would ever get to this point.

And of course, if a participant fails the first two tests, it might be time
for them to show their code or take a hike ;-)


