Author: Hristo
Date: 23:46:49 11/30/03
On December 01, 2003 at 01:44:17, Russell Reagan wrote:

>On December 01, 2003 at 01:03:10, Steven Edwards wrote:
>
>>The recent fiasco regarding a suspected clone has shown that the process used,
>>an anonymous accusation followed by a coercive source demand, is an unacceptably
>>poor method for handling potential source plagiarism.
>>
>>The clear need here is for a method that does not depend on subjective human
>>evaluation of similarity of play or upon the random accusation of a non-biased
>>party. My proposal is instead to use a test suite to provide a performance
>>fingerprint of all the entrants in a competition.
>>
>>This fingerprint is constructed by running the same EPD test suite for each
>>program immediately prior to the start of an event and then automatically
>>checking the resulting EPD output files with some predetermined similarity
>>metric. The same suite can be fed to non-competing programs if necessary. The
>>similarity test would look at both the PV and the evaluation scores of each
>>record generated and this should be enough for clone detection.
>>
>>The test suite has to be the same for each program, but it does not have to be
>>the same suite for each event; neither does it have to be disclosed beforehand.
>>It would be best to automatically generate the suite by taking a hundred or so
>>high level decisive game scores and selecting a near terminal position from each
>>one. The selected position would be for the winning side a few moves prior to
>>the end of the game.
>>
>>Advantages:
>>
>>1. Does not depend on random accusations.
>>
>>2. Source code is kept private.
>>
>>3. Equal application for all entrants.
>>
>>4. No subjectivity, except for deciding the cutoff point for too much
>>similarity.
>>
>>5. Mostly automated process.
>>
>>6. Done prior to the event, so no surprises during the event.
>>
>>7. Should discourage cloners from entering an undisclosed clone in the first
>>place.
>>
>>Disadvantages:
>>
>>1. Requires an hour or so of computing for each program per event.
>>
>>2. Someone has to write the similarity metric generator.
>
>
>How refreshing to read something that makes sense :-)
>
>One comment I have is that using a test suite might not be the best idea. We are
>trying to determine whether any two engines are the same, or very similar. Test
>suites usually have positions where there is one clearly best move. I think most
>of the computer programs would agree on a great majority of the moves in test
>suites, and we wouldn't learn very much about their similarities.
>
>My suggestion would be to just take a game played by humans, maybe varying from
>modern games to very old games, and have the engines process each position of
>several games. This would give a good variety of different kinds of positions,
>as well as preferable positions for our purposes here. In other words, we don't
>want positions where there are one or two moves that are clearly the best. We
>want positions where there are many playable moves, so the engines will
>hopefully differ more in their preference.
>
>Another potentially useful metric is consecutive "hits". If two engines agree on
>the best move for a position for 5 moves in a row, then you would have a
>consecutive hit number of 5. You could run tests to see what is a common
>consecutive hit number for the games to be used by testing a variety of engines
>before the event participant engines are tested. If an engine varies
>significantly from the norm, you have a potential clone.
>You could do three tests: consecutive white positions only, consecutive black
>positions only, and all positions consecutively.
>
>Another idea might be to have several levels of tests. If an engine fails the
>"test position" test, then maybe we run the source code through a more advanced
>'diff'. I know some universities have this kind of software that will analyze
>two source code files and tell you how similar they are, regardless of the
>coding style used. IE they test the functionality of the program. I think there
>are some that will do this even for a binary. The main advantage to this would
>be that no one has to "share" their source code (no humans get their hands on it
>anyway). Some measure could be taken to ensure the computer used for the testing
>doesn't save a copy of the source. For instance, the testing computer might be a
>very bare bones machine, only with the necessary components for this test
>(mainly no networking abilities). Each programmer could have a look at the
>machine if they'd like. Each programmer could load his code into a particular
>specified directory on the testing computer, under supervision. Then the test is
>run. The results are displayed. Then the computer's hard drive is wiped clean.
>All of this could be done with the participants present and no one ever has to
>see anyone's code. This should be a secondary test, so hopefully very few
>participants would ever get to this point.
>
>And of course, if a participant fails the first two tests, it might be time to
>show your code or take a hike ;-)

Russell, these are all interesting ideas, but I'm afraid that none of them would work.

The question is: is this program an original work? That is a tall order even if you have the source code, and it is quite a bit taller without it.

The first set of "solutions" can be summarized as: determine the implementation and methods used, based on known input and expected output. This "black box" evaluation is a good way of answering "What does it do?", but it is far less useful or reliable for answering "How does it do it?". The second problem with this approach is that the measuring rod, say Crafty, evolves over time, and given the same initial conditions it might yield results similar to some future version of, say, Fritz or Shredder. What should be done in that case? Exempt some programs, or force them to submit full source code? The third problem is that at some point it may turn out that a few minor changes in Crafty would produce results similar to some other program's; now what? As Crafty gets more powerful, more programs will end up in this type of situation. What do you do? I believe that this particular approach is a dead-end street. If it worked, however, it would be the best and least obtrusive method of determining similarities across a number of programs.

Source-code-level automatic detection tools are a complete nightmare:

1) How many false hits before such a method is thrown overboard? Given the problem domain, there are many programs that share a very similar structure and flow.

2) How do you prove that someone's solution is not original? After all, the methods and logical constructs are public knowledge, and the implementation differences do not have to be all that great.

3) Logical constructs in the program flow are extremely difficult to catch without thoroughly understanding the problem domain. Automated tools are useless here, and only the good old eye-finger coordination has a chance.
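For what it's worth, here is a minimal sketch of the kind of black-box comparison Steven and Russell describe, just to make the idea concrete. Everything in it is an assumption for illustration (the input format, the function name, the made-up numbers); the hard parts remain choosing the suite and calibrating the cutoff.

# Minimal sketch (not a real tool) of the black-box comparison discussed above.
# Assumed input: for each engine, a list of (best_move, score_in_centipawns)
# pairs, one per position of the shared EPD suite, parsed from its output.

from itertools import groupby

def similarity_report(results_a, results_b):
    """Return (move agreement, mean score gap in cp, longest consecutive-hit run)."""
    assert len(results_a) == len(results_b), "both engines must see the same suite"
    agree = [ma == mb for (ma, _), (mb, _) in zip(results_a, results_b)]
    move_agreement = sum(agree) / len(agree)
    score_gap = sum(abs(sa - sb)
                    for (_, sa), (_, sb) in zip(results_a, results_b)) / len(agree)
    # Russell's "consecutive hits": the longest run of positions where both
    # engines picked the same best move.
    longest_run = max((sum(1 for _ in run) for same, run in groupby(agree) if same),
                      default=0)
    return move_agreement, score_gap, longest_run

# Made-up data for three positions, just to show the shape of the output:
a = [("e2e4", 35), ("g1f3", 12), ("d2d4", -20)]
b = [("e2e4", 34), ("g1f3", 15), ("c2c4", -5)]
moves, gap, run = similarity_report(a, b)
print(f"move agreement {moves:.0%}, mean score gap {gap:.1f} cp, longest run {run}")

The interesting question is then what agreement rate and run length unrelated engines typically show on the same positions; that baseline, gathered from non-competing programs as Steven suggests, is what any cutoff would have to be calibrated against.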
I don't see a good solution to this problem, short of requesting the source code and build instructions from all participants. But even that falls short of answering the original question: is this an original work? How many people can gobble up the source code of 12-20 programs in a two-week period and declare, with confidence, that program AZ is an illegal derivative of one of the 100 programs out there? A full-time job for an uber-geek.

Regards,
Hristo