Author: Bruce Moreland
Date: 12:35:49 01/26/99
On January 26, 1999 at 12:55:19, Don Dailey wrote:

>Hi Dan,
>
>We appreciate the results you posted. But none of us are being
>very consistent about the way we test so we cannot draw any solid
>conclusions from your test. For instance you call it a match
>only if the move matches at the 60 second point.

This is nothing if not consistent. You have to determine a match somehow, and this method leaves nothing to interpretation. You miss some near-matches, but we are always going to have that problem unless we control this to what I feel is an impractical degree.

I am trying to balance accuracy with the reasonable possibility of wide-scale participation.

>Bruce wants to do a faster version of the test too, but this
>is more or less meaningless. You cannot run a short version
>of the test and then say, "see I only got 40% match rate but
>Bob's Crafty at 10 minutes on a dual matches 98%" Yours
>is not only too short, but you are using a very strict matching
>rule. I would guess that even Bionic itself (or whatever ran
>at the tournament) would get a poor match rate under these
>conditions.

You can say it, sure. Nothing wrong with saying it. It's the conclusions you draw that will get you into trouble. The data itself will be neutral unless we lose all restraint in interpreting it.

Other experiments may suggest themselves. The experiment where Crafty went to 14 plies on a lot of positions has relevance, for instance. We may wish to run other programs at 10 minutes on the dual if we see something like what you suggest we may see.

>So let's do this test correctly. If time is an issue (it is
>certainly a time consuming test as Bruce said) then we should
>start with 1 game and go from there.
>
>I propose we use Bruce's EPD data from the first game, run the
>test to a very deep level (at least equivalent to a 1000 MHz
>running at 10 minutes per position) and consider a match at ANY
>POINT AFTER the first 2 ply. If we don't do this, we cannot
>make any claims about the results and all error would be on the
>side of hanging Bionic, not fair in my opinion. If the methodology
>we use has errors, it should be in favor of Bionic, not the other
>way around.

I disagree that counting a match at any point after 2 ply is in any way better or more scientific, and I think that running for this long per position will take too long to collect data. Also, I for one would have to write new code in order to collect results in this manner.

The "hold until end of test" metric is a common method of scoring test suites, so I figure that this is a practical way of doing it.

If this test produces results that show a high Crafty-Bionic similarity ratio, that is a matter for further discussion. There is no intent on my part to move directly to the lynching phase if this turns out to be the case.

We disagree about how to do this experiment, but I've already exhibited constructive intent -- I've posted a test suite, have promised to compile results, and I've produced results myself and posted them.

If you want to do your own experiment, I wish you the best of luck, and I will help out as I have machine time, but please don't attempt to wreck mine while it is in progress just because you think you have a better way of doing it, especially since my way won't waste much time.

bruce
Last modified: Thu, 15 Apr 21 08:11:13 -0700