Author: Don Dailey
Date: 11:58:41 01/27/99
On January 26, 1999 at 15:35:49, Bruce Moreland wrote:

>On January 26, 1999 at 12:55:19, Don Dailey wrote:
>
>>Hi Dan,
>>
>>We appreciate the results you posted. But none of us are being
>>very consistent about the way we test, so we cannot draw any solid
>>conclusions from your test. For instance you call it a match
>>only if the move matches at the 60 second point.
>
>This is nothing if not consistent. You have to determine a match somehow, and
>this method leaves nothing to interpretation. You miss some near-matches, but
>we are always going to have that problem unless we control this to what I feel
>is an impractical degree.
>
>I am trying to balance accuracy with the reasonable possibility of wide-scale
>participation.
>
>>Bruce wants to do a faster version of the test too, but this
>>is more or less meaningless. You cannot run a short version
>>of the test and then say, "see I only got 40% match rate but
>>Bob's Crafty at 10 minutes on a dual matches 98%". Yours
>>is not only too short, but you are using a very strict matching
>>rule. I would guess that even Bionic itself (or whatever ran
>>at the tournament) would get a poor match rate under these
>>conditions.
>
>You can say it, sure. Nothing wrong with saying it. It's the conclusions you
>draw that will get you into trouble. The data itself will be neutral unless we
>lose all restraint regarding interpreting it.
>
>Other experiments may suggest themselves. The experiment where Crafty went to
>14 plies on a lot of positions has relevance, for instance. We may wish to run
>other programs at 10 minutes on the dual if we see something like you suggest we
>may see.
>
>>So let's do this test correctly. If time is an issue (it is
>>certainly a time consuming test as Bruce said) then we should
>>start with 1 game and go from there.
>>
>>I propose we use Bruce's EPD data from the first game, run the
>>test to a very deep level (at least equivalent to a 1000 MHz
>>running at 10 minutes per position) and consider a match at ANY
>>POINT AFTER the first 2 ply. If we don't do this, we cannot
>>make any claims about the results and all error would be on the
>>side of hanging Bionic, not fair in my opinion. If the methodology
>>we use has errors, it should be in favor of Bionic, not the other
>>way around.
>
>I disagree that match after 2 ply is in any way better or more scientific,
>and I think that running for this long per position will take too long to
>collect data. Also, I for one would have to write new code in order to collect
>results in this manner. The "hold until end of test" metric is a common method
>of scoring test suites, so I figure that this is a practical way of doing it.
>
>If this test produces results that show a high Crafty-Bionic similarity ratio,
>that is a matter for further discussion, there is no intent on my part to move
>directly to the lynching phase if this turns out to be the case.
>
>We disagree about how to do this experiment, but I've already exhibited
>constructive intent -- I've posted a test suite, have promised to compile
>results, and I've produced results myself and posted them. If you want to do
>your own experiment, I wish you the best of luck, and I will help out as I have
>machine time, but please don't attempt to wreck mine while it is in progress,
>just because you think you have a better way of doing it, especially since my
>way won't waste much time.
>
>bruce

Ok, you make good points here. I'll cooperate with your test methodology.

- Don
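
For anyone who wants to compare the two scoring rules discussed in this thread, here is a rough Python sketch. It is not code from any of the programs mentioned. The trace format (a list of depth/best-move pairs per position) and all function names are assumptions made up for illustration, and the sample moves are invented, not taken from the EPD suite.

```python
from typing import Callable, List, Tuple

# Hypothetical search trace for one position: (depth in plies, best move
# reported at that depth), in the order the engine reported them.
Trace = List[Tuple[int, str]]


def matches_at_end(trace: Trace, played: str) -> bool:
    """Strict rule: only the move held when the time limit expires counts."""
    return bool(trace) and trace[-1][1] == played


def matches_after_ply(trace: Trace, played: str, min_ply: int = 2) -> bool:
    """Lenient rule: any best move reported after the first min_ply plies counts."""
    return any(move == played for depth, move in trace if depth > min_ply)


def match_rate(traces: List[Trace], played_moves: List[str],
               rule: Callable[[Trace, str], bool]) -> float:
    """Fraction of positions where the rule reports a match."""
    hits = sum(rule(t, m) for t, m in zip(traces, played_moves))
    return hits / len(played_moves)


# Invented sample data: two positions and the moves actually played.
traces = [
    [(1, "e4"), (2, "d4"), (3, "Nf3"), (4, "d4")],
    [(1, "Nc3"), (2, "Nf3"), (3, "e4"), (4, "c4")],
]
played = ["d4", "e4"]

strict = match_rate(traces, played, matches_at_end)
lenient = match_rate(traces, played, matches_after_ply)
```

On this toy data the lenient rule scores higher than the strict one, because the second position shows the played move at depth 3 and then switches away from it. That is the kind of near-match the strict rule misses.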
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.