Author: Don Dailey
Date: 11:31:51 01/27/99
On January 26, 1999 at 18:46:52, KarinsDad wrote:

>On January 26, 1999 at 17:52:25, Bruce Moreland wrote:
>
>>On January 26, 1999 at 16:25:17, KarinsDad wrote:
>>
>>>I'm glad that you are running other programs against the control. At what
>>>times are you running the programs, on what type and speed processors, and
>>>what are your matching criteria?
>>
>>One minute per move, you choose the processor, and a match is scored if
>>you'd play the move at the end of the minute.
>
>I would prefer slower times. I think that the main indicator is nodes per
>second times number of seconds, or average total nodes per move. I realize
>that this is difficult to estimate for Bionic; however, you guys have been
>doing this for a long time and I think you could come up with an "educated
>guess".
>
>I understand your practicality issue; however, I'd rather take one of the
>games that Robert checked and run as close an approximation to the number of
>nodes per move as I could (and yes, all of this is questionable due to the
>search changes of running a program with SMP vs. no SMP, different hash
>sizes, etc.), rather than run all of the games for very short durations.
>
>Statistically, neither sample set is large enough nor accurate enough (one
>game run at more exacting times, or multiple games run at quick times) to be
>considered scientific. Any results you get, no matter how you do it, have to
>be taken with a grain of salt.

I don't believe this is correct. What makes something scientific is how you
interpret the results and what you do with them. My intent is to let the
results guide me. I never draw firm conclusions from anything short of an
infinite amount of data.

In the case where at least one other program gets a high match rate, we have
a result that is adequate for our needs. In that case we should drop the
discussion and consider the matter closed.
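The "match rate" everyone is comparing is just the fraction of reference-game positions where a tested program reproduces the played move under the agreed time control. A minimal sketch of that bookkeeping (the move lists here are hypothetical, not data from the actual games under discussion):

```python
def match_rate(reference_moves, program_moves):
    """Fraction of positions where the tested program chose the same
    move as the reference game."""
    assert len(reference_moves) == len(program_moves)
    matches = sum(r == p for r, p in zip(reference_moves, program_moves))
    return matches / len(reference_moves)

# Hypothetical data: moves from one side of a reference game, and what a
# candidate program played when given the same positions.
ref = ["e4", "Nf3", "Bb5", "O-O", "Re1"]
prog = ["e4", "Nf3", "Bc4", "O-O", "Re1"]
print(match_rate(ref, prog))  # 0.8
```

Under the "most liberal" matching rule Don favors, a move would also count as a match if the program would have played it at any point near the end of its thinking time, which can only raise the rate computed above.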
In an example like the one Bruce gives, where the best-matching program is at
60% and Crafty matches 95%, we have something significant. In this case I
don't consider Bionic guilty; I just don't consider the matter closed. If you
apply black-and-white thinking to statistical data, there is never enough
data to be "scientific", but I don't intend to draw such a firm conclusion
from this data. You would only be right if we intended to prove something
from it. Do you understand now?

Nevertheless, I am insisting on a more consistent testing methodology. We
can't all run at different depths on different hardware with different
matching rules and expect to learn much. I am in favor of the most liberal
matching rules and deeper runs (as you are), because they better match the
actual playing conditions, and the more liberal matching rules give Bionic
the benefit of the doubt, which I think is the fairer approach. If the
results are unclear even with liberal matching rules, we drop it.

As far as matching unequal hardware goes, I have no problem with considering
all Pentium-class machines equal by clock speed (even though they really are
not) and overcompensating for the slower ones. What I would like to see is
someone running a reference Crafty test (with the version that is alleged to
be the Bionic version), with all other programs and tests guaranteed to be at
least equal in hardware/time. For instance, if the reference is done at
Bruce's one-minute level on a Pentium II 400 MHz, we might require everyone
else to run at 2 minutes adjusted by clock speed: 4 minutes for a Pentium
200, and so on. We might double the time again for 486s.

This test is by nature inexact, but if you construct it correctly, it can be
used effectively to draw some conclusions, not the least of which might be to
stop the discussion.

- Don

>My way would (rough guess) take 7 (?) games * 10 minutes per move per side
>(due to a slower speed system) * 120 moves per game (for both sides
>combined; this is an average, I did not look it up) * the number of programs
>tested (say 6), or about 5 weeks if one person did it all. However, if you
>gave a different game each to 7 individuals (all of whom had all 6 programs
>to test with), it would take about 5 days. This would give you a better (but
>still not perfect) set of data than 1 minute per move, IMO.
>
>However, I am not doing the tests, so I'm not trying to tell you how to do
>it. Just my opinion.
>
>>I am flexible about the processor because I didn't want to split hairs over
>>whether a P5/133 is X% slower than a P6/200 or whatever. I figured that a
>>few people might run this on Crafty using different hardware, and that
>>might show us what effect this had on match rate.
>>
>>This is a little too multivariate to make a good controlled experiment, but
>>people will have reservations, possibly the same people, no matter what
>>attempts are made to control the experiment better. I don't think it is
>>possible to control it perfectly, so if you try to do so, people will point
>>out the flaws anyway.
>
>I agree. No matter what you do, people (like myself above :) ) will point
>out the "flaws" (I prefer to think of them as alternatives).
>
>Good luck with your tests!
>
>KarinsDad
>
>>bruce
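The two time budgets in this exchange are simple to check: Don's clock-speed adjustment (a reference time scaled by relative clock rate, with a factor of two so slower machines overcompensate) and KarinsDad's back-of-envelope duration estimate. A sketch using the thread's own numbers (the function name and safety factor are my framing, not anything specified in the post):

```python
def adjusted_time(ref_minutes, ref_mhz, test_mhz, safety=2.0):
    """Scale the reference search time by relative clock speed, with a
    safety factor so slower machines get at least as many nodes."""
    return ref_minutes * safety * (ref_mhz / test_mhz)

# Reference: 1 minute per move on a Pentium II 400 MHz.
print(adjusted_time(1, 400, 400))  # 2.0 -> a P-II 400 runs 2 minutes
print(adjusted_time(1, 400, 200))  # 4.0 -> a Pentium 200 runs 4 minutes

# KarinsDad's estimate: 7 games * 120 moves each (both sides combined)
# * 10 minutes per move * 6 programs to test.
total_minutes = 7 * 120 * 10 * 6
print(total_minutes / 60 / 24)     # 35.0 days, i.e. about 5 weeks
print(120 * 10 * 6 / 60 / 24)      # 5.0 days for one game per person
```

Both of KarinsDad's rough figures check out, assuming the machines run around the clock.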