Author: Ratko V Tomic
Date: 23:25:24 09/27/99
>> If they picked 10 top programs (incl. e.g. Rebel 10, Hiarcs 7, not just 7.32
>> from CB) and distributed time on the fastest machines equally, you still
>> get the same overall info on the strength improvement on that hardware,
>> just covering the wider spectrum of programs, but playing the same total
>> number of games on the fast hardware. Nothing useful is gained by giving
>> all the fast hardware to the 4 CB programs, in effect deciding before the
>> cycle even started who will get the top 4 spots.
>
> My point is they won't publish entries unless 100 games have been played. So
> maybe they could have played 60 games with 10 programs on 450s, and we wouldn't
> know squat about how much improvement to expect until the next list, because no
> 450 results would have been published this time around.

The rules are either not right or they're being applied with a lack of common sense. Suppose they did publish results with each of the top 8-10 programs playing an equal number of games, as now, except that all programs had equal average hardware. In that case each would have, taking your example, 60 games instead of 100 on the K2-450, with the rest on slower machines.

The obvious drawback is that it makes the uncertainty of the K2-450 improvement for the 4 CB programs slightly larger (the uncertainty grows as 1/sqrt(N), so it rises as N, the number of games, drops). But in return it makes the certainty for the other manufacturers' products significantly greater (compared to the much greater guesswork of extrapolating from the lower speeds). More importantly, regarding the fairness of the tests, it doesn't skew the list by willfully handing the top 4 spots to one company before the competition has even started. And finally, since the total number of games played on the 450s remains the same while a larger sample of programs is used, it improves the estimate of the average improvement (across all programs) on the fast hardware.

I can't see how anyone (but CB) could weigh the single "con" (in effect, the loss of the preferential treatment of the CB programs) more heavily than all the "pros" of the equal-average-hardware tests. One might argue that some tests, warts and all, are still better than no tests. But one can also say that the illusion of objectivity and the scientific aura such tests create in the public mind about the relative strength of the programs may drive some competitors out of business, either by making them appear worse in a scientific-sounding evaluation, or by denying them exposure if they refuse to play against a stacked deck (as some have done). Having the facts wrong may be worse than having no facts. And having fewer competing manufacturers is certainly worse.
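To put a rough figure on that 1/sqrt(N) point, here is a minimal sketch in Python, assuming independent game results and an illustrative per-game score spread of 0.4 (the exact figure is a guess and doesn't matter for the ratio), comparing the standard error of a program's average score after 100 games with that after 60 games.

    import math

    SIGMA_PER_GAME = 0.4  # assumed spread of a single game's score (illustrative only)

    def score_std_error(num_games):
        # Standard error of the average score over num_games independent games.
        return SIGMA_PER_GAME / math.sqrt(num_games)

    se_100 = score_std_error(100)   # 100 games on the fast machine, as now
    se_60  = score_std_error(60)    # 60 games each if the fast hardware were shared

    print("std. error with 100 games: %.3f" % se_100)
    print("std. error with  60 games: %.3f" % se_60)
    print("relative increase: %.2fx" % (se_60 / se_100))   # sqrt(100/60) ~ 1.29

So the 4 CB programs' fast-hardware figures would be only about 1.3 times less precise, while every other program's figure would move from pure extrapolation to a directly measured result.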