Computer Chess Club Archives


Subject: Re: SSDF is NOT Corrupted

Author: Ratko V Tomic

Date: 23:25:24 09/27/99



>> If
>>they picked 10 top programs (incl. e.g. Rebel 10, Hiarcs 7, not just 7.32
>>from CB) and distributed time on the fastest machines equally, you still
>>get the same overall info on the strength improvement on that hardware,
>>just covering the wider spectrum of programs, but playing the same total
>>number of games on the fast hardware. Nothing useful is gained by giving
>>all the fast hardware to the 4 CB programs, in effect deciding before the
>>cycle even started who will get the top 4 spots.
>
>My point is they won't publish entries unless 100 games have been played.  So
>maybe they could have played 60 games with 10 programs on 450s, and we wouldn't
>know squat about how much improvement to expect until the next list, because no
>450 results would have been published this time around.
>

The rules are either wrong or they're being applied without common
sense. Suppose they did publish results with each of the top 8-10
programs playing the same number of games as now, except that all
programs got equal average hardware. In that case each would have,
taking your example, 60 games instead of 100 on the K2-450, with the
rest on slower machines. The obvious drawback is that it makes the
uncertainty of the K2-450 improvement for the 4 CB programs slightly
larger (the uncertainty scales as 1/sqrt(N), where N is the number of
games, so it grows as N drops). But in return it makes the certainty
for other manufacturers' products significantly greater, compared with
the much greater guesswork of extrapolating from the lower speeds.
More importantly, regarding the fairness of the tests, it doesn't skew
the list by willfully handing the top 4 spots to one company before the
competition has even started. And finally, since the total number of
games played on 450s stays about the same while the sample of programs
grows, it improves the estimate of the average improvement (across all
programs) on the fast hardware.
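
To put rough numbers on this, here is a quick Python sketch of the
1/sqrt(N) point. The game counts are the ones from the example above;
the between-program spread (0.05) is a made-up illustrative figure, not
an SSDF statistic:

    import math

    # Standard error of a measured result scales as 1/sqrt(N).
    def se(n_games):
        return 1.0 / math.sqrt(n_games)

    # Per-program cost of dropping from 100 to 60 fast-hardware games:
    print(se(60) / se(100))       # ~1.29, i.e. ~29% more uncertainty

    # Uncertainty of the AVERAGE improvement across k programs playing
    # n games each: both the measurement noise and the program-to-program
    # spread shrink as the number of programs grows.
    def mean_se(k_programs, n_games, between_sd):
        within = se(n_games) / math.sqrt(k_programs)   # measurement noise
        between = between_sd / math.sqrt(k_programs)   # between-program spread
        return math.hypot(within, between)

    print(mean_se(4, 100, 0.05))  # current scheme: 4 CB programs x 100 games
    print(mean_se(10, 60, 0.05))  # equal hardware: 10 programs x 60 games

So each of the 4 CB programs gives up about 29% of its individual
precision, while the estimate of the average improvement on the fast
hardware, which is the figure of real interest, comes out tighter.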

I can't see how anyone (but CB) could weigh the single "con" (in
effect, the loss of preferential treatment for the CB programs) more
heavily than all the "pros" of the equal-average-hardware tests.

Although one might argue that some tests, warts and all, are still
better than no tests, one can also say that the illusion of objectivity
and the scientific aura such testing creates in the public mind about
the relative strength of the programs may drive some competitors out of
business, whether by making them appear worse in a scientific-sounding
evaluation or by denying them exposure if they refuse to play against a
stacked deck (as some have done). Having the facts wrong may be worse
than having no facts. And having fewer competing manufacturers is
certainly worse.



