Author: Dave Gomboc
Date: 01:00:00 09/28/99
On September 28, 1999 at 02:25:24, Ratko V Tomic wrote:

>>>If they picked 10 top programs (incl. e.g. Rebel 10, Hiarcs 7, not just 7.32
>>>from CB) and distributed time on the fastest machines equally, you still
>>>get the same overall info on the strength improvement on that hardware,
>>>just covering the wider spectrum of programs, but playing the same total
>>>number of games on the fast hardware. Nothing useful is gained by giving
>>>all the fast hardware to the 4 CB programs, in effect deciding before the
>>>cycle even started who will get the top 4 spots.
>>
>>My point is they won't publish entries unless 100 games have been played. So
>>maybe they could have played 60 games with 10 programs on 450s, and we wouldn't
>>know squat about how much improvement to expect until the next list, because no
>>450 results would have been published this time around.
>>
>
>The rules are either not right or they're being applied with a lack
>of common sense. Suppose they did publish results with each of the
>top 8-10 programs playing an equal number of games as now, except that all
>programs had equal average hardware. In that case each would have,
>taking your example, 60 games instead of 100 on K2-450, the rest on
>slower machines. The obvious drawback is that it makes the uncertainty of
>the K2-450 improvement for the 4 CB programs slightly larger (the uncertainty
>increases as 1/sqrt(N) as N drops, where N is the number of games). But
>in return it makes the certainty for other manufacturers' products
>significantly greater (compared to the much greater guesswork in extrapolating
>from the lower speeds). And more importantly, regarding the fairness of
>the tests, it doesn't skew the list by willfully handing the top 4 spots
>to one company before the competition even started. And finally, since the
>total number of games played on 450's remains the same, while using
>the larger sample of programs, it improves the estimate of the
>average (across all programs) improvement on the fast hardware.
>
>I can't see how anyone (but CB) could weigh the single "con"
>(in effect the absence of the preferred treatment of the CB programs)
>more heavily than all the "pros" of the equal average hardware tests.
>
>Although one might argue that some tests, warts and all, are still
>better than no tests, one can also say that the illusion of objectivity
>and the scientific aura it creates in the public mind about the relative
>strength of the programs may drive some competitors out of business,
>be it by making them appear worse in a scientific-sounding evaluation,
>or by denying them exposure if they refuse to play against the stacked
>deck (as some have done). Having the facts wrong may be worse than having
>no facts. And having fewer competing manufacturers is certainly worse.

While it may seem an arbitrary measure, they use 100 games as a cutoff
because they believe that results based on fewer games are simply unreliable
(a rough sketch of the arithmetic is below). As far as I know, they have
followed this convention for the entire time that the list has existed.
Certainly, it has been the rule since I first heard about the list, years
ago. I don't expect them to (indeed, I expect them not to!) change their
responsible reporting habits.

The onus is on businesses to also report SSDF findings accurately. A heavy
burden, I know, but some companies have done this well in the past, so it
isn't impossible. It bothers me that people threaten the SSDF with lawsuits
if they publish games or results of program vs. program testing.
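[Editor's note: to put rough numbers on the 1/sqrt(N) point quoted above, here
is a minimal Python sketch. It is not the SSDF's actual margin-of-error
calculation; the per-game score deviation of 0.4 and the 50% expected score
are assumptions, and the Elo conversion uses the standard logistic rating
curve.]

    import math

    def elo_error(n_games, score_sigma=0.4, expected_score=0.5):
        # Standard error of the mean score after n_games; this is the
        # 1/sqrt(N) scaling mentioned in the quoted post.
        se_score = score_sigma / math.sqrt(n_games)
        # The Elo difference implied by an expected score p is
        # D(p) = -400 * log10(1/p - 1); propagate the score error
        # through the slope dD/dp = 400 / (ln(10) * p * (1 - p)).
        p = expected_score
        slope = 400.0 / (math.log(10) * p * (1.0 - p))
        return slope * se_score

    for n in (60, 100):
        print(f"{n} games: about {elo_error(n):.0f} Elo points of standard error")

[With those assumed numbers, 60 games comes out to roughly a 36-point
standard error and 100 games to roughly 28 points, which is the sort of gap
a 100-game cutoff is presumably meant to keep in check.]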
Would it be good if the SSDF played tit-for-tat, and threatened lawsuits when
the results they publish are not reported properly? It may seem like the
current list is terribly pro-CB, but I think they simply took the four best
programs they had on 200MMXs, and started with them. Sure, Hiarcs 7 (DOS) and
Hiarcs 7.32 (CB/Win) are really close to each other, but I think there's an
obvious preference for Windows applications in the buying public. Perhaps
H7.32 was slightly higher than H7 on 200MMX machines anyway.

I remember beta-testing Rebel 8 for Schroder BV: Ed had a contest for 10
people on the internet, so I wrote in that hi, I was a student, I was into
computers, I liked chess a lot, and because of all that I was interested in
computer chess and would be interested in doing this. Somehow I was one of the
lucky people, thanks to Ed and his team, and perhaps that is one of the
reasons I am still around. I wrote a review (a preview, I suppose) of Rebel 8
at the time (which perhaps is still online at his site, or perhaps not). In
it, I said that I would be surprised if Rebel did not debut at the top of the
list. This turned out to be a big understatement: the first time it was on the
list, Rebel 8 had a crushing lead over every other program. It's just the way
it was.

I think that the SSDF is specifically interested in the fair testing of
different chess software packages. This is why they exist. It was an important
counterpoint to the incredible (incredibly bullshit!) claims made by hardware
manufacturers in the past. It is understandable that we do not always agree
with some of the decisions they make about how to conduct their hobby.
However, it is their organization, their work and effort, and I think that
they are best placed to decide what information they want to know. Accusing
them of selling out to corporate interests, without better than (IMO flimsy)
circumstantial evidence, is not going to convince me that it is true.

Dave