Computer Chess Club Archives



Subject: Re: SSDF is NOT Corrupted

Author: Dave Gomboc

Date: 16:00:27 09/27/99


On September 27, 1999 at 16:14:12, Ratko V Tomic wrote:

>> I think they have tested 4 ChessBase programs on K6-450 because
>> these programs were leading the P200MMX list, and they wanted to
>> know how many rating points a program would gain from this new
>> processor.
>
>If they had a limited number of fast machines, and hence a limited number of
>games they could play, they certainly could have distributed these
>games evenly among the top ten or so programs, so that the final
>ratings reflect program strengths on equal average hardware. The
>way they did it is capricious at best. You can find out equally well,
>if not better, how much the K6-2-450 generally increases the strength
>of chess programs compared to the P-200 by playing 100 games with each of the
>10 top programs on that hardware (and the rest of the games on the slower
>hardware) as you would by playing 250 games with each of only 4 CB programs.
>While the increase for those 4 wouldn't be as accurate in the 10x100
>scenario, the general increase over the spectrum of different programs
>would be more accurate (since the spectrum tested is wider). So, in the
>10x100 case one would at worst need some extrapolation to find out
>more accurately how, say, Fritz 5.32 would perform in 250 games vs
>Fritz 5.32 in 100 games on the K6-2-450. Now, is that a greater extrapolation
>than one needs to make to figure out how, say, Chess Tiger or
>Rebel etc. would perform on the K6-2-450 (as one does with the current SSDF approach)?
>Surely not.
>
>By far the chief effect of (and therefore the likely reason behind) the
>approach chosen by the SSDF is to give the person(s) picking the programs
>for given hardware the flexibility to hand-pick which 4 programs will
>be at the top of the SSDF list (which was known as soon as s/he picked
>the 4 to run on the K6-2-450). Now, why would one wish for such amazing flexibility,
>and why would one use it in that peculiar way?
>
>I think discovering a benevolent reason requires a somewhat greater stretch
>of common sense than the more natural reasons (as are prevalent in
>any other product evaluation process which has an effect on product
>sales). Well, if this were the first and isolated SSDF/CB controversy,
>one might go for the stretch.

Sure, they could have tested 10 programs on K6-2-450s instead of 4, but would
they have reached 100 games with each by this rating list?  Better to have some
idea of how programs are doing on 450s than none.  We all have eyes and can see
that there are only four programs tested with 450s when we look at the list.
This is precisely why the SSDF insists that people quote its information in its
entirety (that is, including hardware used, games played, average rating of
opponents, and percentage score).

Dave
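
To put rough numbers on the sample-size trade-off argued over above (100 games
each for ten programs versus 250 games each for four), here is a minimal Python
sketch. It is not the SSDF's method. It assumes the standard logistic Elo model,
independent games, and a normal approximation; the function name elo_margin and
its parameters (score, draw_rate, z) are made up for illustration.

```python
import math

def elo_margin(n_games, score=0.5, draw_rate=0.0, z=1.96):
    """Approximate half-width, in Elo points, of a confidence interval
    for a rating estimated from n_games independent games.

    Assumptions (illustrative only): logistic Elo curve, independent
    games, normal approximation; z=1.96 gives roughly a 95% interval.
    """
    # Per-game variance of the result, with win=1, draw=0.5, loss=0.
    win_rate = score - draw_rate / 2.0
    variance = win_rate + 0.25 * draw_rate - score ** 2
    std_error = math.sqrt(variance / n_games)
    # Slope of the Elo-vs-score curve at `score` (delta method):
    # d/dp [-400*log10(1/p - 1)] = 400 / (ln(10) * p * (1 - p))
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * std_error * slope

if __name__ == "__main__":
    for n in (100, 250):
        print(n, "games: about +/-", round(elo_margin(n)), "Elo")
```

Under these assumptions the margin shrinks only with the square root of the
number of games, so going from 100 to 250 games narrows it by a factor of about
1.58, whatever draw rate is assumed. Draws lower the per-game variance, so
passing a nonzero draw_rate gives a smaller margin.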



