Computer Chess Club Archives


Subject: Re: Comments of latest SSDF list 2

Author: Tina Long

Date: 18:22:25 05/26/02


On May 26, 2002 at 17:30:59, Rolf Tueschen wrote:

>On May 26, 2002 at 08:47:38, Tina Long wrote:
>
>>On May 26, 2002 at 08:13:08, Rolf Tueschen wrote:
>>
>>>I would not support this. Many aspects are flawed. What is large enough?
>>
>>At least 12 opponents at 40 games/match to give a +-40ish deviation is large
>>enough to provide the information I derive from the SSDF list.
>>
>>>You
>>>won't think that 40 is large enough?!
>>
>>40,000 is better, but 40 per match will do, as that is 1000 times quicker.
>>
>
>I have some strange findings out of the recent SSDF list. I quote:
>
>11 Gandalf 5.1  256MB Athlon 1200 MHz, 2646
>GT2.0 A1200     13.5-26.5  DpFritz A1200   13.5-21.5  Shredd6 A1200    1.5-5.5
>Shre532 A1200     15-23    DpFritz K6450     22-22    CT14 CB K6450     19-14
>Craf18. A1200     22-18    Junior6 K6450     30-13    Shred5 K6-450     52-28
>Frit532 K6450     27-17    Junior5 K6450   31.5-12.5  Hiar732 K6450     29-19
>SOS  K6-2 450    3.5-1.5   Goliath K6450     32-22    Nimzo99 K6450   29.5-10.5
>
>Tina, would you still be pleased with such 4 (four!) or 6 (six!) "matches" in
>the SSDF?

Um, that's 5 (five!) and 7 (seven!).   (now lecture me on statistics)

Yes I'm pleased those results are included.  Those matches will be finished by
next list.
The 5 games played so far in Gan-SOS will have only a small effect on their
total ratings.
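
As a rough check of that weight, here is a minimal sketch that pools the Gandalf 5.1 results quoted above and compares the overall score with and without the 5 Gan-SOS games. It only looks at raw score fractions. The real SSDF rating also depends on each opponent's rating, which is not given in this post, so this is a back-of-envelope illustration and not the SSDF calculation. The names `results` and `pooled_score` are illustrative.

```python
# (opponent label, Gandalf points, opponent points), copied from the quoted list
results = [
    ("GT2.0 A1200", 13.5, 26.5),
    ("DpFritz A1200", 13.5, 21.5),
    ("Shredd6 A1200", 1.5, 5.5),
    ("Shre532 A1200", 15.0, 23.0),
    ("DpFritz K6450", 22.0, 22.0),
    ("CT14 CB K6450", 19.0, 14.0),
    ("Craf18. A1200", 22.0, 18.0),
    ("Junior6 K6450", 30.0, 13.0),
    ("Shred5 K6-450", 52.0, 28.0),
    ("Frit532 K6450", 27.0, 17.0),
    ("Junior5 K6450", 31.5, 12.5),
    ("Hiar732 K6450", 29.0, 19.0),
    ("SOS K6-2 450", 3.5, 1.5),
    ("Goliath K6450", 32.0, 22.0),
    ("Nimzo99 K6450", 29.5, 10.5),
]


def pooled_score(rows):
    """Return (score fraction, number of games) over all given matches."""
    won = sum(g for _, g, _ in rows)
    games = sum(g + o for _, g, o in rows)
    return won / games, games


with_sos, total_games = pooled_score(results)
without_sos, _ = pooled_score([r for r in results if not r[0].startswith("SOS")])
sos_share = 5 / total_games  # fraction of all games that are Gan-SOS games

print(f"games: {total_games:.0f}, SOS share: {sos_share:.2%}")
print(f"score with SOS: {with_sos:.4f}, without SOS: {without_sos:.4f}")
```

On these figures the 5 SOS games are under 1% of Gandalf's games, so dropping them barely moves the pooled score.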


>What is the reason for such strange matches? Do you still feel that
>you should be thankful that SSDF gives you the results

Yes, I see no reason not to be.

> and how would you make
>your own estimation on the basis of such short matches?

Individual match results mean little, and of course the 5-game Gan-SOS match has
only just started.  But that doesn't mean those games should be left out of the
calculations.

The accumulation of ALL the games against MANY opponents gives a rating.
It is infeasible to test only against similar strength opponents as the SAMPLE
SIZE of opponents is too small.
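
To put numbers on why the accumulated total matters more than any single match, here is a small sketch of how the uncertainty shrinks as games pile up. It assumes the standard logistic Elo curve and a simple normal approximation that treats every game as a win or a loss (draws reduce the variance, so this overstates the margin somewhat). The function names are illustrative, and this is not how SSDF computes its published deviations.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(score, games, z=1.96):
    """Approximate 95% interval (low, high) in Elo for `score` over `games` games."""
    se = math.sqrt(score * (1.0 - score) / games)
    # Clamp so the logistic conversion stays defined for very short matches.
    low = min(max(score - z * se, 1e-6), 1.0 - 1e-6)
    high = min(max(score + z * se, 1e-6), 1.0 - 1e-6)
    return elo_diff(low), elo_diff(high)


for n in (5, 40, 480):  # one short match, one full match, 12 matches of 40
    low, high = elo_interval(0.5, n)
    print(f"{n:4d} games: {low:+.0f} to {high:+.0f} Elo")
```

With an even score, 5 games leave an interval hundreds of Elo wide, while 480 games bring it down to a few tens of Elo either way, which is the same order as the +-40ish figure mentioned above.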
>
>Please note, that this here is just what I found by chance in Thoralf Karlsson's
>own posting someone later quoted into this thread.
>
>Someone here asked if I wanted to imply cheating and I answered "No!", but

"No!", but

From here on you are getting very biased and emotional, Rolf, and I know better
than to argue against you in that state.

Tina Long


> could
>you explain why Gandalf had 54 games against Goliath? BTW Goliath on weaker
>hardware! Oops, Gandalf had 80 games against Shredder 5, also on weaker
>hardware. In short: Do you agree that _not_ the later 5% bogus is so important
>but much more such deliberate differences, say the quantity of the games in a
>match and the different hardware?
>
>I would still reject the possibility of cheating but I know for sure, if _I_
>wanted to cheat, it would be easy to succeed if I were allowed to play matches
>between 5 (!) and 80 games, I can guarantee you this for sure. No matter the
>size of the margin of error...
>
>This practice is happening in SSDF since at least 1996 when I asked the same
>questions and Peter answered me the following, I recall by heart:
>
>...such differences are completely uninteresting, simply because we have many
>games with a program, so that such differences have no influence...
>
>At the time I criticized such a practice and opposed the logic too that because
>of some hundred of games overall such little extremes had no meaning. Exactly
>here, I can say now, we have the basic fallacy in the whole SSDF practice of
>testing. What they deal with are mere numbers, no matter how they got them.
>Whether on different hardware, different quantity of games and many more
>uncontrolled and statistically unallowed behaviour.
>
>And it should be clear to the reader that the SSDF should change the wrong
>tradition. Numbers are mathematically the same, but already in stats there are
>numbers with a better status and a worse. In the end, you'd never know how your
>Elo for a specific program  was summed up. With blanks or good data.
>
>Rolf Tueschen



Last modified: Thu, 15 Apr 21 08:11:13 -0700
