Computer Chess Club Archives



Subject: Re: Comments of latest SSDF list 2

Author: Rolf Tueschen

Date: 14:30:59 05/26/02



On May 26, 2002 at 08:47:38, Tina Long wrote:

>On May 26, 2002 at 08:13:08, Rolf Tueschen wrote:
>
>>I would not support this. Many aspects are flawed. What is large enough?
>
>At least 12 opponents at 40 games/match to give a +-40ish deviation is large
>enough to provide the information I derive from the SSDF list.
>
>>You
>>won't think that 40 is large enough?!
>
>40,000 is better, but 40 per match will do, as that is 1000 times quicker.
>

I found some strange results in the recent SSDF list. I quote:

11 Gandalf 5.1  256MB Athlon 1200 MHz, 2646
GT2.0 A1200     13.5-26.5  DpFritz A1200   13.5-21.5  Shredd6 A1200    1.5-5.5
Shre532 A1200     15-23    DpFritz K6450     22-22    CT14 CB K6450     19-14
Craf18. A1200     22-18    Junior6 K6450     30-13    Shred5 K6-450     52-28
Frit532 K6450     27-17    Junior5 K6450   31.5-12.5  Hiar732 K6450     29-19
SOS  K6-2 450    3.5-1.5   Goliath K6450     32-22    Nimzo99 K6450   29.5-10.5
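To make the point about short matches concrete, here is a rough sketch of how wide the uncertainty of a single match result is. It assumes the standard logistic Elo curve and treats every game as an independent trial with binomial variance (draws make the true spread somewhat smaller). The function names `elo_diff` and `match_interval` are hypothetical, and this is not the SSDF's own error calculation.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo curve.
    Undefined for a score of exactly 0 or 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_interval(won: float, lost: float, z: float = 1.96):
    """Return (games, Elo estimate, lower bound, upper bound) for one match.

    Assumption: games are independent trials with binomial variance p(1-p)/n,
    and the normal approximation is used (crude for very short matches).
    """
    n = won + lost
    p = won / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    # Clamp the score interval away from 0 and 1 so the logarithm stays finite.
    lo = min(max(p - half_width, 0.001), 0.999)
    hi = min(max(p + half_width, 0.001), 0.999)
    return n, elo_diff(p), elo_diff(lo), elo_diff(hi)

# A few of the Gandalf 5.1 matches quoted above: (Gandalf's points, opponent's points).
for name, (won, lost) in {
    "SOS K6-2 450": (3.5, 1.5),
    "Shredd6 A1200": (1.5, 5.5),
    "Goliath K6450": (32.0, 22.0),
    "Shred5 K6-450": (52.0, 28.0),
}.items():
    n, est, lo, hi = match_interval(won, lost)
    print(f"{name:14} {n:3.0f} games  {est:+5.0f} Elo  95% range {lo:+5.0f} .. {hi:+5.0f}")
```

Under these assumptions the 5-game SOS match yields a range more than a thousand Elo points wide (it runs into the clamp), while the 80-game Shredder 5 match narrows it to roughly 160 points.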

Tina, would you still be pleased with "matches" of 5 (five!) or 7 (seven!) games
in the SSDF? What is the reason for such strange matches? Do you still feel that
you should be thankful that the SSDF gives you these results, and how would you
make your own estimate on the basis of such short matches?

Please note that this is just what I happened to find in Thoralf Karlsson's
own posting, which someone later quoted in this thread.

Someone here asked whether I wanted to imply cheating and I answered "No!", but
could you explain why Gandalf had 54 games against Goliath? BTW, Goliath was on
weaker hardware! Oops, Gandalf had 80 games against Shredder 5, also on weaker
hardware. In short: do you agree that what matters is _not_ so much the later
"5% bogus" as such deliberate differences, namely the number of games in a
match and the different hardware?
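The hardware imbalance can be tallied directly from the quoted table. A minimal sketch, assuming that "A1200" means an Athlon 1200 MHz and that "K6450", "K6-450" and "K6-2 450" all denote the same K6-2 450 MHz class; the name `split_by_hardware` is hypothetical. The raw score per group does not correct for the opponents' strength, so it only shows how the games and results are distributed.

```python
from collections import defaultdict

# Gandalf 5.1 (Athlon 1200 MHz) matches as quoted above:
# (opponent, opponent hardware, Gandalf's points, opponent's points)
RESULTS = [
    ("GT2.0",   "A1200", 13.5, 26.5),
    ("DpFritz", "A1200", 13.5, 21.5),
    ("Shredd6", "A1200",  1.5,  5.5),
    ("Shre532", "A1200", 15.0, 23.0),
    ("Craf18.", "A1200", 22.0, 18.0),
    ("DpFritz", "K6450", 22.0, 22.0),
    ("CT14 CB", "K6450", 19.0, 14.0),
    ("Junior6", "K6450", 30.0, 13.0),
    ("Shred5",  "K6450", 52.0, 28.0),
    ("Frit532", "K6450", 27.0, 17.0),
    ("Junior5", "K6450", 31.5, 12.5),
    ("Hiar732", "K6450", 29.0, 19.0),
    ("SOS",     "K6450",  3.5,  1.5),
    ("Goliath", "K6450", 32.0, 22.0),
    ("Nimzo99", "K6450", 29.5, 10.5),
]

def split_by_hardware(results):
    """Sum games and Gandalf's score fraction per opponent hardware class."""
    games = defaultdict(float)
    points = defaultdict(float)
    for _name, hw, won, lost in results:
        games[hw] += won + lost
        points[hw] += won
    return {hw: (games[hw], points[hw] / games[hw]) for hw in games}

total = sum(won + lost for _, _, won, lost in RESULTS)
for hw, (n, score) in split_by_hardware(RESULTS).items():
    print(f"{hw}: {n:.0f} games ({n / total:.0%} of {total:.0f}), score {score:.1%}")
```

With the figures quoted above this gives 160 games at about 41% against A1200 opponents and 435 games (roughly 73% of the 595 total) at about 63% against K6-450 opponents.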

I would still reject the possibility of cheating, but I know for sure that if
_I_ wanted to cheat, it would be easy to succeed if I were allowed to play
matches of anywhere between 5 (!) and 80 games. I can guarantee you that. No
matter the size of the margin of error...

This practice has been going on in the SSDF since at least 1996, when I asked
the same questions and Peter gave me the following answer, which I quote from
memory:

...such differences are completely uninteresting, simply because we have many
games with a program, so that such differences have no influence...
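To put Peter's argument in numbers, here is a rough sketch that computes Gandalf's pooled score from the quoted matches, once with all of them and once without the 5- and 7-game matches. It assumes, purely for illustration, that all opponents have the same rating, so the result is an Elo margin over an "average opponent"; the SSDF's actual rating calculation is more involved, and the name `pooled_elo` is hypothetical.

```python
import math

# (Gandalf's points, opponent's points) for each quoted match, in table order.
MATCHES = [
    (13.5, 26.5), (13.5, 21.5), (1.5, 5.5),
    (15.0, 23.0), (22.0, 22.0), (19.0, 14.0),
    (22.0, 18.0), (30.0, 13.0), (52.0, 28.0),
    (27.0, 17.0), (31.5, 12.5), (29.0, 19.0),
    (3.5, 1.5), (32.0, 22.0), (29.5, 10.5),
]

def pooled_elo(matches):
    """Elo margin over an (assumed equal) average opponent from the pooled score."""
    points = sum(won for won, _ in matches)
    games = sum(won + lost for won, lost in matches)
    p = points / games
    return -400.0 * math.log10(1.0 / p - 1.0)

all_games = pooled_elo(MATCHES)
# Keep only matches of at least 10 games, i.e. drop the 5- and 7-game ones.
long_only = pooled_elo([m for m in MATCHES if sum(m) >= 10])
print(f"all matches: {all_games:+.1f}  without the short matches: {long_only:+.1f}")
```

In this simplified model the 12 games of the two short matches move the result by only about 2 Elo points. That measures their weight, not how the match lengths or the hardware were chosen, which is the point of the objection that follows.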

At the time I criticized this practice, and I also opposed the logic that such
small extremes had no meaning because there were several hundred games overall.
Exactly here, I can say now, lies the basic fallacy of the whole SSDF testing
practice. What they deal with are mere numbers, no matter how they were
obtained: on different hardware, with different numbers of games, and with many
other uncontrolled and statistically inadmissible practices.

It should be clear to the reader that the SSDF should change this wrong
tradition. Numbers are all the same mathematically, but in statistics some
numbers already have a better status than others. In the end, you would never
know how the Elo for a specific program was summed up: from blanks or from good
data.

Rolf Tueschen




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.