Author: Rolf Tueschen
Date: 14:30:59 05/26/02
On May 26, 2002 at 08:47:38, Tina Long wrote:

>On May 26, 2002 at 08:13:08, Rolf Tueschen wrote:
>
>>I would not support this. Many aspects are flawed. What is large enough?
>
>At least 12 opponents at 40 games/match to give a +-40ish deviation is large
>enough to provide the information I derive from the SSDF list.
>
>>You
>>won't think that 40 is large enough?!
>
>40,000 is better, but 40 per match will do, as that is 1000 times quicker.
>

I have some strange findings out of the recent SSDF list. I quote:

11 Gandalf 5.1 256MB Athlon 1200 MHz, 2646
GT2.0    A1200      13.5-26.5
DpFritz  A1200      13.5-21.5
Shredd6  A1200       1.5-5.5
Shre532  A1200      15-23
DpFritz  K6450      22-22
CT14 CB  K6450      19-14
Craf18.  A1200      22-18
Junior6  K6450      30-13
Shred5   K6-450     52-28
Frit532  K6450      27-17
Junior5  K6450      31.5-12.5
Hiar732  K6450      29-19
SOS      K6-2 450    3.5-1.5
Goliath  K6450      32-22
Nimzo99  K6450      29.5-10.5

Tina, would you still be pleased with such 4 (four!) or 6 (six!) "matches" in the SSDF? What is the reason for such strange matches? Do you still feel that you should be thankful that SSDF gives you the results? And how would you make your own estimation on the basis of such short matches?

Please note that this is just what I found by chance in Thoralf Karlsson's own posting, which someone later quoted in this thread. Someone here asked if I wanted to imply cheating and I answered "No!", but could you explain why Gandalf had 54 games against Goliath? BTW Goliath on weaker hardware! Oops, Gandalf had 80 games against Shredder 5, also on weaker hardware.

In short: do you agree that it is _not_ the later 5% bogus that is so important, but much more such deliberate differences, say the number of games in a match and the different hardware? I would still reject the possibility of cheating, but I know for sure that if _I_ wanted to cheat, it would be easy to succeed if I were allowed to play matches of anywhere between 5 (!) and 80 games. I can guarantee you that, no matter the size of the margin of error...
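To make the point about match lengths concrete, here is a minimal Python sketch that counts the games in each of the Gandalf 5.1 pairings quoted above and attaches a rough 95% margin to each match taken alone. The helper approx_elo_margin is a hypothetical illustration, not the SSDF's method: it assumes independent games, no draws and roughly equal opponents, so the figures are only an upper-end estimate (draws would narrow them).

```python
import math

# Gandalf 5.1 pairings as quoted from the SSDF list above:
# (opponent, opponent hardware, first score, second score).
# Only the sum of the two scores (the number of games) is used here.
RESULTS = [
    ("GT2.0",   "A1200",    13.5, 26.5),
    ("DpFritz", "A1200",    13.5, 21.5),
    ("Shredd6", "A1200",     1.5,  5.5),
    ("Shre532", "A1200",    15.0, 23.0),
    ("DpFritz", "K6450",    22.0, 22.0),
    ("CT14 CB", "K6450",    19.0, 14.0),
    ("Craf18.", "A1200",    22.0, 18.0),
    ("Junior6", "K6450",    30.0, 13.0),
    ("Shred5",  "K6-450",   52.0, 28.0),
    ("Frit532", "K6450",    27.0, 17.0),
    ("Junior5", "K6450",    31.5, 12.5),
    ("Hiar732", "K6450",    29.0, 19.0),
    ("SOS",     "K6-2 450",  3.5,  1.5),
    ("Goliath", "K6450",    32.0, 22.0),
    ("Nimzo99", "K6450",    29.5, 10.5),
]


def approx_elo_margin(games, z=1.96):
    """Rough 95% margin (in Elo) for a single match of `games` games.

    Assumptions (not from the SSDF): games are independent, there are
    no draws, and the true score is near 50%. The logistic Elo curve is
    linearised around a 50% score.
    """
    std_err = 0.5 / math.sqrt(games)               # std. error of the score fraction
    elo_per_score = 400.0 / (math.log(10) * 0.25)  # slope of the Elo curve at 50%
    return z * std_err * elo_per_score


games_per_match = [int(a + b) for _, _, a, b in RESULTS]

for (opponent, hardware, a, b), n in zip(RESULTS, games_per_match):
    print(f"{opponent:8s} {hardware:9s} {n:3d} games  "
          f"approx. +-{approx_elo_margin(n):.0f} Elo")

print("shortest match:", min(games_per_match), "games")
print("longest match: ", max(games_per_match), "games")
print("total games:   ", sum(games_per_match))
```

Under these assumptions the margin shrinks only with the square root of the number of games, so the shortest and the longest pairing in the list carry very different weight even though both enter the same overall rating.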
This practice has been going on at the SSDF since at least 1996, when I asked the same questions and Peter answered me as follows (I quote from memory): ...such differences are completely uninteresting, simply because we have many games with a program, so that such differences have no influence...

At the time I criticized this practice, and I also opposed the logic that, because of some hundreds of games overall, such little extremes had no meaning. Exactly here, I can say now, lies the basic fallacy in the whole SSDF practice of testing. What they deal with are mere numbers, no matter how they got them: on different hardware, with different numbers of games per match, and with much more uncontrolled and statistically inadmissible behaviour. It should be clear to the reader that the SSDF should change this wrong tradition. Numbers are mathematically all the same, but in statistics some numbers have a better status than others. In the end, you would never know how the Elo for a specific program was summed up: with blanks or with good data.

Rolf Tueschen
Last modified: Thu, 15 Apr 21 08:11:13 -0700