Computer Chess Club Archives

Subject: Re: The Validity of CC Testresults - Take my Word for that one!

Author: Günther Simon
Date: 11:22:06 01/20/06

On January 20, 2006 at 13:56:49, Rolf Tueschen wrote:

>On January 20, 2006 at 13:49:10, Günther Simon wrote:
>
>>On January 20, 2006 at 13:41:20, Rolf Tueschen wrote:
>>
>>>On January 20, 2006 at 11:51:48, Uri Blass wrote:
>>>
>>>>On January 20, 2006 at 05:28:47, Rolf Tueschen wrote:
>>>>
>>>>>On January 20, 2006 at 04:58:11, enrico carrisco wrote:
>>>>>
>>>>>>On January 20, 2006 at 03:14:09, Mike Byrne wrote:
>>>>>>
>>>>>>>http://www.chessolympiad-torino2006.org/eng/index.php?cav=1&dettaglio=309
>>>>>>>
>>>>>>>good stuff...
>>>>>>
>>>>>>Yea -- he even cited the "Anti-computer chess expert" Pablo Ignacio Restrepo.
>>>>>>What more would we need?
>>>>>>
>>>>>>-elc.
>>>>>
>>>>>Yes, this, and also the point that not everything quoted by a GM, here GM
>>>>>Golubev, automatically carries the weight of Newton's paper on gravitation or
>>>>>Einstein's paper on relativity. It's more or less bogus. I want to add a
>>>>>single item so that my opinion doesn't look cheaply arbitrary.
>>>>>
>>>>>The CEGT testers are mentioned (some 15 people, I think), and it sounds as if
>>>>>they were a sort of institution for certain questions in CC, comparable to
>>>>>what we meant when we spoke of "the new SSDF list" in the '90s. The problem
>>>>>begins if I question whether Rybka has already been proven the strongest
>>>>>engine today. Then people tell me to look at CEGT, where that has been
>>>>>proven... This was a few days ago here in CCC. I must object to that sort of
>>>>>hubris. The truth is that we don't have statistical methods for making such
>>>>>claims. Even after 700 or maybe over 1000 games the significance is not so
>>>>>sure, and if you look at the +/- boundaries of the so-called Elo results, you
>>>>>still have overlaps and you can't say that Rybka is the clear first.
>>>>>
>>>>>Nothing against the CEGT testers. The presentation of the results is nice.
>>>>>The games download is also well organised. But all that can't hide the fact
>>>>>that we have certain statistical requirements which must be respected if one
>>>>>wants to make clear statements. We are all too human. In a world of huge
>>>>>uncertainties and big problems overall, we feel the need to do something for
>>>>>our well-being in such a hobby. Where, if not there, could we find our peace
>>>>>of mind? We can test. We can create a whole network of testers. But if we
>>>>>then want to make clear statements, alas, we all stand under the steel-hard
>>>>>laws of statistics. And basically we can't get what we want to have. We are
>>>>>bound to believe in our private preferences.
>>>>>
>>>>>We can also assume that currently, for a short time, Rybka is "certainly"
>>>>>looking like a very strong engine. But everything beyond that would be bogus.
>>>>>We should all keep that in mind. Development in CC is always moving. There is
>>>>>no such thing as the best all-time engine for the next 10 years. If I got the
>>>>>newest supercomputers of the US military, it could well be that I would
>>>>>become the next World Champion with Gullydeckel, to give an absurd example,
>>>>>or with my personal shooting star The Roaring Thunder, which was developed in
>>>>>my kitchen for the next WCCC in Torino... I digress a little bit.
>>>>
>>>>Here are the CEGT single processor results
>>>>
>>>>I ignore single processor result
>>>
>>>It struck me as rather importunate when I read today's campaign by
>>>Simon/Pittlik? and Lagershausen, and when I read your lecture here, dear Uri,
>>>I'm quite sure that it's impossible to tell people the complex truth if they
>>>are used to believing in simple truths. I have learned over a long time how
>>>careful one should be in statistics. Honestly, Uri, what you are doing here is
>>>not allowed. You can't take a list of results, simply remove certain entries,
>>>and THEN compare what remains while their results are still included in the
>>>ratings. That is your first crass mistake. Of course I also know that you
>>>can't simply compare 1-processor with 2-processor progs. And that wasn't at
>>>all what I was trying to do.
>>>
>>>
>>>>
>>>>You can see that the single-processor programs are below 2800, while even
>>>>the 32-bit version of Rybka is rated above 2815 and the top 64-bit version
>>>>is above 2850.
>>>>
>>>>No overlapping
>>>>
>>>>Rank Program                        Elo    +    -  Games   Score  Av.Op.   Draws
>>>>   1 Rybka 1.01 Beta 9 64-bit opt  2921   73   68     71  80.3 %    2677  33.8 %
>>>>   2 Rybka 1.0 Beta 64-bit         2859   21   21    765  68.4 %    2725  32.7 %
>>>>   4 Rybka 1.0 Beta 32-bit         2825   10   10   3575  68.9 %    2687  31.0 %
>>>>   6 Fruit 2.2.1                   2786    8    8   5035  66.0 %    2671  33.1 %
>>>>   7 Fritz 9                       2782   11   11   2724  62.8 %    2691  30.2 %
>>>>   9 TogaII 1.1a                   2772   14   14   1560  60.3 %    2699  36.3 %
>>>>  10 Hiarcs 10 Hypermodern         2771   22   22    644  53.3 %    2749  35.7 %
>>>>
>>>>The only CEGT entry that in theory could be above 2800 on one CPU is Deep
>>>>Fritz 8, but Deep Fritz 8 on 2 CPUs is below 2800, and it is illogical to
>>>>expect Deep Fritz 8 on one CPU to be rated higher than that.
>>>>
>>>>8 Deep Fritz 8 2CPU 512MB 2772 14 14
>>>>15 Deep Fritz 8 1CPU 2754 107 104
>>>>
>>>>The fact that Rybka is also number 1 in some of the other lists, even where
>>>>its lead is not significant on its own, probably also increases the
>>>>certainty that Rybka is the best engine, because the probability that an
>>>>engine that is not the best takes first place in every serious list is very
>>>>small.
>>>>
>>>
>>>
>>>Let's come to the second crass mistake in your arguments. You see Rybka's
>>>first place, as I do, and you conclude that this in itself must count as
>>>proof. That is already the mistake, because you conclude that first place
>>>means greatest strength as such. NB: with stats you measure, and then you
>>>claim that your measurement is valid because you kept everything of
>>>importance under control. I simply object that this is wrong for the current
>>>situation because, as I have already debated with Bob Hyatt, Rybka currently
>>>has the initiative while all the others must react now or tomorrow. But what
>>>the results show is the improvements of Rybka against unchanged older progs.
>>>And I claim, without great risk, that any strong program will gain an
>>>advantage if the others haven't been able to react yet.
>>
>>Rating lists don't show ratings of future versions. I doubt Bob discussed
>>astrology with you. The thread is about today, not about future strength;
>>no idea why you changed the topic. Ah wait, I know why you changed it ;)
>>
>>Guenther
>
>
>Just relax please. I'm not speaking of the future. I'm speaking of a factor
>you didn't consider and couldn't control for in the current testing. Never
>heard of the advantage a new entry has? This is not rocket science; you could
>well follow the debate if you could forget for a moment that you wanted to
>flame me... just give truth a chance. I'm wrong often enough, then you can
>jump on me, but this here is so trivial that you lose the debate big time.

Your little earth hole gets smaller and smaller - big time ;-)
Computer chess rating lists also don't measure 'new entry psychology'.
Programs don't care about your psychology...
If it had any significant influence, every new entry would have been
number 1, which is not the case; no CCC science needed.
Have fun working out a 'new entry' formula together with your
rating program.
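
The overlap question argued over in this thread can be checked directly from the figures quoted in it. Below is a minimal sketch in Python, assuming the two numbers after each Elo value are the + and - error margins (the quoted list carries no header, so that reading is an assumption). Non-overlap of intervals is used here only as a rough, conservative criterion, not as a proper significance test. The names Entry, overlaps and overlapping_rivals are illustrative and come from no rating program.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: str
    elo: int
    plus: int   # assumed: upper error margin from the quoted list
    minus: int  # assumed: lower error margin from the quoted list

    @property
    def low(self) -> int:
        """Lower end of the rating interval."""
        return self.elo - self.minus

    @property
    def high(self) -> int:
        """Upper end of the rating interval."""
        return self.elo + self.plus


# Figures copied from the list quoted in the thread.
ENTRIES = [
    Entry("Rybka 1.01 Beta 9 64-bit opt", 2921, 73, 68),
    Entry("Rybka 1.0 Beta 64-bit", 2859, 21, 21),
    Entry("Rybka 1.0 Beta 32-bit", 2825, 10, 10),
    Entry("Fruit 2.2.1", 2786, 8, 8),
    Entry("Fritz 9", 2782, 11, 11),
    Entry("TogaII 1.1a", 2772, 14, 14),
    Entry("Hiarcs 10 Hypermodern", 2771, 22, 22),
    Entry("Deep Fritz 8 2CPU 512MB", 2772, 14, 14),
    Entry("Deep Fritz 8 1CPU", 2754, 107, 104),
]


def overlaps(a: Entry, b: Entry) -> bool:
    """True if the two rating intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


def overlapping_rivals(target: Entry, entries: List[Entry]) -> List[str]:
    """Names of non-Rybka entries whose interval overlaps the target's."""
    return [
        e.name
        for e in entries
        if not e.name.startswith("Rybka") and overlaps(target, e)
    ]


if __name__ == "__main__":
    for rybka in (e for e in ENTRIES if e.name.startswith("Rybka")):
        rivals = overlapping_rivals(rybka, ENTRIES)
        print(f"{rybka.name}: [{rybka.low}, {rybka.high}] overlaps {rivals}")
```

With these figures, the only non-Rybka interval that reaches the lowest-rated Rybka entry (2815 to 2835) is Deep Fritz 8 1CPU, whose margins are by far the widest in the list. The two 64-bit Rybka entries do overlap each other.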