Computer Chess Club Archives

Subject: Re: but hey , don't take my word for it...

Author: Uri Blass

Date: 08:51:48 01/20/06

On January 20, 2006 at 05:28:47, Rolf Tueschen wrote:

>On January 20, 2006 at 04:58:11, enrico carrisco wrote:
>
>>On January 20, 2006 at 03:14:09, Mike Byrne wrote:
>>
>>>http://www.chessolympiad-torino2006.org/eng/index.php?cav=1&dettaglio=309
>>>
>>>good stuff...
>>
>>Yea -- he even cited the "Anti-computer chess expert" Pablo Ignacio Restrepo.
>>What more would we need?
>>
>>-elc.
>
>Yes, this, and also the point that not everything quoted by a GM, here GM
>Golubev, is automatically on a par with Newton's paper on the law of gravitation
>or Einstein's paper on relativity. It is more or less bogus. I want to add a
>single item so that my opinion doesn't look cheaply arbitrary.
>
>The CEGT test guys are mentioned (I think some 15 persons) and it sounds as if
>they were a sort of institution for certain questions in CC. Comparable to what
>we meant when we spoke of "the new SSDF list" in the 90's. The problem begins if
>I question that Rybka is already proven the strongest engine today. Then people
>tell me to look at CEGT where that has been proven... This was a few days ago
>here in CCC. I must object to this sort of hubris. The truth is that we don't
>have statistical methods for making such claims. Even after 700 or maybe over
>1000 games the significance is not so sure, and if you look at the +/- boundaries
>of the so-called Elo results then you still have overlaps and you can't say
>that Rybka is the clear first. - Nothing against the testers of CEGT. The
>presentation of the results is nice. The games download is also well organised.
>But all that can't hide the fact that we have certain statistical requirements
>which must be respected if one wanted to make clear statements. We are all too
>human. In a world of huge uncertainties and big problems overall, we feel the
>need to do something for our wellness in such a hobby. Where if not there could
>we find our peace of mind? We can test. We can create a whole network of
>testers. But if we then want to make clear statements, alas, we are all standing
>under the steel-hard laws of stats. And basically we can't get what we want to
>have. We are bound to believe in our private preferences. We can also assume
>that actually, for a short time, Rybka is "certainly" looking like a very strong
>engine. But everything above that would be bogus. We should all keep that in
>mind. The development in CC is always moving. There is no such thing as the best
>all-time engine for the next 10 years. If I were to get the newest supercomputers
>of the US military, it could well be that I become the next World Champion with
>Gullydeckel, to give an absurd example, or with my personal shooting star The
>Roaring Thunder which was developed in my kitchen for the next WCCC in Torino...
>I digress a little bit.

Here are the CEGT single-processor results.

I ignore the multi-processor results.

You can see that the other single-processor programs stay below 2800 even at the
top of their error margins, while even the 32-bit version of Rybka stays above
2815 at the bottom of its margin and the top 64-bit version stays above 2850.

There is no overlap.

Rank  Engine                         Elo     +    -  Games  Score   Av.Op.  Draws
   1  Rybka 1.01 Beta 9 64-bit opt   2921   73   68     71  80.3 %    2677  33.8 %
   2  Rybka 1.0 Beta 64-bit          2859   21   21    765  68.4 %    2725  32.7 %
   4  Rybka 1.0 Beta 32-bit          2825   10   10   3575  68.9 %    2687  31.0 %
   6  Fruit 2.2.1                    2786    8    8   5035  66.0 %    2671  33.1 %
   7  Fritz 9                        2782   11   11   2724  62.8 %    2691  30.2 %
   9  TogaII 1.1a                    2772   14   14   1560  60.3 %    2699  36.3 %
  10  Hiarcs 10 Hypermodern          2771   22   22    644  53.3 %    2749  35.7 %
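
The "no overlap" claim can be checked directly from the +/- columns. The following is only a rough sketch in Python: the numbers are copied from the rows above, the names ENTRIES, bounds and is_rybka are made up for this illustration, and it simply treats rating-minus and rating+plus as the ends of each interval without assuming anything about how CEGT computes them.

```python
# (engine, rating, plus, minus) copied from the single-processor rows above.
ENTRIES = [
    ("Rybka 1.01 Beta 9 64-bit opt", 2921, 73, 68),
    ("Rybka 1.0 Beta 64-bit", 2859, 21, 21),
    ("Rybka 1.0 Beta 32-bit", 2825, 10, 10),
    ("Fruit 2.2.1", 2786, 8, 8),
    ("Fritz 9", 2782, 11, 11),
    ("TogaII 1.1a", 2772, 14, 14),
    ("Hiarcs 10 Hypermodern", 2771, 22, 22),
]


def bounds(rating, plus, minus):
    """Return (low, high) ends of the rating interval (hypothetical helper)."""
    return rating - minus, rating + plus


def is_rybka(name):
    """True for the Rybka rows (hypothetical helper)."""
    return name.startswith("Rybka")


# Worst case for Rybka: its lowest lower bound.
lowest_rybka_low = min(bounds(r, p, m)[0] for n, r, p, m in ENTRIES if is_rybka(n))
# Best case for everyone else: their highest upper bound.
highest_other_high = max(bounds(r, p, m)[1] for n, r, p, m in ENTRIES if not is_rybka(n))

gap = lowest_rybka_low - highest_other_high
print(lowest_rybka_low, highest_other_high, gap)  # 2815 2794 21
```

A positive gap means that even the weakest Rybka interval sits entirely above the strongest interval of the other programs in this list.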

The only CEGT entry that in theory could be above 2800 on one CPU is Deep
Fritz 8, but Deep Fritz 8 on 2 CPUs is below 2800, and it is illogical to expect
Deep Fritz 8 on one CPU to be rated higher than that.

Rank  Engine                    Elo     +    -
   8  Deep Fritz 8 2CPU 512MB   2772   14   14
  15  Deep Fritz 8 1CPU         2754  107  104

The fact that Rybka is also number 1 in some of the other lists, even without an
advantage that is significant enough there, probably also increases the
certainty that Rybka is the best engine, because the probability that an engine
that is not the best takes first place in every serious list is very small.

It is also wrong to simply add the errors of 2 programs, because the error of
the difference is smaller than the sum of the errors.

If 2 programs each have an error of 50 elo, then the error of the difference is
70-71 elo (50*sqrt(2)) and not 100 elo.

Basically, if the errors are a and b and they are independent, then I think that
you can use sqrt(a^2+b^2) for the error of the difference.
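
To make the arithmetic concrete, here is a small Python sketch of that rule. It is only an illustration under the assumption that the two errors are independent; the function name error_of_difference is invented for this example, and the Rybka 32-bit and Fruit numbers are taken from the CEGT rows quoted earlier in this post.

```python
import math


def error_of_difference(a, b):
    """Combine two independent errors in quadrature: sqrt(a^2 + b^2)."""
    return math.hypot(a, b)


# Two programs with 50 elo error each: about 70.7, not 100.
print(round(error_of_difference(50, 50), 1))  # 70.7

# Rybka 1.0 Beta 32-bit (2825 +/- 10) against Fruit 2.2.1 (2786 +/- 8).
rating_diff = 2825 - 2786
diff_error = error_of_difference(10, 8)
print(rating_diff, round(diff_error, 1))  # 39 12.8
```

With these numbers the rating difference of 39 is about three times the combined error of roughly 12.8, while simply adding the margins would have given 18.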

Uri

The Validity of CC Testresults - Take my Word for that one! Rolf Tueschen 10:41:20 01/20/06
- Re: The Validity of CC Testresults - Take my Word for that one! Uri Blass 12:05:50 01/20/06
  - Re: The Validity of CC Testresults - Take my Word for that one! Rolf Tueschen 13:21:11 01/20/06
- Re: The Validity of CC Testresults - Take my Word for that one! Günther Simon 10:49:10 01/20/06
  - Re: The Validity of CC Testresults - Take my Word for that one! Rolf Tueschen 10:56:49 01/20/06
    - Re: The Validity of CC Testresults - Take my Word for that one! Günther Simon 11:22:06 01/20/06
      - Re: The Validity of CC Testresults - Take my Word for that one! Rolf Tueschen 13:23:38 01/20/06
        
        Re: The Validity of CC Testresults - Take my Word for that one! Günther Simon 13:36:54 01/20/06
        
        Re: The Validity of CC Testresults - Take my Word for that one! Rolf Tueschen 13:43:23 01/20/06

