Computer Chess Club Archives



Subject: Re: SSDF Rating List 2006-01-03 - no longer acceptable !

Author: Dann Corbit

Date: 12:38:01 01/05/06



On January 05, 2006 at 15:18:25, Joseph Ciarrochi wrote:

>
>>
>>Statements like this come from a fundamental misunderstanding of the mathematics
>>involved.
>
>
>
>Thank you for your comments, Dann. I should note that I have no fundamental
>misunderstanding here: I teach statistics at the university level. However, I do
>think it is good that you keep making the points you make. I should not toss
>"significantly" around, even if this is just a fun hobby site.
>
>I suppose my main question is, "Is there a difference between the CEGT and SSDF
>ratings?" To test this, you need to examine whether the difference between Fruit
>and Fritz in the CEGT rating list is smaller than the difference between Fruit
>and Fritz in the SSDF list (the complete agreement hypothesis you state below).
>This is a difference-between-differences test, not a direct test between means. I
>could answer this question with some time, but, well, this is a hobby site and
>I don't want it to look too much like what I do at work :)  (though my
>statistician geek side is pulling me to do this test. Argh.)

I would be interested in the mathematics.  My major was Numerical Analysis, so
you may even have me at a disadvantage here.
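For anyone who wants to try it, here is a minimal sketch of that difference-between-differences test in Python, using the Fruit 2.2.1 and Fritz 9 figures from the two lists quoted in this thread (bayeselo columns for CEGT). It rests on assumptions neither list states: that each +/- column is a two-sided 95% interval (about 1.96 standard errors), that asymmetric SSDF margins can be averaged, and that all four rating errors are independent, which ignores the covariance created by shared opponents and direct games. The function names are hypothetical.

```python
import math

Z95 = 1.96  # ASSUMPTION: each +/- column is a two-sided 95% interval


def se_from_margin(plus: float, minus: float) -> float:
    """Approximate standard error from a (possibly asymmetric) +/- margin."""
    return (abs(plus) + abs(minus)) / 2 / Z95


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def diff_of_diffs(x_a, y_a, x_b, y_b):
    """Test whether (x - y) on list A differs from (x - y) on list B.

    Each argument is a (rating, plus, minus) tuple. ASSUMES the four
    rating errors are independent (shared opponents are ignored).
    Returns (d_a, d_b, z, two-sided p).
    """
    d_a = x_a[0] - y_a[0]
    d_b = x_b[0] - y_b[0]
    var = sum(se_from_margin(r[1], r[2]) ** 2 for r in (x_a, y_a, x_b, y_b))
    z = (d_a - d_b) / math.sqrt(var)
    return d_a, d_b, z, two_sided_p(z)


# Figures copied from the SSDF list and the CEGT list (bayeselo columns).
ssdf_fruit, ssdf_fritz = (2852, 35, 33), (2819, 32, 30)
cegt_fruit, cegt_fritz = (2779, 16, 16), (2780, 14, 14)

d_ssdf, d_cegt, z, p = diff_of_diffs(ssdf_fruit, ssdf_fritz, cegt_fruit, cegt_fritz)
print(f"SSDF Fruit-Fritz: {d_ssdf:+d}, CEGT Fruit-Fritz: {d_cegt:+d}")
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under those assumptions the Fruit-minus-Fritz gap is +33 on SSDF and -1 on CEGT, and the z-score for the gap between the two lists comes out at roughly 1.3, which is not significant at the usual 5% level. Comparing within-list differences also cancels the different anchors (CEGT pins Shredder 9 at 2750), although it still assumes both lists use the same Elo spread.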

My interpretation of both lists is:
"Fritz 9 and Fruit 2.2.1 are of the same strength, within experimental
certainty."

Given that the experiments test different things (CEGT uses a much faster time
control and standardized books; SSDF uses a slower time control and its own
books), I do not think we should expect agreement (IOW, agreement or disagreement
of the measurements would be equally unsurprising).

I think it would be a mistake to test every program against the same opponents,
unless you did a complete round-robin (with at least two games per pairing so that
color bias is removed), which I think would be so tedious that nobody could
conceivably attempt it.  Just the setup time would be mind-boggling.

>Generally, I want to avoid emails that look like the results section of my
>journal papers. I am definitely not casting aspersions at the SSDF site. I'm
>just wondering: what are the key variables in which the sites differ?
>
>Anyway, what can I say. I think you do a nice job of explaining statistical
>error, and I hope you keep doing it :)
>
>best
>Joseph
>
>
>
>
>
>>
>>> The current list has Fruit significantly better than Fritz 9, but the CEGT list
>>>has them as similar, and all my (admittedly informal) tests have them as equal.
>>>Maybe as more games come in, we will see the gap between Fruit
>>>and Fritz decrease?
>>
>>      THE SSDF RATING LIST 2006-01-03   1104075 games played by  274 computers
>>                                           Rating   +     -  Games   Won  Oppo
>>                                           ------  ---   --- -----   ---  ----
>>   1 Fruit 2.2.1  256MB Athlon 1200 MHz      2852   35   -33   457   68%  2717
>>   2 Fritz 9.0  256MB Athlon 1200 MHz        2819   32   -30   587   74%  2639
>>
>>2819 + 32 = 2851
>>2852 - 33 = 2819
>>
>>Within experimental certainty, the SSDF list does not tell us which one of
>>these two programs is stronger.
>>
>>CEGT:
>>All versions, adapted to Shredder 9 with 2750 ELO
>>#  Name         bayeselo 0052.15  ELOstat 1.3       Score   Av. Op.    Draws   Games
>>                (2005-09-29)                                (bayeselo)
>>                ELO    +    -     ELO    +    -
>>5  Fritz 9      2780  +14  -14    2768  +12  -12    63.8%   2674.3     30.0%   2236
>>7  Fruit 2.2.1  2779  +16  -16    2772  +14  -14    65.5%   2663.7     33.0%   1601
>>
>>2780 - 14 = 2766
>>2779 + 16 = 2795
>>
>>Within experimental certainty, the CEGT list does not tell us which one of
>>these two programs is stronger.
>>
>>Given that the tests are under VERY different conditions (time control, books
>>used, etc.), I find it quite interesting that the two placements are in complete
>>agreement (Fritz 9 and Fruit 2.2.1 are of about the same strength).
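The interval-overlap arithmetic in the quoted post can be checked mechanically. The Python sketch below (function names are hypothetical) runs the overlap check for both lists and adds a z-test on the within-list difference, assuming each +/- column is a two-sided 95% interval (about 1.96 standard errors) and that the two programs' rating errors are independent. The z-test is included because overlapping 95% intervals is a conservative shortcut: two intervals can overlap slightly while the difference is still significant at the 5% level.

```python
import math

Z95 = 1.96  # ASSUMPTION: each +/- column is a two-sided 95% interval


def interval(rating, plus, minus):
    """Return (low, high) bounds for a rating and its reported margins."""
    return rating - abs(minus), rating + abs(plus)


def compare(a, b):
    """Compare two (name, rating, plus, minus) entries from the same list.

    Returns (intervals_overlap, z, two_sided_p) for the difference a - b,
    ASSUMING the two rating errors are independent.
    """
    lo_a, hi_a = interval(*a[1:])
    lo_b, hi_b = interval(*b[1:])
    overlap = lo_a <= hi_b and lo_b <= hi_a
    # Asymmetric margins are averaged before converting to a standard error.
    se_a = (abs(a[2]) + abs(a[3])) / 2 / Z95
    se_b = (abs(b[2]) + abs(b[3])) / 2 / Z95
    z = (a[1] - b[1]) / math.hypot(se_a, se_b)
    p = math.erfc(abs(z) / math.sqrt(2))
    return overlap, z, p


ssdf = compare(("Fruit 2.2.1", 2852, 35, 33), ("Fritz 9", 2819, 32, 30))
cegt = compare(("Fruit 2.2.1", 2779, 16, 16), ("Fritz 9", 2780, 14, 14))  # bayeselo columns

for label, (overlap, z, p) in (("SSDF", ssdf), ("CEGT", cegt)):
    print(f"{label}: intervals overlap={overlap}, z={z:+.2f}, p={p:.2f}")
```

With these numbers both pairs of intervals overlap; the SSDF difference gives a z of roughly 1.4 and the CEGT bayeselo difference roughly -0.1, so under these assumptions neither list separates the two programs at the 5% level.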




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.