Computer Chess Club Archives

Subject: Re: SSDF Rating List 2006-01-03 - no longer acceptable !

Author: Joseph Ciarrochi

Date: 12:18:25 01/05/06


>
>Statements like this come from a fundamental misunderstanding of the mathematics
>involved.



Thank you for your comments, Dann. I should note that I have no fundamental
misunderstanding here; I teach statistics at the university level. However, I do
think it is good that you keep making the points you make. I should not toss
"significantly" around, even if this is just a fun hobby site.

I suppose my main question is, "Is there a difference between the CEGT and SSDF
ratings?" To test this, you need to examine whether the difference between Fruit
and Fritz in the CEGT rating list is smaller than the difference between Fruit
and Fritz in the SSDF list (the complete agreement hypothesis you state below).
This is a difference-between-differences test, not a direct test between means. I
could answer this question with some time, but, well, this is a hobby site and
I don't want it to look too much like what I do at work :) (though my
statistician geek side is pulling me to do this test. Argh)
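
A minimal sketch of what such a difference-between-differences test could look like, using the figures quoted further down in this message. It rests on assumptions that neither list states: that the published +/- margins are roughly 95% intervals, that the four rating estimates are independent, and that their errors are approximately normal. Asymmetric margins are simply averaged, the CEGT values are taken from the first ELO/+/- triple in each row, and the function names are illustrative only.

```python
import math

Z95 = 1.96  # assumption: the published +/- margins are ~95% intervals


def se(plus, minus):
    """Approximate standard error from a (possibly asymmetric) +/- margin."""
    return (plus + abs(minus)) / 2 / Z95


def diff_of_diffs(a1, b1, a2, b2):
    """Each argument is (rating, plus, minus).

    Returns (estimate, standard error, z) for (a1 - b1) - (a2 - b2),
    assuming the four estimates are independent.
    """
    estimate = (a1[0] - b1[0]) - (a2[0] - b2[0])
    stderr = math.sqrt(sum(se(p, m) ** 2 for _, p, m in (a1, b1, a2, b2)))
    return estimate, stderr, estimate / stderr


# Fruit vs. Fritz on the SSDF list, then on the CEGT list
ssdf_fruit, ssdf_fritz = (2852, 35, -33), (2819, 32, -30)
cegt_fruit, cegt_fritz = (2779, 16, -16), (2780, 14, -14)

est, stderr, z = diff_of_diffs(ssdf_fruit, ssdf_fritz, cegt_fruit, cegt_fritz)
print(f"difference of differences: {est} Elo, SE {stderr:.1f}, z = {z:.2f}")
```

Under these assumptions the Fruit-Fritz gap differs by 34 Elo between the two lists, with |z| below 1.96, so the lists would not differ significantly at the 5% level. The independence assumption is the weakest part, since ratings within one list are estimated from the same pool of games.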


Generally, I want to avoid emails that look like the results section of my
journal papers. I am definitely not casting aspersions on the SSDF site. I'm
just wondering, what are the key variables on which the sites differ?

Anyway, what can I say? I think you do a nice job of explaining statistical
error, and I hope you keep doing it :)

best
Joseph





>
>> The current list has Fruit significantly better than Fritz 9, but the CEGT list
>>has them as similar, and all my (admittedly informal) tests have them as equal.
>>Maybe as more games keep coming in, we will see the gap between Fruit
>>and Fritz decrease?
>
>      THE SSDF RATING LIST 2006-01-03   1104075 games played by  274 computers
>                                           Rating   +     -  Games   Won  Oppo
>                                           ------  ---   --- -----   ---  ----
>   1 Fruit 2.2.1  256MB Athlon 1200 MHz      2852   35   -33   457   68%  2717
>   2 Fritz 9.0  256MB Athlon 1200 MHz        2819   32   -30   587   74%  2639
>
>2819 + 32 = 2851
>2852 - 33 = 2819
>
>Within experimental certainty, the SSDF list does not tell us which one of
>these two programs is stronger.
>
>CEGT:
>All versions, adapted to Shredder 9 with 2750 ELO
># Name bayeselo 0052.15
>(2005-09-29) ELOstat 1.3 Score Av. Op.
>bayeselo Draws Games
>ELO + - ELO + -
>5 Fritz 9 2780 +14 -14 2768 +12 -12 63.8% 2674.3 30.0% 2236
>7 Fruit 2.2.1 2779 +16 -16 2772 +14 -14 65.5% 2663.7 33.0% 1601
>
>2780 - 12 = 2768
>2779 + 14 = 2793
>
>Within experimental certainty, the CEGT list does not tell us which one of
>these two programs is stronger.
>
>Given that the tests are under VERY different conditions (time control, books
>used, etc.) I find it quite interesting that the two placements are in complete
>agreement (Fritz 9 and Fruit 2.2.1 are of about the same strength).
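
The interval arithmetic in the quoted message can be sketched as a small overlap check. This is only an illustration: it treats each published rating and its +/- margin as a plain interval, takes the CEGT values from the first ELO/+/- triple in each row, and uses helper names that are hypothetical. Overlap of intervals is a rough screen, not a formal test of the difference.

```python
def interval(rating, plus, minus):
    """Return (low, high) for a rating with a +/- margin."""
    return rating - abs(minus), rating + plus


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# SSDF: Fruit 2.2.1 vs. Fritz 9.0
ssdf_overlap = overlaps(interval(2852, 35, -33), interval(2819, 32, -30))

# CEGT: Fritz 9 vs. Fruit 2.2.1
cegt_overlap = overlaps(interval(2780, 14, -14), interval(2779, 16, -16))

print("SSDF intervals overlap:", ssdf_overlap)
print("CEGT intervals overlap:", cegt_overlap)
```

Both checks report an overlap, which matches the quoted conclusion that neither list separates the two programs.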

