Author: Joseph Ciarrochi
Date: 12:18:25 01/05/06
>Statements like this come from a fundamental misunderstanding of the mathematics
>involved.

Thank you for your comments, Dan. I should note that I have no fundamental misunderstanding here; I teach statistics at the university level. However, I do think it is good that you keep making the points you make. I should not toss "significantly" around, even if this is just a fun hobby site.

I suppose my main question is, "Is there a difference between the CEGT and SSDF ratings?" To test this, you need to examine whether the difference between Fruit and Fritz in the CEGT rating list is smaller than the difference between Fruit and Fritz in the SSDF list (which is a test of the complete agreement hypothesis you state below). This is a difference-between-differences test, not a direct test between means.

I could answer this question with some time, but, well, this is a hobby site and I don't want it to look too much like what I do at work :) (though my statistician geek side is pulling me to do this test. Argh.) Generally, I want to avoid emails that look like the results section of my journal papers.

I am definitely not casting aspersions on the SSDF site. I'm just wondering, what are the key variables on which the two sites differ?

Anyway, what can I say. I think you do a nice job of explaining statistical error, and I hope you keep doing it :)

Best,
Joseph

>> The current list has fruit significantly better than fritz9, but the CEGT list
>>has them as similar, and all my (admittedly informal) tests have them as equal.
>>Maybe as the number of games keeps coming in, we will see the gap between fruit
>>and fritz decrease?
>
> THE SSDF RATING LIST 2006-01-03   1104075 games played by 274 computers
>                                         Rating    +    -  Games  Won  Oppo
>                                         ------  ---  ---  -----  ---  ----
>  1 Fruit 2.2.1  256MB Athlon 1200 MHz     2852   35  -33    457  68%  2717
>  2 Fritz 9.0  256MB Athlon 1200 MHz       2819   32  -30    587  74%  2639
>
>2819 + 32 = 2851
>2852 - 33 = 2819
>
>Within experimental certainty, the SSDF list does not tell us which one of
>these two programs is strongest.
>
>CEGT:
>All versions, adapted to Shredder 9 with 2750 ELO
>
>#  Name         bayeselo 0052.15 (2005-09-29)  ELOstat 1.3     Score  Av. Op. bayeselo  Draws  Games
>                ELO    +    -                  ELO    +    -
>5  Fritz 9      2780  +14  -14                 2768  +12  -12  63.8%  2674.3            30.0%   2236
>7  Fruit 2.2.1  2779  +16  -16                 2772  +14  -14  65.5%  2663.7            33.0%   1601
>
>2780 - 12 = 2768
>2779 + 14 = 2793
>
>Within experimental certainty, the CEGT list does not tell us which one of
>these two programs is strongest.
>
>Given that the tests are under VERY different conditions (time control, books
>used, etc.) I find it quite interesting that the two placements are in complete
>agreement (Fritz 9 and Fruit 2.2.1 are of about the same strength).
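
For anyone curious, the difference-between-differences test mentioned in the reply could be roughed out as below, using only the numbers quoted above. This is a minimal sketch and not an analysis either poster actually ran. It assumes that the +/- columns are roughly 95% bounds, that the errors on the two programs' ratings are independent (within a single list they are not quite), and that the CEGT bayeselo column is the one to compare. The helper names `std_err` and `gap` are made up for illustration.

```python
import math

Z95 = 1.96  # assumption: the +/- columns are approximately 95% bounds


def std_err(plus, minus):
    """Approximate standard error from a (possibly asymmetric) +/- margin."""
    return (abs(plus) + abs(minus)) / 2 / Z95


def gap(rating_a, se_a, rating_b, se_b):
    """Rating gap a - b and its standard error, assuming independent errors."""
    return rating_a - rating_b, math.hypot(se_a, se_b)


# SSDF 2006-01-03: Fruit 2.2.1 minus Fritz 9.0
ssdf_gap, ssdf_se = gap(2852, std_err(35, -33), 2819, std_err(32, -30))

# CEGT, bayeselo column: Fruit 2.2.1 minus Fritz 9
cegt_gap, cegt_se = gap(2779, std_err(16, -16), 2780, std_err(14, -14))

# Difference between the two gaps, with a normal-approximation z-test
diff_of_diffs = ssdf_gap - cegt_gap
z = diff_of_diffs / math.hypot(ssdf_se, cegt_se)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"SSDF gap {ssdf_gap:+d} (SE {ssdf_se:.1f}), "
      f"CEGT gap {cegt_gap:+d} (SE {cegt_se:.1f})")
print(f"difference of differences {diff_of_diffs:+d}, "
      f"z = {z:.2f}, p = {p_two_sided:.3f}")
```

Under those assumptions |z| comes out below 1.96, so the two lists would not be shown to disagree about the Fruit/Fritz gap at the 5% level. A proper test would work from the underlying game results rather than from the published margins.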
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.