Author: martin fierz
Date: 15:28:22 06/21/04
On June 21, 2004 at 13:50:11, Gian-Carlo Pascutto wrote:

>On June 21, 2004 at 10:30:33, martin fierz wrote:
>
>>On June 20, 2004 at 02:56:08, Sandro Necchi wrote:
>>
>>>There is a simple way to verify whether the "authors" are correct or not.
>>>
>>>They should state clearly how to evaluate all the solutions of the tests,
>>>comparing the hardware to the SSDF one, in order to create the Elo figure.
>>>
>>>Then, taking the next release of 5 commercial programs which will be tested
>>>by the SSDF, they have to predict the Elo for ALL 5 chess programs to within
>>>+-10 points.
>>>
>>>Then an independent tester should run the tests.
>>>
>>>If they fail, then they lose.
>>>
>>>Sandro
>>
>>+-10 Elo, you must be kidding!
>>The SSDF results themselves have larger error margins than that...
>
>Yes, but the rating lists don't list errors, and they rank programs with
>smaller differences than 10 Elo.

That has nothing to do with this discussion. If the SSDF rating list, with its very computing-time-intensive testing methodology, produces ratings with typical error bars of +-30 Elo, you cannot expect a simple test suite to do any better. So you have to allow the test suite a +-30 margin of error too, unless you want to claim that the test suite is better than the SSDF list, which I believe not even the most hardcore promoters of test suites would do.

So now you have two numbers, each with an error margin of +-30, which means that by error propagation their difference has a standard error of about 42 rating points (sqrt(30^2 + 30^2) ≈ 42). In other words, if you ran your own version of the SSDF list, you would routinely find rating differences of up to 40 points between the two lists. This shows that Sandro's demand that the test suite coincide with the SSDF to within +-10 is ridiculous.

I know I won't convince him, but I hope I can convince you ;-)

cheers
martin
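To make the error-propagation step explicit, here is a minimal sketch in Python. The two +-30 figures are the standard errors assumed in the post above, and treating the two rating estimates as independent (so their errors add in quadrature) is an assumption of this sketch:

    import math

    # Assumed standard errors of the two rating estimates (Elo points),
    # taken from the post above; independence is assumed.
    sigma_ssdf = 30.0   # typical SSDF error bar
    sigma_suite = 30.0  # assumed error bar for a suite-based rating

    # For independent measurements, errors add in quadrature:
    # sigma_diff = sqrt(sigma_ssdf^2 + sigma_suite^2)
    sigma_diff = math.sqrt(sigma_ssdf**2 + sigma_suite**2)

    print(f"standard error of the difference: {sigma_diff:.1f} Elo")  # ~42.4

So discrepancies on the order of 40 Elo between the two lists are expected by chance alone, which is why even a perfect test suite could not meet a +-10 agreement criterion.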