Author: Uri Blass
Date: 14:00:21 06/23/04
On June 23, 2004 at 15:47:24, Sandro Necchi wrote:

>On June 23, 2004 at 08:30:50, Uri Blass wrote:

>>On June 22, 2004 at 13:24:33, Sandro Necchi wrote:

>>>On June 21, 2004 at 18:28:22, martin fierz wrote:

>>>>On June 21, 2004 at 13:50:11, Gian-Carlo Pascutto wrote:

>>>>>On June 21, 2004 at 10:30:33, martin fierz wrote:

>>>>>>On June 20, 2004 at 02:56:08, Sandro Necchi wrote:

>>>>>>>There is a simple way to verify if the "authors" are correct or not.

>>>>>>>They should state clearly how to evaluate all the solutions of the tests, comparing the hardware to the SSDF one, in order to create the Elo figure.

>>>>>>>Then, choosing the next release of 5 commercial programs which will be tested by the SSDF, they have to predict the Elo for ALL 5 chess programs to within +-10 points.

>>>>>>>Then an independent tester should run the tests.

>>>>>>>If they fail, then they lose.

>>>>>>>Sandro

>>>>>>+-10 elo, you must be kidding!
>>>>>>the SSDF results themselves have larger error margins than that...

>>>>>Yes, but the rating lists don't list errors, and they rank programs with smaller differences than 10 Elo.

>>>Hi Martin,

>>>>that has nothing to do with this discussion. if the SSDF rating list, with a very computing-time-intensive testing methodology, produces ratings with typically +-30 error bars, you cannot expect a simple test suite to be any better. so you have to allow it a +-30 margin of error too, unless you want to claim that the test suite is better than the SSDF list, which i believe not even the most hardcore promoters of test suites would do.

>>>This is not fully correct, because the more games you play in the SSDF list the more the error margin decreases. However, if you look at the list after the first Elo figure is achieved, 95% of the programs, if not more, do not change their Elo by a large margin (+-10 points). So if what the authors state is true, that these test sets are able to estimate a program's strength, then they should be able to give a reliable figure, or not?

>>A test suite cannot give an estimate that is accurate to within 10 Elo, because things like different time management and learning from previous searches in the game can by themselves change the rating by more than 10 Elo.

>I have my own personal view, based on more than 25 years of experience with nearly all the chess programs which became available and very many experimental versions, but in this case I am trying to simulate a customer... a normal customer who wants to know if the new program version is better than the previous one.

>So he can:

>1. Play test matches between 2 or more programs to get an idea of how much one version is stronger than another.

>2. Play against the new program and find out personally, but in this case he must not be a weak player, as he would lose anyway.

>3. Run this test set and see the result.

>Now, since some claim that you can estimate a program's strength by running this test set, how is that possible if the +- figure is too wide?

>SEVERAL PEOPLE HERE TALK AND TALK AND TALK, but do not make any proposal to check this.

>Come on, people, and show how you can prove your statements!

>>>Now, since you think differently than me, what would be your proposal to find out whether they are correct or not?

>>>If you enlarge the Elo margin, the whole test would not be meaningful, as how can one know if the new program version is better?

>>>Look at Fritz, just to give an example, and how much it has increased in the SSDF list from one version to the next. Can you verify whether the new program is better with a wider Elo margin?

>>>I do not think so.

>>>Mine is a proposal to find out, but if people prefer only to talk and to be able to say everything and its opposite, then there is no point in continuing to discuss this matter.

>>>You see, I like to solve problems and give solutions; I do not like to give only words...

>>>>so now you have two numbers with error margins of +-30, which means that by error propagation their difference has a standard error of about 40 rating points (i.e. if you ran your own version of the SSDF list you would routinely find rating differences of up to 40 points between the two lists).

>>>>this shows that sandro's claim that the test suite should coincide with the SSDF to within +-10 is ridiculous.

>>>If it is so, then make a better proposal... it is too easy to criticize...

>>>>i know i won't convince him, but i hope i can convince you ;-)

>>>You can convince me if you make a good proposal...

>>>What we are trying to find out is:

>>>1. Can a test set allow a user to estimate a program's strength?
>>>2. If yes, how can we find out whether this is true?
>>>3. It must be without too high a margin, as otherwise it would not be meaningful. I mean good enough to see the improvement between two program versions.

>>A test may be good enough to see if A+1 is better than A and not good enough to see if A is better than B.

>How do you know that if the +- figures are too wide?
>You mean better at solving the test set, or better = stronger? No, I use test suites, but I look for the reason that it is better after solving them.

>>The important question for me as a programmer is whether A+1 is better than A, not the exact difference in rating points or how much better.

>OK, I agree on this, but if the figure is too wide, are you sure of the result?

I can never be sure without games, but I can have a reason to believe something even without games, based on results and common sense (results are the total output of the program, not only the number of solutions in a given time).

Uri
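
As a rough check on the numbers discussed above, here is a minimal Python sketch (not from any of the posts) of the two calculations behind the argument: error propagation for two ratings that each carry +-30 error bars, and a back-of-the-envelope count of the games needed to resolve a 10 Elo gap. The function names and the assumed per-game score deviation of 0.45 are illustrative assumptions, not anything stated in the thread.

import math

def difference_standard_error(se_a, se_b):
    # standard error of (rating_a - rating_b) for two independent ratings
    return math.sqrt(se_a ** 2 + se_b ** 2)

def expected_score(elo_diff):
    # logistic expected score for a given Elo advantage
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_to_resolve(elo_diff, sigma_per_game=0.45, z=2.0):
    # games needed so that z standard errors of the mean score fit inside
    # the score shift produced by elo_diff (valid for small elo_diff;
    # sigma_per_game = 0.45 is an assumed per-game score deviation)
    score_shift = expected_score(elo_diff) - 0.5
    return (z * sigma_per_game / score_shift) ** 2

# two ratings with +-30 error bars: their difference carries about 42 Elo
# of uncertainty, the "about 40 rating points" mentioned in the thread
print(round(difference_standard_error(30, 30), 1))  # 42.4

# resolving a 10 Elo gap at roughly 95% confidence takes on the order of
# a few thousand games, which is why a +-10 prediction is so demanding
print(round(games_to_resolve(10)))  # about 3900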