Author: Rolf Tueschen
Date: 12:58:56 06/23/04
On June 23, 2004 at 15:47:24, Sandro Necchi wrote:

>On June 23, 2004 at 08:30:50, Uri Blass wrote:
>
>>On June 22, 2004 at 13:24:33, Sandro Necchi wrote:
>>
>>>On June 21, 2004 at 18:28:22, martin fierz wrote:
>>>
>>>>On June 21, 2004 at 13:50:11, Gian-Carlo Pascutto wrote:
>>>>
>>>>>On June 21, 2004 at 10:30:33, martin fierz wrote:
>>>>>
>>>>>>On June 20, 2004 at 02:56:08, Sandro Necchi wrote:
>>>>>>
>>>>>>>There is a simple way to verify whether the "authors" are correct or not.
>>>>>>>
>>>>>>>They should state clearly how to evaluate all the solutions of the tests, comparing the hardware to the SSDF one, in order to create the Elo figure.
>>>>>>>
>>>>>>>Then, taking the next release of 5 commercial programs which will be tested by the SSDF, they have to predict the Elo for ALL 5 chess programs within +-10 points.
>>>>>>>
>>>>>>>Then an independent tester should run the tests.
>>>>>>>
>>>>>>>If they fail, then they lose.
>>>>>>>
>>>>>>>Sandro
>>>>>>
>>>>>>+-10 Elo, you must be kidding!
>>>>>>The SSDF results themselves have larger error margins than that...
>>>>>
>>>>>Yes, but the rating lists don't list errors, and they rank programs with smaller differences than 10 Elo.
>>>
>>>Hi Martin,
>>>
>>>>that has nothing to do with this discussion. If the SSDF rating list, with a very computing-time-intensive testing methodology, produces ratings with typically +-30 error bars, you cannot expect a simple test suite to be any better. So you have to allow it a +-30 margin of error too, unless you want to claim that the test suite is better than the SSDF list, which I believe not even the most hardcore promoters of test suites would do.
>>>
>>>This is not fully correct, because the more games you play in the SSDF list, the more the error margin decreases. However, if you look at the list after the first Elo is published, 95% of the programs, if not more, do not change their Elo by a high margin (+-10 points). So if what the authors state is true, namely that these test sets can estimate a program's strength, then they should be able to give a reliable figure, or not?
>>
>>A test suite cannot give an estimate that is not wrong by more than 10 Elo, because things like different time management and learning from previous searches in the game can alone change the rating by more than 10 Elo.
>
>I have my own personal view, based on more than 25 years of experience with nearly all chess programs which became available and very many experimental versions, but in this case I am trying to simulate a customer... a normal customer who wants to know if the new program version is better than the previous one.
>
>So he can:
>
>1. Run test matches between 2 or more programs to get an idea of how much one version is stronger than another.

Sandro, you are in a double bind. You ask as a customer, and the customer asks which version is better, the new one he just bought???? I mean, are you serious about this? Why should someone ask such questions? Why should someone do anything to answer this question?

I will tell you this. If I buy the new version of my favorite program, I _e-x-p-e-c-t_ it to be stronger than the last version - - - for sure! If NOT, I wouldn't have bought it. So I know that it is stronger; how much doesn't interest me. I want to play against it, I want to train, I want to analyse my own games with it.

You are not a customer. You are an expert with a biased expert perception. Real users don't have such questions about which version is stronger. It's clear that the newest is stronger. Period. :)

>2. Play against the new program and find out personally, but in this case he must not be a weak player, as he would lose anyway.
>
>3. Run this test set and see the result.
>
>Now, since someone claims that you can estimate a program's strength by running this test set, how is that possible if the +- figure is too wide?
>
>SEVERAL PEOPLE HERE TALK AND TALK AND TALK, but do not make any proposal to check this.
>
>Come on, people, and show how you can prove your statements!
>
>>>Now, since you think differently than I do, what would be your proposal to find out whether they are correct or not?
>>>
>>>If you enlarge the Elo margin, the whole test would not be meaningful, as how can one then know if the new program version is better?
>>>
>>>Look at Fritz, just to give an example, and how much it has increased in the SSDF list from one version to the next. Can you verify that the new program is better with a higher Elo margin?
>>>
>>>I do not think so.
>>>
>>>Mine is a proposal to find out, but if people prefer only to talk and to be able to say everything and its opposite, then there is no point in going on discussing this matter.
>>>
>>>You see, I like to solve problems and give solutions; I do not like to give only words...
>>>
>>>>so now you have two numbers with error margins of +-30, which means that by error propagation their difference has a standard error of about 40 rating points (i.e. if you ran your own version of the SSDF list, you would routinely find rating differences of up to 40 points between the two lists).
>>>>
>>>>This shows that Sandro's claim that the test suite should coincide with the SSDF within +-10 is ridiculous.
>>>
>>>If it is so, then make a better proposal... it is too easy to criticize...
>>>
>>>>I know I won't convince him, but I hope I can convince you ;-)
>>>
>>>You can convince me if you make a good proposal...
>>>
>>>What we are trying to find out is:
>>>
>>>1. Can a test set allow a user to estimate a program's strength?
>>>2. If yes, how can we find out whether this is true?
>>>3. It must be without too high a margin, as otherwise it would not be meaningful. I mean, good enough to see the improvements between two program versions.
>>
>>A test may be good enough to see if A+1 is better than A, and not good enough to see if A is better than B.
>
>How do you know that if the +- figures are too wide?
>And do you mean better at solving the test set, or better = stronger?
>
>>The important question for me as a programmer is whether A+1 is better than A, not the exact difference in rating points or how much better.
>
>OK, I agree on this, but if the figure is too wide, are you sure of the result?
>
>>Uri
>
>Sandro
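[Editor's note: the error-propagation claim quoted above (two ratings with +-30 error bars give a difference with roughly +-40 standard error) is a few lines of arithmetic and can be checked directly. The sketch below is illustrative only: the draw ratio and game counts are assumed round numbers, not SSDF methodology.]

```python
import math

# Error propagation for independent estimates: the difference of two
# ratings, each with standard error +-30 Elo, has standard error
# sqrt(30^2 + 30^2), i.e. the "about 40 points" from the thread.
se_list_a = 30.0
se_list_b = 30.0
se_diff = math.sqrt(se_list_a**2 + se_list_b**2)
print(f"standard error of the rating difference: {se_diff:.1f} Elo")  # 42.4

# Rough illustration (assumed numbers): near a 50% score the Elo curve
# has slope 400/(ln 10 * 0.25) ~ 695 Elo per unit of score, and with an
# assumed 40% draw ratio the per-game score deviation is sqrt(0.6*0.25)
# ~ 0.39, so the 1-sigma error of an Elo estimate from n games is about:
def elo_standard_error(n_games: int, draw_ratio: float = 0.4) -> float:
    per_game_sd = math.sqrt((1 - draw_ratio) * 0.25)  # sd of one game's score
    slope = 400 / math.log(10) / 0.25                 # dElo/dscore at 50%
    return slope * per_game_sd / math.sqrt(n_games)

for n in (100, 400, 1000):
    print(f"{n:5d} games -> +-{elo_standard_error(n):.0f} Elo (1 sigma)")
```

Under these assumptions roughly 27 Elo of 1-sigma error remain after 100 games, which is why a +-10 prediction requirement is far tighter than what game-based testing itself can resolve.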
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.