Author: Ralf Elvsén
Date: 07:17:09 01/07/01
On January 07, 2001 at 07:46:53, Uri Blass wrote:

>On January 07, 2001 at 07:34:20, Ralf Elvsén wrote:
>
>>On January 06, 2001 at 23:51:59, Uri Blass wrote:
>>
>>>On January 06, 2001 at 21:02:08, Ralf Elvsén wrote:
>>>
>>>>On January 06, 2001 at 08:51:12, Uri Blass wrote:
>>>>
>>>>>On January 06, 2001 at 08:46:57, Jorge Pichard wrote:
>>>>>
>>>>>>You should let that match continue up to 50 games, instead of 40 games.
>>>>>>
>>>>>>Pichard.
>>>>>
>>>>>The number of games should be decided before the match and not during the match.
>>>>>
>>>>>Uri
>>>>
>>>>Why is that?
>>>>
>>>>Ralf
>>>
>>>There is an obvious reason.
>>>
>>>people may try to get the result that they want by playing different number of
>>>games and it is not fair.
>>>
>>>Uri
>>
>>If the result at the end of a test isn't liked by a tester
>>and he plays more games, won't the result just become more accurate?
>>(Assuming he always reports all games played.)
>>
>>Ralf
>
>No
>
>Assume that 2 programs is equal and the tester prefers program A.
>
>After 40 games the result is 22-18 for B and the tester decides to play more
>games and to stop only when A is leading.
>
>If the result is 22-18 for A the tester decides to stop the match.
>
>A is going to win in both cases not because it is better but because the tester
>manipulated the results by deciding when to stop during the match.
>
>Uri

First of all, the latter result will still be more accurate. Secondly, if A keeps on losing, he will have to play a huge number of games, which is not realistic. Compare the classic attempt to win at roulette by playing red/black: if you lose, double your previous stake. It takes an infinite amount of money to be sure of winning, and why would you play if you had that? :)

But if A and B are equally good, then when A finally gets in front, the relative score will be very close to 50%, with small uncertainty. And if his favourite program has so far scored better than its true strength warrants, he will be worse off in the end, because he doesn't know that it is really only as good as B.

It's the conclusions you draw that matter. If I just want to know the relative strength of A and B, I see no problem with more games. But if your main concern is the SSDF rating, I'm not sure I can argue against you. I guess it's possible, although I haven't scrutinized the problem, that an evil SSDF tester who has more information than the results of the previous games can affect the rating of a program by this procedure. But if he does, I guess he could come up with a less time-consuming strategy to favour a program...

Ralf
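
The two sides of this exchange can be checked with a small simulation. What follows is only a rough sketch under stated assumptions, not anything posted in the thread: the two programs are taken to be exactly equal, draws are ignored, each game is an independent coin flip, and the names `play_match` and `simulate` as well as the cap `max_games` (standing in for the tester's limited patience) are hypothetical. The tester plays 40 games and, if A is not ahead, keeps playing until A leads.

```python
import random


def play_match(rng, min_games=40, max_games=2000, p_a=0.5):
    """Play one match under the stopping rule described in the thread.

    The tester plays at least `min_games` and then stops as soon as A is
    ahead; `max_games` is a hypothetical cap on how long he keeps going.
    Draws are ignored. Returns (a_wins, games_played).
    """
    a_wins = 0
    games = 0
    while games < max_games:
        if rng.random() < p_a:  # A wins this game with probability p_a
            a_wins += 1
        games += 1
        if games >= min_games and 2 * a_wins > games:
            break  # A is ahead, so the tester stops here
    return a_wins, games


def simulate(trials=2000, seed=1, min_games=40, max_games=2000):
    """Repeat the match many times and summarise what gets reported."""
    rng = random.Random(seed)
    results = [play_match(rng, min_games, max_games) for _ in range(trials)]
    a_ahead = [(a, n) for a, n in results if 2 * a > n]
    extended = [(a, n) for a, n in a_ahead if n > min_games]
    return {
        # Share of matches that end with A in front.
        "fraction_a_ahead": len(a_ahead) / trials,
        # Largest score fraction for A among matches that had to be extended.
        "max_score_when_extended": max((a / n for a, n in extended), default=None),
        # Average number of games the tester ends up playing.
        "mean_games": sum(n for _, n in results) / trials,
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(key, value)
```

Under these assumptions both points show up. A ends in front in far more than half of the matches, which is the manipulation Uri describes. But whenever the match had to be extended, A stops with a lead of exactly one game, so its reported score is at most 21/41 (about 51.2%) and moves closer to 50% the longer the match ran.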
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.