Author: Vasik Rajlich
Date: 14:49:03 02/01/06
On January 30, 2006 at 18:14:21, Albert Silver wrote:

>>Let me put it like this: what is the chance at this point (i.e., just based on
>>the games you list above, without any further testing) that your hypothesis is
>>correct?
>>
>>If you analyze it "statistically", you might get some figure like (let's say)
>>30%. Just a total wild guess, eyeballing your numbers.
>>
>>In reality, though, it's probably more like 3%.
>>
>>The reason is that before your experiment started, there were, let's say, 20
>>candidate hypotheses that you didn't even bother to list. Maybe UltraSolid is
>>worse against Fritz, maybe it's better in closed positions, etc. One of these
>>hypotheses is likely to get lucky, and that hypothesis will then of course have
>>very nice data to support it.
>>
>>Anyway, there is nothing wrong with this procedure, as long as you eventually
>>test your hypothesis "straight up".
>>
>>Vas
>
>By "straight up", do you mean it is tested alone, without any other parameters
>influencing it? If so, wouldn't that go against the theory that each parameter
>is independent of the others and should bear its own fruit?

What I mean by "straight up" is that the testing should go in the following
sequence:

1) Play a bunch of games, with various settings, without any special expectations.
2) Identify some trend - let's call it a "candidate hypothesis".
3) Test the candidate hypothesis with many, many more games.

Maybe, to be really fair, the games from step 1 should even be thrown away, and
only the games from step 3 should be used. Not sure about this.

Consider the following scenario. There is some person (Joe) who just won the
lottery. You want to see if Joe is a specialist at winning lotteries. So Joe buys
many more lottery tickets, and we see how he performs. The question is: is it
fair to include that first win in his statistics, or not? Probably not.

Vas

>Still, I *have* noticed that UltraSolid has scored better with Fritz 9 and worse
>with Fruit 2.2. I tested it alone, and I tested it with other settings, and this
>observation is the result of that consistent difference. Hurd's results, using a
>different set of positions (I used the Silver Suite, and he the Nunn2), tend to
>corroborate this. It is the key reason I insist on not just playing a zillion
>games against one opponent, but splitting them up among several, which should
>remove this problem and give a better idea as to the setting's worth - even if
>that means the result is inconclusive, since being inconclusive after 200 games
>and 4 opponents is better than zero data, or than 200 games against the same
>opponent.
>
>Note that my suite hardly covers it all, but it does provide a decent variety of
>openings and types of positions.
>
>Albert
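
A rough way to see why the eyeballed 30% should shrink toward something like 3%
is to simulate the selection step itself. The sketch below is a minimal
illustration with invented numbers (20 candidate settings, 40 exploratory games
per setting, and - by construction - none of the settings actually helping, so
every game is a coin flip); it is not from the thread, it only shows how often
the best-looking of many do-nothing settings still posts an impressive score.

import random

random.seed(1)

N_CANDIDATES = 20        # number of candidate hypotheses nobody bothered to list
EXPLORATORY_GAMES = 40   # small step-1 batch per candidate (made-up number)
THRESHOLD = 24           # a 60% score over that batch
TRIALS = 10_000

one_candidate_lucky = 0
some_candidate_lucky = 0
for _ in range(TRIALS):
    # Exploratory score of each candidate; every game is a 50/50 coin flip
    # because none of the settings really changes anything.
    scores = [sum(random.random() < 0.5 for _ in range(EXPLORATORY_GAMES))
              for _ in range(N_CANDIDATES)]
    one_candidate_lucky += scores[0] >= THRESHOLD
    some_candidate_lucky += max(scores) >= THRESHOLD

print(f"One pre-chosen null setting scores 60%+: "
      f"{one_candidate_lucky / TRIALS:.0%}")
print(f"At least one of {N_CANDIDATES} null settings scores 60%+: "
      f"{some_candidate_lucky / TRIALS:.0%}")

With these made-up numbers the first figure comes out around 13% and the second
well above 90%: the impressive data behind the surviving hypothesis is largely a
product of having looked at many hypotheses, which is the same reason Joe's
first lottery win says little about his skill.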
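For step 3, once a single hypothesis has been fixed in advance, a plain exact
one-sided binomial test on the fresh games alone is enough to judge it. The
sketch below uses invented counts (draws are simply set aside, and the 55-40-25
score is not from any real match); it is one simple way to do the confirmation,
not the thread's own method.

from math import comb

def binomial_p_value(wins: int, losses: int) -> float:
    """Chance of at least `wins` wins out of wins+losses decisive games
    if the setting were really no better than a coin flip."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Example: 120 fresh step-3 games, 55 wins, 40 losses, 25 draws (made up).
p = binomial_p_value(55, 40)
print(f"One-sided p-value: {p:.3f}")  # a bit under 0.1 here: suggestive, not conclusive

The point of the three-step procedure is that only step-3 games enter this
calculation; folding in the step-1 games that suggested the hypothesis in the
first place pushes the p-value down artificially, just as counting Joe's
original win would inflate his lottery record.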