Computer Chess Club Archives


Subject: Re: Rybka 1.01 Beta 13 b First impressions

Author: Vasik Rajlich

Date: 14:49:03 02/01/06


On January 30, 2006 at 18:14:21, Albert Silver wrote:

>>Let me put it like this: what is the chance at this point (i.e. just based on the
>>games you list above, without any further testing) that your hypothesis is
>>correct?
>>
>>If you analyze it "statistically", you might get some figure like (let's say)
>>30%. Just a total wild guess, eyeballing your numbers.
>>
>>In reality, though, it's probably more like 3%.
>>
>>The reason is that before your experiment started, there were let's say 20
>>candidate hypotheses, that you didn't even bother to list. Maybe ultrasolid is
>>worse against Fritz, maybe it's better in closed positions, etc. One of these
>>hypotheses is likely to get lucky, and this hypothesis will then of course have
>>very nice data to support it.
>>
>>Anyway, there is nothing wrong with this procedure, as long as you eventually
>>test your hypothesis "straight up".
>>
>>Vas
>
>By "straight up", do you mean it is tested alone without any other parameters
>influencing? If so, wouldn't that go against the theory that each parameter is
>independent of the other and should bring its fruits?
>

What I mean by straight up is that the testing should go in the following
sequence:

1) Play a bunch of games, with various settings, without any special
expectations
2) Identify some trend - let's call it a "candidate hypothesis"
3) Test the candidate hypothesis with many many more games
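
The three steps can be sketched as a small simulation. This is only an illustration under assumptions that are not from the thread: every candidate setting is truly neutral (win probability 0.5), draws are ignored, and the function names, game counts and seed are made up for the example.

```python
import random


def play_match(n_games, p_win, rng):
    """Return the score fraction over n_games, each won with probability p_win."""
    wins = sum(1 for _ in range(n_games) if rng.random() < p_win)
    return wins / n_games


def simulate(n_hypotheses=20, explore_games=50, confirm_games=2000,
             trials=200, seed=1):
    """Average score of the luckiest candidate in step 1 vs. its retest in step 3."""
    rng = random.Random(seed)
    explore_total = 0.0
    confirm_total = 0.0
    for _ in range(trials):
        # Step 1: play a few games with each setting; none has any real effect.
        scores = [play_match(explore_games, 0.5, rng)
                  for _ in range(n_hypotheses)]
        # Step 2: the "candidate hypothesis" is whichever setting looks best.
        explore_total += max(scores)
        # Step 3: retest that candidate with many more, fresh games.
        confirm_total += play_match(confirm_games, 0.5, rng)
    return explore_total / trials, confirm_total / trials


if __name__ == "__main__":
    explore, confirm = simulate()
    print(f"best candidate in step 1: {explore:.3f}")
    print(f"same candidate in step 3: {confirm:.3f}")
```

In this toy setup the best-looking candidate from step 1 scores well above 50% purely by selection, while the fresh games in step 3 bring it back to about 50%.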

Maybe, to be really fair, the games from step 1 should even be thrown away, and
only the games from step 3 should be used. Not sure about this.

Consider the following scenario.

There is some person (Joe) who just won the lottery. You want to see if Joe is a
specialist at winning lotteries. So, Joe buys many more lottery tickets, and we
see how he performs. The question is: is it fair to include that first win in
his statistics, or not? Probably not.
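
To put a rough number on the Joe question, here is a minimal sketch, again under assumptions of my own: 20 neutral candidates, 50 first-round games each, 200 later games, draws ignored, and hypothetical function names. It compares the estimate that includes the lucky first round with the estimate from the later games only.

```python
import random


def wins(n_games, rng, p_win=0.5):
    """Number of wins in n_games, each won with probability p_win."""
    return sum(1 for _ in range(n_games) if rng.random() < p_win)


def pooled_vs_fresh(n_candidates=20, first_games=50, later_games=200,
                    trials=300, seed=7):
    """Average estimated score with and without the selected first round."""
    rng = random.Random(seed)
    pooled_sum = 0.0
    fresh_sum = 0.0
    for _ in range(trials):
        # The "lottery win": best first-round result among neutral candidates.
        first = max(wins(first_games, rng) for _ in range(n_candidates))
        # The later tickets: fresh games for the selected candidate.
        later = wins(later_games, rng)
        pooled_sum += (first + later) / (first_games + later_games)
        fresh_sum += later / later_games
    return pooled_sum / trials, fresh_sum / trials


if __name__ == "__main__":
    pooled, fresh = pooled_vs_fresh()
    print(f"including the first round: {pooled:.3f}")
    print(f"later games only:          {fresh:.3f}")
```

Under these assumptions the pooled figure stays above 50% even though the candidate has no real effect, which is the argument for leaving the first win out of the statistics.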

Vas

>Still, I *have* noticed that UltraSolid has scored better with Fritz 9 and worse
>with Fruit 2.2. I tested it alone, and I tested it with other settings, and this
>observation is a result of this consistent difference. Hurd's results, using a
>different set of positions (I used the Silver Suite, and he the Nunn2) tend to
>corroborate this. It is the key reason I insist on not just trying a zillion
>games against one opponent, but split it up among several, and hopefully remove
>this problem and get a better idea as to its worth. Even if that means it is
>inconclusive, since that would mean it was inconclusive after 200 games and 4
>opponents, as opposed to zero data, or 200 games against the same opponent.
>
>Note that my suite hardly covers it all, but it does provide a decent variety of
>openings, and types of positions.
>
>                                           Albert


