Computer Chess Club Archives


Subject: Re: nullmove and tactics

Author: Pat King

Date: 15:21:56 03/27/04


On March 26, 2004 at 20:14:15, Sune Fischer wrote:

>On March 26, 2004 at 04:54:57, Uri Blass wrote:
>
>>On March 24, 2004 at 17:31:35, Dann Corbit wrote:
>>
>>>On March 24, 2004 at 16:53:08, Uri Blass wrote:
>>>[snip]
>>>>The difference is more important and 10-0 is clearly more telling than 19-11
>>>
>>>It is stronger, but less reliable.
>>
>>No 10-0 is clearly more reliable than 19-11
>
>The interesting question is if 10-0 is more "reliable" than 20-10, and it isn't.

From a statistical viewpoint, it is. 10-0 far exceeds 99% confidence, whereas
20-10 doesn't quite reach 95% confidence (see my table of "significant" wins
elsewhere in this thread).
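
For anyone who wants to check figures like these, here is a minimal sketch of a binomial sign test on decisive games, assuming equal engines (p = 0.5) as the null hypothesis. The function name `sign_test_p` is made up for illustration, and since this post does not say whether the table elsewhere in the thread is one-sided or two-sided, the sketch exposes both.

```python
from math import comb

def sign_test_p(wins, losses, two_sided=True):
    """Binomial p-value for a wins-losses score, assuming equal engines (p = 0.5)."""
    n = wins + losses
    k = max(wins, losses)
    # Chance that one named side scores k or more out of n decisive games.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Doubling allows for either side being the one that ended up ahead.
    return min(1.0, 2 * tail) if two_sided else tail

for score in [(10, 0), (20, 10)]:
    for two_sided in (False, True):
        p = sign_test_p(*score, two_sided=two_sided)
        label = "two-sided" if two_sided else "one-sided"
        print(f"{score[0]}-{score[1]} {label}: p = {p:.4f}, confidence = {1 - p:.2%}")
```

The one-sided and two-sided answers for 20-10 land on opposite sides of the 95% line, so the choice of test matters for borderline scores.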

>
>One can prove that draws are not important if one is only interested in knowing
>which one is better.

Now this is an interesting point. My statistical analyses have assumed decisive
games, because in my testing I've come across very few draws in C to C games. Is
your assertion "draws are not important" because (for instance) your 20-10
result is really a 15-5-10 result (which I think would reach my binomial
threshold), or is there some sort of "trinomial" distribution out there that I
should be aware of?
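
On the "trinomial" question: if each game is a win, draw, or loss, and the two engines are equally likely to win, then conditional on the number of decisive games the win count is plain binomial with p = 0.5, whatever the draw rate. That is one way to read "draws are not important" for the which-is-better question. A minimal sketch that checks this numerically follows; the 15-5-10 split is the hypothetical from the paragraph above, and the function names and draw rates are illustrative assumptions.

```python
from math import comb, factorial

def trinomial_pmf(w, l, d, p_draw):
    """P(w wins, l losses, d draws) when both engines are equally likely to win."""
    p_win = p_loss = (1 - p_draw) / 2
    n = w + l + d
    coef = factorial(n) // (factorial(w) * factorial(l) * factorial(d))
    return coef * p_win ** w * p_loss ** l * p_draw ** d

def conditional_win_prob(w, l, d, p_draw):
    """P(exactly w wins | w + l decisive games and d draws); needs 0 < p_draw < 1."""
    m = w + l
    total = sum(trinomial_pmf(i, m - i, d, p_draw) for i in range(m + 1))
    return trinomial_pmf(w, l, d, p_draw) / total

def decisive_p_value(wins, losses, draws=0):
    """Two-sided sign test on decisive games only; draws are deliberately ignored."""
    n, k = wins + losses, max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# The conditional probability matches the plain binomial term for any draw rate.
for p_draw in (0.2, 0.6):
    print(p_draw, conditional_win_prob(15, 5, 10, p_draw), comb(20, 15) / 2 ** 20)

print("15-5-10 two-sided p =", decisive_p_value(15, 5, 10))
```

Under this model the draws only reduce the number of informative games; they do not change the test applied to the rest.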
>
>Note this is not to be confused with the question of how much difference in
>strength there is.
>It's two very different questions.

A point I granted Dann elsewhere in the thread.
>
>>It usually will not happen but it does not mean that it is less reliable when it
>>happens(you may suspect that something in the conditions is wrong when you see
>>10-0 but if you see that no program was significantly slower in nps during the
>>match than you can safely stop the match after 10-0 and say that the new program
>>is better).
>
>If I suspect something might be wrong I will stop the match and investigate, but
>one can easily imagine 10-0 or similar under proper conditions.

Like newbie engines with no book (repeating the same test 10 times).
Like, perhaps, a very small or very bad book.
Like, what else?
>
>In fact yesterday I played a match where the score after 15 games was 13.5-1.5
>in favor of the new version.
>It actually ended up losing the match by 49-51 :(

I would argue, based on your final result, that you cannot conclude anything
about the difference between your two programs. I certainly wouldn't throw out
your new version on that result. That you got to a 12 game difference and then
ended up with an inconclusive result is highly unusual, but of course not
impossible. I most likely would have stopped testing at or before the 13.5-1.5
game point, and at the 100 game point, I don't think you can prove I would have
made a mistake.
>
>Honestly I do not remember having seen such a drastic score difference before,
>but I do regularly see a sequence of 6-8 straight wins by one of the engines in
>100 game match, so it's not impossible to imagine this might occur at the
>beginning of the match.
>
>-S.
When you're playing 100 game matches, and therefore have seen 1000s of games,
I'm not surprised you've seen 6-8 game streaks. But I would argue that you
haven't learned anything by playing such long matches. Yes, those streaks might
make me accept a bad new version. But after 100 games, you can only still say
that I "might" have made a mistake.
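
To put a rough number on how unsurprising such streaks are, here is a small sketch that computes the chance of at least one run of consecutive wins by the same engine in a match. It assumes two exactly equal engines, no draws, and independent games, none of which holds perfectly in real testing, and the name `prob_streak` is just for illustration.

```python
def prob_streak(n_games, streak_len):
    """Chance of a run of streak_len straight wins by either engine in n_games.

    Assumes equal engines, no draws, and independent games.
    """
    if streak_len <= 1:
        return 1.0 if n_games >= 1 else 0.0
    if n_games < 1:
        return 0.0
    # state[j]: probability that the current run has length j + 1 and that
    # no run of streak_len has occurred so far. After game 1 the run is 1.
    state = [0.0] * (streak_len - 1)
    state[0] = 1.0
    for _ in range(n_games - 1):
        new = [0.0] * (streak_len - 1)
        new[0] = 0.5 * sum(state)          # other engine wins, run restarts
        for j in range(streak_len - 2):
            new[j + 1] = 0.5 * state[j]    # same engine wins, run grows
        state = new                        # runs reaching streak_len drop out
    return 1.0 - sum(state)

for length in (6, 7, 8):
    print(f"run of {length}+ in 100 games: {prob_streak(100, length):.3f}")
```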

One weakness of my testing method (play until I get a "significant" result at
95% confidence, or until I get bored) is that it smacks of self-selection. If
one waits long enough, chance will ensure the answer one wants. So picking a 30
or 100 game limit seems a reasonable safeguard against this.
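
The self-selection worry can be illustrated with a small simulation: two identical engines play, significance is checked after every game, and the match stops at the first "significant" score or at a game cap. This is only a sketch under stated assumptions (no draws, a two-sided sign test at 95%, and made-up function names, trial count, and caps); it is not a description of anyone's actual test harness.

```python
import random
from math import comb

def critical_wins(n, alpha=0.05):
    """Smallest win count out of n with two-sided sign-test p < alpha, else None."""
    limit = alpha / 2 * 2 ** n
    tail, k = 0, None
    for i in range(n, n // 2, -1):
        tail += comb(n, i)
        if tail >= limit:      # including score i would push p to alpha or above
            break
        k = i
    return k

def false_positive_rate(max_games, trials=2000, seed=1):
    """Share of matches between EQUAL engines that ever look significant."""
    rng = random.Random(seed)
    crit = [critical_wins(n) for n in range(max_games + 1)]
    hits = 0
    for _ in range(trials):
        wins = 0
        for n in range(1, max_games + 1):
            wins += rng.random() < 0.5
            lead = max(wins, n - wins)
            if crit[n] is not None and lead >= crit[n]:
                hits += 1      # stopped early and declared a winner by luck
                break
    return hits / trials

for cap in (30, 100, 300):
    print(f"cap {cap:3d} games: false positive rate ~ {false_positive_rate(cap):.3f}")
```

The longer the cap, the more often equal engines produce a "significant" score somewhere along the way, which is why a fixed limit acts as a safeguard.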

What to do with your inconclusive result in such cases is another matter. If
your 49-51 result were testing your implementation of null move, I'd be worried.
If you were only messing around with some eval weights, I'd be reassured that I
hadn't broken anything TOO badly.

Bottom line, this stuff is hard :)

Pat



Last modified: Thu, 15 Apr 21 08:11:13 -0700
