Computer Chess Club Archives


Subject: Re: Small number statistics and small differences

Author: Dan Homan

Date: 08:36:56 08/14/98



On August 14, 1998 at 10:42:04, Bruce Moreland wrote:

>
>On August 14, 1998 at 06:08:36, Dan Homan wrote:
>
>>On August 12, 1998 at 09:15:19, Bruce Moreland wrote:
>>
>>>
>>>On August 11, 1998 at 06:52:02, Tony Hedlund wrote:
>>>
>>>>
>>>>>So I think 4-0 actually turns out to be a significant result.  If you score 4-0,
>>>>>you can say that there is a very good chance that the one with the wins is
>>>>>better than the ones with the losses.
>>>>>
>>>>>You can't say this if you pick out a string of 4 wins in a row in the midst of a
>>>>>longer match, since you might be selecting a fluke case, but if you just start
>>>>>from scratch, and get 4-0, you should be able to stop.  In fact I think you
>>>>>might be able to stop if you get 3.5 - 0.5, but I am less certain of this case.
>>>>>Someone who has more statistics than I may be willing to comment on this.
>>>>
>>>>Recently I played the match Shredder2 P200 MMX 64MB - Rebel8 P90 16MB.
>>>>Rebel won the first four games but Shredder won the match with 11-9.
>>>
>>>That shouldn't happen very often.
>>>
>>>bruce
>
>If this does happen more often than it should, perhaps some effort could be
>expended to figure out why.  I don't know why it would happen more often than it
>should.
>
>>The problem with small number statistics is that they can be very
>>misleading.  A 4-0 result in a 4-game match between nearly equal
>>programs (with 20% draw chances) happens about 1/40th of the time.
>>A 3.5-0.5 (or better) result happens about 1/13th of the time.
>
>If you want to be able to say, "program A is stronger than program B, by at
>least a little bit", would you rather have a 4-0 result or a 105-95 result?
>
>You should get a bogus result from 4-0 only 2.5% of the time by your
>calculation,

Actually, I think my calculation was wrong.  I didn't consider the chance
of the weaker program going 4-0 (see below).
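
As a rough check on the quoted figures, here is a minimal sketch that
enumerates every outcome of a 4-game match.  It assumes independent games
and the 40% win / 20% draw / 40% loss split for nearly equal programs from
the quoted text; the function name score_distribution is just an
illustrative choice.

```python
from itertools import product

def score_distribution(p_win, p_draw, games=4):
    """Exact distribution of program A's match score over `games` games.

    Assumes games are independent with fixed win/draw/loss probabilities.
    """
    p_loss = 1.0 - p_win - p_draw
    outcomes = [(1.0, p_win), (0.5, p_draw), (0.0, p_loss)]
    dist = {}
    # Walk every win/draw/loss sequence and accumulate its probability.
    for seq in product(outcomes, repeat=games):
        score = sum(points for points, _ in seq)
        prob = 1.0
        for _, p in seq:
            prob *= p
        dist[score] = dist.get(score, 0.0) + prob
    return dist

dist = score_distribution(p_win=0.40, p_draw=0.20)
p_sweep = dist[4.0]                          # one named program goes 4-0
p_35_or_better = dist[4.0] + dist[3.5]       # 3.5-0.5 or better

print(f"A scores 4-0:               {p_sweep:.4f}")
print(f"A scores 3.5-0.5 or better: {p_35_or_better:.4f}")
print(f"either side scores 4-0:     {2 * p_sweep:.4f}")
```

The first two numbers are for one named program only.  The last line counts
a sweep by either side, which is the case the earlier figure left out.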

>how much do you want to bet that you'd get a bogus result from
>105-95 a lot more of the time?

Of course, but my point is that if you think you are learning anything
useful from a 4-0 result in a 4-game match between top programs, you are
wrong.  I initially thought this was a great idea, but the more I think
about it, the worse it gets.

Take the example I gave in my last post (I snipped it here).  Using
4-game matches to decide between versions of a single program with small
differences will lead to an essentially random evolution of the program!

The problem is to measure small differences in strength.

Assume a marginally stronger program has a 42% chance of winning a game
while its marginally weaker opponent has a 38% chance of winning a
game.  (This implies a 20% draw chance.)

The probability of the stronger program going 4-0 is 0.42^4, or about 1/32,
but the probability of the weaker program going 4-0 is 0.38^4, or about
1/48.  So if you get a 4-0 result from the match, you are only 60% sure
that the stronger program won: (1/32) / (1/32 + 1/48) is about 0.6!

Let's bump it up to 45%-35% chances of winning for the stronger/weaker
program.  If the match gives a 4-0 result, you are still only about 73%
sure that the stronger program won!
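
A minimal sketch of this calculation, for anyone who wants to try other
win percentages.  It assumes independent games and that, before the match,
either program is equally likely to be the stronger one; the function name
sweep_posterior is only illustrative.

```python
def sweep_posterior(p_strong, p_weak, games=4):
    """Chance that the program that swept the match is the stronger one.

    p_strong, p_weak: per-game win probabilities of the stronger and
    weaker program.  Assumes independent games and no prior knowledge
    of which program is stronger (a 50/50 prior).
    """
    like_strong = p_strong ** games   # stronger program wins every game
    like_weak = p_weak ** games       # weaker program wins every game
    return like_strong / (like_strong + like_weak)

for p_strong, p_weak in [(0.42, 0.38), (0.45, 0.35)]:
    post = sweep_posterior(p_strong, p_weak)
    print(f"{p_strong:.0%} vs {p_weak:.0%}: "
          f"P(sweeper is stronger | 4-0) = {post:.3f}")
```

Raising the games argument shows how slowly the confidence climbs when the
two programs are this close in strength.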

 - Dan




Last modified: Thu, 15 Apr 21 08:11:13 -0700
