Computer Chess Club Archives


Subject: Re: Fritz 7 - Junior 7, Another comeback

Author: Ed Schröder

Date: 02:11:28 11/17/01


On November 17, 2001 at 04:13:45, Peter McKenzie wrote:

>On November 17, 2001 at 03:23:23, Ed Schröder wrote:
>
>>On November 17, 2001 at 01:34:28, Uri Blass wrote:
>>
>>>On November 17, 2001 at 00:25:14, Ed Schröder wrote:
>>>
>>>>On November 16, 2001 at 19:12:59, Christophe Theron wrote:
>>>>
>>>>>On November 16, 2001 at 16:20:45, Hansjoerg wrote:
>>>>>
>>>>>>Some time ago I posted the (surprising) result from an engine tournament between
>>>>>>Fritz 7 and Junior 7 (14.5-4.5).
>>>>>>Now I finished 100 games and the result is F7 (49.5) - J7 (50.5)!
>>>>>
>>>>>
>>>>>
>>>>>Welcome to the world of statistics!
>>>>>
>>>>>
>>>>>
>>>>>    Christophe
>>>>
>>>>
>>>>Yep, ask any casino director where his profit comes from :)
>>>>
>>>>I once wrote a 10-line C program; it randomly generates a result
>>>>(1-0, 1/2-1/2, 0-1) and then starts counting. Results vary tremendously.
>>><snipped>
>>>>100 games: 70-30
>>>
>>>I understand your idea but
>>>70-30 between equal programs does not make sense.
>>
>>
>>But Uri, you can program yourself, just try it. It is pure statistics.
>
>but that is just the point, Uri was saying that 'pure statistics' would make a
>70-30 result between equal programs extremely unlikely.

If that is what Uri meant, then fine, because it happens rarely. The point
is that you cannot fully rely on 100 games when you want to prove or disprove
a program change by this kind of testing.
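
The effect is easy to reproduce. Below is a minimal sketch in Python (not the
original 10-line C program, which is not shown in this thread). It assumes
independent games between two programs with fixed win and draw probabilities;
the names simulate_match and fraction_lopsided and the chosen draw rate are
made up for illustration. It counts how often either side reaches a given
score in a 100-game match.

```python
import random


def simulate_match(n_games, p_win, p_draw, rng):
    """Return side A's score over n_games (win = 1, draw = 0.5, loss = 0)."""
    score = 0.0
    for _ in range(n_games):
        r = rng.random()  # uniform in [0, 1)
        if r < p_win:
            score += 1.0
        elif r < p_win + p_draw:
            score += 0.5
    return score


def fraction_lopsided(n_matches, n_games, threshold, p_win, p_draw, seed=0):
    """Fraction of simulated matches where either side scores >= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_matches):
        s = simulate_match(n_games, p_win, p_draw, rng)
        if s >= threshold or n_games - s >= threshold:
            hits += 1
    return hits / n_matches


if __name__ == "__main__":
    # Assumed setup: equal programs, one third of the games drawn.
    for threshold in (55, 60, 65, 70):
        frac = fraction_lopsided(2000, 100, threshold, 1 / 3, 1 / 3)
        print(f"{threshold}-{100 - threshold} or worse: {frac:.4f}")
```

Under these assumptions the standard deviation of a 100-game score is roughly
4 to 5 points (for draw rates between one third and zero), so 55-45 is routine
while 70-30 sits four or more standard deviations out.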



>Getting late here,

See below...



>so I
>don't have the energy to check it but I suspect he is correct.
>
>perhaps your pseudo-RNG wasn't so good?
>
>Also, (I know this topic has been done to death before but I wasn't listening)
>if you really want to simulate it accurately you need to simulate the correct
>draw percentage I think.  I think a high draw percentage increases the variance,
>agreed?

Agreed.
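
The draw-rate question can be checked directly rather than from memory. Below
is a minimal sketch, assuming equal programs and independent games, with draw
probability d and the remaining probability split evenly between wins and
losses; score_sd is a hypothetical helper name. In this simple model the
per-game variance of the score works out to (1 - d) / 4, so the spread of the
match score shrinks as the draw rate rises.

```python
import math


def score_sd(n_games, p_draw):
    """Standard deviation of the total score between two equal programs.

    Assumes independent games; wins and losses are equally likely.
    """
    p_win = (1.0 - p_draw) / 2.0
    # Mean score per game is 0.5. A win or a loss deviates by 0.5,
    # a draw deviates by 0, so only decisive games add variance.
    var_game = 2.0 * p_win * 0.25
    return math.sqrt(n_games * var_game)


if __name__ == "__main__":
    for d in (0.0, 0.2, 0.4, 0.6):
        print(f"draw rate {d:.1f}: sd over 100 games = {score_sd(100, d):.2f}")
```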



>>And if you are not convinced, run 1000 blitz games, divide them in parts
>>of 100, and you will see at least one 70-30 score.
>
>Are you sure?
>If so, how sure? :-)

I must confess, I exaggerated, it is early here :)

I have done so several times over the years to satisfy my curiosity, and
I don't remember a 70-30 case, at least not for sure. But the message the
data gave me was clear and has stayed with me: do not trust 100 games for
a final verdict on the Elo difference between version x and version y.

More interesting: in your case too, you must have realized by now that the
days when you could improve your engine by 50 points in one day are over;
your program is already too strong for that. Those were the days :)

So what is left?

Changes and improvements of, say, 1-2 and, if you are lucky, 5 Elo points?

Now, how on earth are you going to measure 2 Elo points?
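
To put a rough number on it: below is a back-of-the-envelope sketch using the
standard logistic Elo expectation, E = 1 / (1 + 10^(-D/400)). The helper names
expected_score and games_needed, the two-standard-deviation criterion z = 2,
and the draw rate are all assumptions for illustration, and the per-game
variance is approximated by its value at equal strength, (1 - d) / 4.

```python
import math


def expected_score(elo_diff):
    """Expected score per game for the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, p_draw, z=2.0):
    """Games needed before the expected edge exceeds z standard deviations.

    Approximation: independent games, variance taken at equal strength.
    """
    if elo_diff == 0:
        raise ValueError("elo_diff must be nonzero")
    edge = expected_score(elo_diff) - 0.5
    var_game = (1.0 - p_draw) / 4.0
    # Require n * edge >= z * sqrt(n * var_game), solved for n.
    return math.ceil(z * z * var_game / (edge * edge))


if __name__ == "__main__":
    for diff in (50, 5, 2):
        print(f"{diff} Elo: about {games_needed(diff, 0.3)} games")
```

For a 2 Elo difference this comes out in the tens of thousands of games at
ordinary draw rates, against a few hundred at most for a 50 Elo difference.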

I better stop now, otherwise I need Prozac, lots of it....

Ed


