Computer Chess Club Archives


Subject: Re: Rebel XP Machëide x2cou_51 and 56 strike back... games versus Fritz7

Author: Uri Blass

Date: 06:44:23 09/10/02


On September 10, 2002 at 09:22:41, Thorsten Czub wrote:

>On September 10, 2002 at 08:35:32, Uri Blass wrote:
>
>>Statistics also does not tell me that playing 288 games has no meaning.
>>If the new version can beat the old version it is not enough and tests may be
>>needed also against other programs but if it is losing then it means that there
>>is a problem and you need to look in the games to find it.
>
>when a "new version" wins or loses against the older version,
>it does not mean it is weaker or stronger.
>
>it can lose and still be stronger than older version.

There may be statistical error, but if the number of games is large, a loss
means that the new version is not clearly better than the old one, and that
points to a problem.
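
As a rough sketch of what "statistical error" means for a match result, the
score can be compared with 50% in units of its standard error. The helper name
match_z_score and the win/draw/loss counts below are made up for illustration
and are not taken from any match in this thread. It uses a normal
approximation and treats games as independent, which is only an assumption.

```python
import math

def match_z_score(wins, draws, losses):
    """Return (score, standard_error, z) for a match seen from one side.

    Hypothetical helper: each game counts 1, 0.5 or 0, and games are
    assumed to be independent of each other.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result around the observed mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_error = math.sqrt(variance / games)
    # z measures how many standard errors the score is away from 50%.
    z = (score - 0.5) / std_error if std_error > 0 else 0.0
    return score, std_error, z

# Made-up 288-game result for the new version against the old one.
score, std_error, z = match_z_score(wins=100, draws=68, losses=120)
print(f"score={score:.3f} std_error={std_error:.3f} z={z:.2f}")
```

A z value far below zero says that the loss is unlikely to be noise in this
one pairing. It says nothing about results against other programs.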

>it can win and still be weaker than older version.

I agree. Beating the older version is not enough to be sure that the new version
is not weaker than the old one.

>
>all it measures is: that the new version is DIFFERENT. if it is NOT
>different, you will get results near 50%.
>
>if you get results > or even < 50% the version is DIFFERENT.
>
>but this does not tell you ANYthing about the strength of this version
>overall.

Here I do not agree.

If it loses to the old version in a long match, that means there is a
problem.

>
>therefore it is senseless if you play 1 game x vs. x+1 or 288, 400 or 4000.
>all you measure is:
>
>it is different on the base of 1 game.
>it is different on the base of 288 games.
>it is still different on the base of 400 games.
>
>it does not tell you anything about WHAT different means.

No.
Games between x and x+1 give you more than the final result, and you can learn
from them.

>
>that's the problem with statistics. people who have no idea about statistics
>believe when Rebel Century4 loses 55 % against Rebel XP, and this was
>played out in 288 or 400 or 4000 games, this would mean that century4 is weaker
>than Rebel XP.
>
>but this is not true.

I agree with that.
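
To put numbers on the 55% example, here is a minimal sketch that converts a
score fraction into an Elo difference with the usual logistic Elo formula and
attaches an approximate 95% interval for 288, 400 and 4000 games. The function
names are hypothetical. The interval assumes independent games and no draws,
which overstates the variance, so it is on the cautious side.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(score, games, z=1.96):
    """Approximate 95% Elo interval for a head-to-head score.

    Assumption: binomial standard error with no draws (a cautious choice).
    """
    std_error = math.sqrt(score * (1.0 - score) / games)
    # Clamp so that the logarithm in elo_from_score stays defined.
    low = max(score - z * std_error, 1e-9)
    high = min(score + z * std_error, 1.0 - 1e-9)
    return elo_from_score(low), elo_from_score(high)

for games in (288, 400, 4000):
    low, high = elo_interval(0.55, games)
    print(f"{games:5d} games: {elo_from_score(0.55):+.1f} Elo, "
          f"interval {low:+.1f} to {high:+.1f}")
```

Under these assumptions a 55% score is about 35 Elo. With 288 games the
interval still includes zero, and with 4000 games it does not. Either way the
number describes only this one pairing, not strength against other programs.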

>
>you have no idea how xp will play against OTHER programs.
>because you measured how it plays against century4. there you can
>say something about, with a chance of x % to be right in your observation.
>
>but - what does this tell you about overall strength of version X+1 ?
>
>nothing.
>
>if you test your car-tires when the asphalt is dry, it does not tell you
>anything about how the tires will react when the streets/asphalt is
>wet.
>
>>It is wrong because Martin also looked in the games.
>
>if he looks in the games, i don't understand his question overall.

He had not read your earlier posts and probably assumed that you might be making
the same error as people who look only at results.

>
>of course results can change over a long period of games.
>but i do not look for results.
>
>people try to confuse here.
>
>because some people in this newsgroup publish only stupid results.
>but results cannot help much when you don't discuss WHY those results
>have happened.
>
>and some results are misleading because of the setting of the event.

I agree that it is important to know the reason for the results.

Uri


