Author: Thorsten Czub
Date: 06:22:41 09/10/02
On September 10, 2002 at 08:35:32, Uri Blass wrote:

>Statistics also does not tell me that playing 288 games has no meaning.
>If the new version can beat the old version it is not enough and tests may be
>needed also against other programs but if it is losing then it means that there
>is a problem and you need to look in the games to find it.

When a "new version" wins or loses against the older version, it does not mean it is weaker or stronger. It can lose and still be stronger than the older version. It can win and still be weaker than the older version.

All it measures is that the new version is DIFFERENT. If it is NOT different, you will get results near 50%. If you get results above or even below 50%, the version is DIFFERENT. But this does not tell you ANYthing about the strength of this version overall.

Therefore it makes no difference whether you play 1 game of x vs. x+1, or 288, 400 or 4000. All you measure is: it is different on the basis of 1 game. It is different on the basis of 288 games. It is still different on the basis of 400 games. It does not tell you anything about WHAT different means.

That's the problem with statistics. People who have no idea about statistics believe that when Rebel Century4 loses 55% of the points against Rebel XP, and this was played out over 288 or 400 or 4000 games, it means that Century4 is weaker than Rebel XP. But this is not true. You have no idea how XP will play against OTHER programs, because you only measured how it plays against Century4. About that match-up you can say something, with a chance of x% of being right in your observation. But what does this tell you about the overall strength of version X+1? Nothing.

If you test your car tires when the asphalt is dry, it does not tell you anything about how the tires will react when the asphalt is wet.

>It is wrong because Martin also looked in the games.

If he looks in the games, I don't understand his question at all. Of course results can change over a long series of games. But I do not look for results.

People try to confuse things here, because some people in this newsgroup publish only stupid results. But results cannot help much when you don't discuss WHY those results have happened. And some results are misleading because of the setting of the event.

>Uri
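
To put a number on the "chance of x% of being right" remark, here is a minimal sketch, not part of the original post. It assumes every game is an independent win/loss trial (draws and opening dependence are ignored, which is a simplification) and uses a normal approximation. The function names are made up for illustration. It only asks how likely a 55% head-to-head score is if the two versions were really equal against each other; it says nothing about strength against other programs.

```python
import math


def head_to_head_z(score_fraction, games):
    """z-score of an observed score against the 'no difference' value of 50%.

    Assumption: each game is an independent win/loss trial, so the standard
    error of the score fraction under equality is sqrt(0.25 / games).
    """
    std_err = math.sqrt(0.25 / games)
    return (score_fraction - 0.5) / std_err


def two_sided_p(z):
    """Two-sided tail probability of a standard normal at |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


# The same 55% score looked at over the match lengths mentioned in the post.
for games in (288, 400, 4000):
    z = head_to_head_z(0.55, games)
    print(f"{games:5d} games: z = {z:.2f}, p = {two_sided_p(z):.4g}")
```

Under these assumptions a longer match only makes it more certain that the two versions differ against each other. The wet-asphalt question, how the new version does against other opponents, is not touched by this number.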
Last modified: Thu, 15 Apr 21 08:11:13 -0700