Author: Alberto Rezza
Date: 04:59:44 07/16/00
On July 16, 2000 at 03:34:45, Ed Schröder wrote:

>>posted by Dann Corbit on July 15, 2000 at 20:21:54:
>
>>Simplifying. I have a penny.
>>I toss it twice.
>>Heads, heads.
>>I toss it twice
>>Heads, heads.
>>I toss it twice
>>Tails, heads.
>>I toss it twice
>>Heads, tails.
>
>>I count them up.
>
>>Heads are stronger than tails.
>
>>My conclusion is faulty. Why? Because I did not gather enough data.
>
>Right.

Wrong. Perhaps it was the wrong example? Such a weakly defined "conclusion" is obviously correct. It's not even necessary to dig out your old statistics book. Try testing for P(heads) >= 0.5 + X with confidence 0.5 + Y. Without any calculation we can say that for X and/or Y small enough, "Heads are stronger than tails" is justified.

>But what about the crazy result of match-2? Apparently after 300 games it is
>still not enough to prove that the 10% faster version is superior (of
>course it is) but the match score indicates both versions are equal
>which is not true.
>
>So how many games are needed to prove that version X is better than Y?

Yes. So the problem is: how much confidence do we need in the chess programs' strength? Are 300 games not enough?

It seems to me that most people here have set very strict standards for computers. If we were to apply such standards to human players, we would have to conclude that when a player gets a GM title from FIDE we really cannot say whether he is of GM strength; or we might say that we don't have enough games by Morphy to tell whether he was stronger than the average club player of his time...

A program whose results are good for 3 GM norms has GM strength - and that should be all.

Alberto
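For anyone who wants to put numbers on the test mentioned in the post, here is a minimal sketch in Python. The function names binom_tail and games_needed are made up for illustration, and the 0.52 expected score is a hypothetical value, not something measured in the matches discussed. The first function gives the exact chance of seeing at least k heads in n tosses of a coin with P(heads) = p; the second is only a rough normal approximation that ignores draws and estimates how many games it takes before the expected score of the better version reaches a one-sided threshold z.

```python
from math import comb, ceil


def binom_tail(n, k, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def games_needed(p, z=1.645):
    """Rough game count (draws ignored) at which an expected score p
    sits z standard errors above 0.5. Normal approximation only."""
    return ceil(z * z * p * (1 - p) / (p - 0.5) ** 2)


# The penny example: 6 heads in 8 tosses, tested against a fair coin.
coin_p_value = binom_tail(8, 6)

# Hypothetical: the stronger version is assumed to score 52% on average.
n_games = games_needed(0.52)

print(f"P(at least 6 heads in 8 tosses | fair coin) = {coin_p_value:.3f}")
print(f"Games needed for a 52% scorer (rough estimate): {n_games}")
```

For the penny, the tail probability is 37/256, about 0.145, so whether "heads are stronger" counts as justified depends entirely on how small an X and Y one is willing to accept, which is the point made above.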
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.