Author: Bo Persson
Date: 04:09:12 01/04/04
On January 04, 2004 at 05:57:37, George Tsavdaris wrote:

>On January 04, 2004 at 00:42:02, Christophe Theron wrote:
>
>>On January 03, 2004 at 20:53:39, Rick Rice wrote:
>>
>>>Person A posts a message saying Ruffian 2.0 is very disappointing, with the
>>>results to back it up. This is followed by a second post which basically says
>>>that Ruffian 2.0 rocks with some results to back it up. Are these programs
>>>really so time and hardware sensitive, so as to show varying results on
>>>different CPUs/time controls?
>>>
>>>Ideal solution would be for SSDF to have one massive board with one CPU and
>>>memory for each program (equal CPU and mem for all the progs on its list) and
>>>some way to automate the play of these programs against each other..... on
>>>different time controls such as regular, blitz etc. Just wishful thinking for
>>>the future, but it would eliminate the multiple and varying results.
>>>
>>>Cheers,
>>>Rick
>>
>>
>>
>>Statistics are extremely important in chess, and in computer chess.
>>
>>Unfortunately, even after years of talks about the subject, almost nobody on
>>this message forum understands that you really need A LOT OF GAMES to start to
>>have an impression of a probability about which program is stronger.
>>
>>The variations you have noticed do not come from different setups.
>>
>>These variations are statistical variations. That means that most of the match
>>results posted here are statistically MEANINGLESS.
>
>It would be better, if you first define when something is statistically
>meaningless.
>
>>
>>People love to proudly post the result of the 20 games match they have run
>>overnight. They don't even care to know if that result has any meaning. Well in
>>most of the cases the result means nothing (just a waste of electric power) and
>>you should not care about it at all.
>
> Always the result mean something. If someone play a match with parameters AA
>between engine X and Y, Z number of games, then we are able to conclude some
>things.
> For example that X is stronger than Y with a probability k % (0<k<100)
>when these two play with AA parameters.
>
> You say "most of the cases the result means nothing", so with that, you believe
>that there are some cases(parameters AA,games Z) that the result means
>something.

I think Christophe means that if k% is not big enough, we don't really know what the result means.

> And that for all other parameters AA, games Z the results are meaningless.
>Why? Who can define the right parameters AA, number of games Z? Perhaps the god?

No, but a statistician can tell you how many samples are needed to reach a conclusion with a specific certainty. The number of samples required is MUCH larger than a quick test will give you, especially if you test engines that are really close.

When you get a result of, say, 16-14 with an error interval of 10, you really can't say anything for sure.

One engine is better than the other, unless they are equal. :-)

Bo Persson
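To put rough numbers on the 16-14 example, here is a small Python sketch. It assumes no draws (every game scores 1 or 0) and uses the normal approximation to the binomial; the function names are hypothetical, and the "likelihood of superiority" formula is a commonly used approximation for George's "probability k %", not something taken from this thread.

```python
import math

# Assumptions (for illustration only): no draws, so every game scores 1 or 0,
# and the normal approximation to the binomial. Function names are hypothetical.

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def score_interval(wins, losses, z=Z95):
    """Approximate 95% interval for the first engine's total score."""
    n = wins + losses
    p = wins / n                           # observed score fraction
    sd_total = math.sqrt(n * p * (1 - p))  # std. deviation of the total score
    return wins - z * sd_total, wins + z * sd_total


def likelihood_of_superiority(wins, losses):
    """Approximate probability that the first engine is the stronger one."""
    return 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))


def games_needed(edge, z=Z95, sd_per_game=0.5):
    """Games after which a true score of 0.5 + edge sits just at the edge of
    the z-sigma band around 0.5 (a lower bound; reliable detection needs more)."""
    return math.ceil((z * sd_per_game / edge) ** 2)


lo, hi = score_interval(16, 14)
print(f"16-14: score interval {lo:.1f} to {hi:.1f} (width {hi - lo:.1f})")
print(f"chance the winner is stronger: {likelihood_of_superiority(16, 14):.0%}")
print(f"games to resolve 55% vs 50%: {games_needed(0.05)}")
```

Under these assumptions the 95% interval for 16-14 runs from about 10.6 to 21.4 points, a width close to the "error interval of 10" above, and it comfortably contains 15, an even result. The winner is stronger with a probability of only about 64%. A true 55% score needs roughly 385 games just to reach the edge of the 95% band around 50%. Draws lower the per-game variance and make the intervals somewhat narrower, which this sketch ignores.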
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.