Subject: Re: Proving something is better

Author: Rémi Coulom

Date: 01:48:45 12/20/02

On December 19, 2002 at 02:35:47, Bruce Moreland wrote:

>The most compelling evidence is the autoplay match where VR=3 scored 68.5%.
>These games are not available online.  I was going to check to see if the
>programs got into a rut and played the same game over and over again, but I
>can't do that.
>Assuming that they played 100 unique games, the question remains as to whether
>68.5% proves anything.  You can say, of course it does, but the real answer has
>to do with statistics.  There is no way that a "real" scientific journal would
>accept "of course it does" as an answer -- they'd want the math.  You don't
>provide the math.

68.5% does prove a lot. The number of draws is needed to estimate its
statistical significance, but even in the worst case (1 draw, 68 wins and 31
losses), the likelihood that verified null-move is better is 0.999908, according
to the small "WhoIsBest" tool I made (assuming that the (reasonable) hypotheses
this tool is based on are true). It is the worst case because, with the score
fixed at 68.5 points, wins minus losses is always 37, so fewer draws means the
same margin spread over more decisive games.
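For those who want the math: the following minimal Python sketch is not the
WhoIsBest code, but it uses one plausible model, which is an assumption. Draws
are ignored, and the probability p that the verified null-move program wins a
decisive game is given a uniform prior. The likelihood that it is better is then
the posterior probability P(p > 1/2), which has an exact closed form as a
binomial sum. The function name likelihood_of_superiority is hypothetical.

```python
from math import comb


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Posterior probability that A is stronger than B.

    Assumed model (not necessarily what WhoIsBest does): draws are ignored,
    A wins each decisive game with unknown probability p, and p has a
    uniform prior on [0, 1]. The posterior is Beta(wins + 1, losses + 1),
    and P(p > 1/2) equals P(Binomial(wins + losses + 1, 1/2) <= wins).
    """
    n = wins + losses + 1
    # Exact integer sum of binomial coefficients; divide only once at the end.
    favourable = sum(comb(n, k) for k in range(wins + 1))
    return favourable / 2 ** n


if __name__ == "__main__":
    # 68.5 points out of 100 games: wins - losses is always 37, so each
    # extra pair of draws removes one win and one loss.
    for draws in (1, 3, 5):
        wins = (137 - draws) // 2
        losses = 100 - wins - draws
        print(draws, wins, losses, round(likelihood_of_superiority(wins, losses), 6))
```

Under this model, 1 draw (68 wins, 31 losses) gives about 0.999908, the figure
quoted above, and each higher draw count gives a slightly larger value.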

So, yes, this is compelling evidence. In my personal opinion, it is so
compelling that it is suspect. I will try Omid's technique in my program and
report my results, but I cannot believe it could obtain such a crushing result
against R=2. Of course, there is still the argument that it may work very well
in Omid's program and not in mine because of other differences. But I believe
that if Omid's program is so different that it is the only one where his
technique works, then there is a problem.

Omid, if you read this, I would like to suggest that you make your source code
public. This would help clear up a lot of things. There seems to be something
wrong with your paper, and I would very much like to help sort this out. If
you do not wish to make your source public, you could also send it to me so that
I can investigate what is so different between your program and others. I would
not reveal anything publicly without your agreement, either about your code or
about the conclusion of my investigations.


