Computer Chess Club Archives



Subject: Re: crafty speedup numbers

Author: martin fierz

Date: 01:42:55 05/07/04


On May 07, 2004 at 03:17:44, Gian-Carlo Pascutto wrote:

>On May 06, 2004 at 19:03:48, martin fierz wrote:
>
>>point #3 is perhaps most important for the bob vs vincent duel: the standard
>>error for a 4 CPU test run is on the order of 0.2. if vincent's tests were with
>>a similarly small number of positions, then the differences measured in these
>>experiments (2.8 / 3.0 / 3.1) are statistically insignificant, and the whole
>>argument is pointless :-)
>
>Not necessarily - disabling nullmove halved the standard errors in my
>results. That would still allow a significant conclusion.
>
>If I assume your 0.2 is a 2SD number (95%), your results are compatible and
>running a non-nullmove test could still produce the same result. If 0.2 is a
>1SD number, then for some reason your results were much more variable than mine.
>
>                n    speedup  error (1SD)
>------------------------------------------
>Nullmove       38     2.82     +- 0.101
>No-nullmove    39     3.07     +- 0.056
>
>--
>GCP

a repeat of "my" numbers:

4 CPUs:
3.15 +- 0.15
3.29 +- 0.20
3.06 +- 0.12
3.19 +- 0.13

i hasten to add that my +- is 1SD, of course (nobody ever gives +- 2SD unless
explicitly stating so). nevertheless, the 4 results are mutually compatible;
you don't need 2SDs for that.
the largest of my 1SDs was 0.2, which is why i quoted it as "on the order of 0.2".
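just to make the "mutually compatible" claim concrete, here is a quick python
sketch that checks each pair of the four 4-CPU results; the only assumption is
the usual rule of thumb that two measurements are compatible when their
difference is well under ~2 combined sigmas:

```python
import math

# the four 4-CPU results quoted above, as (mean speedup, 1SD error)
runs = [(3.15, 0.15), (3.29, 0.20), (3.06, 0.12), (3.19, 0.13)]

def z(a, b):
    """difference between two measurements in units of the combined 1SD error"""
    return abs(a[0] - b[0]) / math.sqrt(a[1] ** 2 + b[1] ** 2)

# mutually compatible: every pairwise difference stays below ~2 combined sigmas
zs = [z(runs[i], runs[j]) for i in range(4) for j in range(i + 1, 4)]
print([round(v, 2) for v in zs])
```

the largest pairwise difference (3.29 vs 3.06) is still under 1 combined sigma,
so no 2SD argument is needed.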

the "some reason" that my SDs are larger is, at least in part, that you used a
larger testset.
your numbers give a 2.17 sigma difference between the null and non-null
searches, if i don't miscalculate, which i would classify as borderline
significant. i would immediately add the question why one set is 38 positions
while the other is 39, and also the comment that with a much larger testset you
could easily resolve your borderline significance.
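for reference, the 2.17 sigma figure can be checked from GCP's table directly
(a short python sketch; it reproduces the quoted value up to rounding of the
inputs):

```python
import math

# GCP's table: mean speedup and 1SD standard error
null_mean, null_err = 2.82, 0.101      # nullmove, n = 38
nonull_mean, nonull_err = 3.07, 0.056  # no-nullmove, n = 39

# difference of two independent means, in units of the combined standard error
sigma = (nonull_mean - null_mean) / math.sqrt(null_err ** 2 + nonull_err ** 2)
print(f"{sigma:.2f} sigma")
```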

about the large variability i posted: i have added two sets of numbers i got
from bob's logfiles at the end of this post (sorry for the many digits, i
didn't feel like hacking them off by hand...). 1 run with 2 CPUs, 1 with 4. as
you can see, results >2.0 occur in the 2CPU test, as do results >4.0 in the
4CPU test, unlike in the DTS paper :-)

the large variability in such tests means it is not enough to compute the
standard error and leave it at that. for small sets like the ones you and i
have been using, the choice of the set will also contribute to the error (e.g.
if bob wanted to write a paper making crafty look good, he could throw out
certain positions which always give a bad speedup). meaning that you must use
the same set of positions to make comparisons. you should throw out that
position #39 for your null / non-null comparison.
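to illustrate the same-set point with a toy example - all the per-position
numbers below are made up, not from anyone's logfiles - pairing the two runs
position by position cancels the easy-position/hard-position part of the
variance, which an unpaired comparison cannot do:

```python
import math
from statistics import stdev

# hypothetical per-position speedups for the SAME positions, with and
# without nullmove (purely illustrative numbers)
null_sp   = [2.6, 3.1, 2.4, 3.3, 2.9, 2.7, 3.0, 2.8]
nonull_sp = [2.9, 3.4, 2.6, 3.5, 3.2, 2.9, 3.3, 3.0]

# unpaired: each mean carries the full position-to-position variability
se_unpaired = math.sqrt(stdev(null_sp) ** 2 / len(null_sp) +
                        stdev(nonull_sp) ** 2 / len(nonull_sp))

# paired: differencing per position cancels the position difficulty
diffs = [b - a for a, b in zip(null_sp, nonull_sp)]
se_paired = stdev(diffs) / math.sqrt(len(diffs))

print(round(se_unpaired, 3), round(se_paired, 3))
```

with these made-up numbers the paired standard error comes out almost an order
of magnitude smaller, which is exactly why the two tests should share their
position set.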

one final remark: with the little data available, i cannot check whether the
speedup numbers are normally distributed at all. the computation of sigmas and
statistical significance, however, assumes a normal distribution. if this is
not the case (and it isn't necessarily!), then these computations about
statistical significance no longer have a proper meaning. that's another
reason i'd classify 2.17 sigma as borderline - you can't be sure that your
distribution allows you to make such conclusions.

cheers
  martin

2CPU run A

1.549450549
1.983435048
1.889055472
2.014925373
2.474576271
2.518072289
1.888888889
1.626506024
1.677966102
1.882352941
2.064516129
1.630136986
2.760416667
1.632
1.292307692
1.251461988
2.621910488
2.028571429
2.809991079
2.175
2.137681159
1.099009901
1.882882883
2.176470588

4CPU run A

2.906360424
2.789699571
2.978723404
3.243632869
3.318181818
2.322222222
3.588562656
3.423788993
3.09375
2.018018018
3.506849315
3.548002385
2.912087912
2.518518519
2.473498233
1.739837398
3.688909774
2.696202532
3.613421279
4.693822498
4.836065574
3.113604488
2.943661972
3.628340279
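for completeness, here is a small python sketch that recomputes the mean, the
1SD standard error, and a crude sample skewness from the two runs above (the
skewness is the simple moment estimator; with only 24 points it is a rough
indicator of non-normality at best):

```python
import math
from statistics import mean, stdev

cpu2 = [1.549450549, 1.983435048, 1.889055472, 2.014925373, 2.474576271,
        2.518072289, 1.888888889, 1.626506024, 1.677966102, 1.882352941,
        2.064516129, 1.630136986, 2.760416667, 1.632, 1.292307692,
        1.251461988, 2.621910488, 2.028571429, 2.809991079, 2.175,
        2.137681159, 1.099009901, 1.882882883, 2.176470588]

cpu4 = [2.906360424, 2.789699571, 2.978723404, 3.243632869, 3.318181818,
        2.322222222, 3.588562656, 3.423788993, 3.09375, 2.018018018,
        3.506849315, 3.548002385, 2.912087912, 2.518518519, 2.473498233,
        1.739837398, 3.688909774, 2.696202532, 3.613421279, 4.693822498,
        4.836065574, 3.113604488, 2.943661972, 3.628340279]

def describe(xs):
    n, m, s = len(xs), mean(xs), stdev(xs)
    se = s / math.sqrt(n)                               # 1SD standard error of the mean
    skew = sum((x - m) ** 3 for x in xs) / (n * s ** 3)  # crude moment skewness
    return m, se, skew

m2, se2, sk2 = describe(cpu2)
m4, se4, sk4 = describe(cpu4)
print(f"2CPU: {m2:.2f} +- {se2:.2f}, skew {sk2:.2f}")
print(f"4CPU: {m4:.2f} +- {se4:.2f}, skew {sk4:.2f}")
```

the 4CPU run comes out close to the 3.15 +- 0.15 quoted at the top of this
post.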


