Computer Chess Club Archives


Subject: Re: a number and two questions for bob

Author: martin fierz

Date: 14:29:00 05/05/04


On May 05, 2004 at 11:18:52, Robert Hyatt wrote:

>On May 05, 2004 at 05:10:36, martin fierz wrote:
>
>>hi bob,
>>
>>rereading your DTS paper (you sent me a copy once), i took the 24 speedup
>>numbers you reported for 4 processors (listed at the end, for anybody interested).
>>
>>i get (using a black box):
>>
>>av. speedup: 3.65
>>standard deviation of sample: 0.31
>>standard error of average 0.064
>>
>>so: average speedup(N=4) = 3.65 +- 0.07 would be a nice way to put this.
>
>Where does the standard error come from?

as GCP already mentioned, std. err. = std. dev. / sqrt(N).

>Are you looking at the speedups for
>two positions and using the difference (summed over all positions) as the error?

so no, just the simple formula above.
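
in case it helps, here's a minimal sketch of that computation in python - the
speedup values below are just placeholders, not the actual numbers from the
DTS paper:

  import math

  # hypothetical per-position speedups - placeholders, not the DTS data
  speedups = [3.2, 3.9, 3.6, 3.4, 3.8, 3.7]

  n = len(speedups)
  mean = sum(speedups) / n
  # sample standard deviation (n - 1 in the denominator)
  std_dev = math.sqrt(sum((x - mean) ** 2 for x in speedups) / (n - 1))
  # standard error of the average: std. dev. / sqrt(N)
  std_err = std_dev / math.sqrt(n)

  print("average speedup: %.2f +/- %.2f (std. dev. %.2f)" % (mean, std_err, std_dev))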

> That's one kind of error.  The other is the non-repeatable error, which is the
>real problem that needs addressing.  If I run the _same_ test again, rather than
>3.65 it might produce 3.3 or 3.9, which is the real problem...

this is not a real problem. when i measure something in physics, this happens
ALL THE TIME. that's why we physicists are rather comfortable with repeating
experiments many times, and using statistics to find "true" values - it's the
only way we can measure something. i understand that computer scientists mostly
work with deterministic stuff, and therefore feel a little uncomfortable with
this, but i can assure you, it's just fine :-)
all you need is enough measurements (personally, i find 24 or 30 positions
rather few). you can also repeat the same test a number of times, and see
whether the results really vary wildly...
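
to illustrate the "repeat the same test" point, here's a little sketch - the
measurement function is only a stand-in that fakes a noisy 4-cpu speedup with
random numbers, it doesn't search anything:

  import random
  import statistics

  def measure_speedup_once():
      # stand-in for one complete run of the test set on 4 cpus;
      # just draws a noisy value around an assumed true speedup
      return random.gauss(3.65, 0.3)

  runs = [measure_speedup_once() for _ in range(10)]
  mean = statistics.mean(runs)
  std_err = statistics.stdev(runs) / len(runs) ** 0.5
  print("over %d repeats: %.2f +/- %.2f" % (len(runs), mean, std_err))

if the repeated runs scatter far outside the quoted error bar, the error
estimate was too optimistic - that's all i mean by "vary wildly".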


>The 2.8 number came from the same test set used in the DTS paper, as for some
>reason Vincent thought that Crafty would produce _zero_ speedup on those.  GCP
>ran the test on a quad 550mhz machine of mine.  The 3.1 was produced by my
>running the _same_ test set on my quad 700mhz box.  I sent both log files so
>they can confirm both my 3.1 and GCP's 2.8.  That just shows the variability.  I
>have seen one 3.4 on that test set BTW, whether it might do even better is just
>a guess.  And whether today's Crafty will do better or worse on that particular
>problem set is also unknown although I should probably run it to see, since so
>much has changed (evaluation, extensions, etc) in the past couple of years.
>
>I _believe_ there were 30 positions, but if you are looking at the DTS paper, we
>used exactly those positions so it will give you the right number of FEN
>strings...
>
>>2) can you give a similar error estimate for the 3.1 number (both std. dev and
>>std. error)? or even better, a full set of numbers so that i can do with them
>>whatever i want, since you seem so reluctant to compute std/ste? :-)
>
>What I can do is run a 1, 2 and 4 cpu run and either post the entire log, or
>just the "time line" grepped from each log to give the time and total nodes
>searched...
>
>If you want just the grepped info, the next step would be for me to give you one
>set of data for 1 cpu, and maybe 4 sets of data for 2 and 4, so that you can see
>the error between positions as well as the overall error or variance...

yes, that would be nice!
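
just to show what i have in mind, assuming the grepped time lines boil down to
one search time per position per cpu count (the times below are made up, not
from any crafty log):

  import math

  # hypothetical search times in seconds, one entry per position
  times_1cpu = [412.0, 388.5, 501.2, 296.7, 633.1]
  times_4cpu = [118.3, 102.9, 144.6,  79.5, 171.8]

  speedups = [t1 / t4 for t1, t4 in zip(times_1cpu, times_4cpu)]
  n = len(speedups)
  mean = sum(speedups) / n
  std_dev = math.sqrt(sum((s - mean) ** 2 for s in speedups) / (n - 1))
  std_err = std_dev / math.sqrt(n)
  print("speedup(N=4) = %.2f +/- %.2f" % (mean, std_err))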


>>3) right, question 3 of 2 :-): you claimed somewhere deep down in the other
>>thread that it matters whether you look at related or unrelated positions. you
>>could prove/disprove this experimentally with a set of related positions (eg
>>from games of crafty on ICC) vs. a large test set (e.g. WAC).
>
>
>Yes, although I think the basic proof is trivial.  On related positions you
>simply search deeper due to the hash table effects.  Schaeffer and others have
>repeatedly found (myself included, I failed to add) that deeper searches make the
>search more efficient.  But doing this test is harder.  IE it isn't reasonable
>to search to "fixed depth" for each position as that is not how it works in a
>real game and it can skew the times somewhat...  adding yet another bit of
>variability...

i see - admittedly that makes it a bit more problematic. still, you could run it
to a fixed depth; that's better than nothing. and even if you are right, and the
result is as trivial as you say: aren't you at all interested to see how big the
difference is? :-)


>If you want me to run it and post the grepped numbers, you will see why I don't
>do it often.  There is a _lot_ of variability.  IE for four processors I feel
>perfectly comfortable claiming 3.0 +/- .3 for example.  That +/- .3 is a pretty
>big spread but within reason.  I am also certain that testing on problem sets
>produces different results than testing on a real game.  But using a real game
>makes it difficult for us to compare program A with B, since they wouldn't play
>the same game, and testing on different test sets can easily produce different
>numbers...

the results in the DTS paper had a very small variability compared to the 0.3
you just quoted. of course, i'm talking about the standard error of the average
here, not the variance.

cheers
  martin


