Computer Chess Club Archives

Subject: Re: Differences in speedup

Author: Robert Hyatt

Date: 21:05:47 05/08/04


On May 07, 2004 at 19:27:00, Vincent Diepeveen wrote:

>On May 07, 2004 at 11:53:29, Andreas Guettinger wrote:
>
>>On May 07, 2004 at 04:38:00, Vincent Diepeveen wrote:
>>
>>>On May 06, 2004 at 19:03:48, martin fierz wrote:
>>>
>>>>aloha!
>>>>
>>>>bob posted some crafty logfiles running a 24-position test set on his ftp site
>>>>(for anyone else crazy enough to repeat what i did:
>>>>ftp.cis.uab.edu/pub/hyatt/smpdata)
>>>>
>>>>these are logfiles of crafty running as single CPU, dual, or quad; on opterons.
>>>>i took the last completed ply on the single CPU set for each position (marked by
>>>>-> in the logfile, i hope...), wrote down the time to complete this ply, and did
>>>>this for all logfiles. there are 9 of these, 4 repeats for 2 and 4 CPUs. i
>>>>computed the speedup for time-to-finish-ply-X for each of the multi-CPU runs
>>>>with the following results:
>>>>
>>>>2 CPUs:
>>>>1.961 +- 0.093
>>>>1.888 +- 0.074
>>>>1.846 +- 0.078
>>>>1.763 +- 0.084
>>>>
>>>>4 CPUs:
>>>>3.15 +- 0.15
>>>>3.29 +- 0.20
>>>>3.06 +- 0.12
>>>>3.19 +- 0.13
>>>>
>>>>now, is there any meaning to this, and if yes, what?
>>>>
>>>>point #1 to make is that the numbers here are mutually consistent with each
>>>>other, given the error margins quoted. which should show those skeptical of this
>>>>statistical approach that it makes sense to do it this way, rather than to just
>>>>write "i measured speedup 3.1".
>>>>
>>>>point #2 is that the speedup on 4 CPUs on average is 3.17 in this test, which
>>>>might be one point for bob in the duel with vincent; although i suspect that the
>>>>speedup depends on the hardware architecture - i will leave this question to the
>>>>parallel computing experts though...
>>>
>>>Bob has tested the SMP version 1 cpu versus SMP version 2 or 4 cpus. The single
>>>cpu version of crafty is just hardly existing because of a stupid thread pointer
>>>which is a constant. Optimizing that crafty is 5% faster for sure in time single
>>>cpu at opteron.
>>
>>I don't understand that. What does that mean?
>>
>>regards
>>Andy
>
>In very simple words, to run parallel you first slow down your program.
>Then the slowed down program gets when compared to the slowed down program the
>speedups that Bob reports.

That is crap, but it is yet another bit of disinformation that we can work
around.  Assume, for the sake of argument, that I could speed Crafty up by 5%
by removing the tree pointer, which is not possible.  Here is the BK data I
posted last night for two processors: first as I computed it last night, then
recomputed after reducing the 1cpu times by 5%, just to see what would happen,
not because I believe 5% is a reasonable number on the Opteron.

2cpu normal data (columns: position, 1cpu time, then time/speedup for each of
the four 2cpu runs):

  2   34     51/0.67   25/1.36   28/1.21   35/0.97
  3  139     51/2.73   45/3.09   58/2.40   74/1.88
  4  154    106/1.45   84/1.83   83/1.86   84/1.83
  5  175    112/1.56  105/1.67  114/1.54  109/1.61
  6  145     69/2.10   70/2.07   74/1.96   70/2.07
  7  110     65/1.69   71/1.55  112/0.98  111/0.99
  8  115     60/1.92   66/1.74   58/1.98   60/1.92
  9  171    101/1.69  104/1.64  101/1.69   94/1.82
 10   95     45/2.11   43/2.21   38/2.50   41/2.32
 11   97     35/2.77   55/1.76   52/1.87   56/1.73
 12  147    100/1.47  113/1.30  107/1.37  114/1.29
 13  153    108/1.42   98/1.56   79/1.94   83/1.84
 14  137     75/1.83   88/1.56   81/1.69   87/1.57
 15   86     42/2.05   42/2.05   41/2.10   42/2.05
 16  141     78/1.81   78/1.81   78/1.81   77/1.83
 17   38     25/1.52   21/1.81   23/1.65   21/1.81
 18  154     95/1.62   60/2.57   91/1.69   72/2.14
 19  128     67/1.91   57/2.25   65/1.97   58/2.21
 20   96     66/1.45   63/1.52   66/1.45   66/1.45
 21  123     70/1.76   70/1.76   67/1.84   74/1.66
 22   98     46/2.13   48/2.04   45/2.18   45/2.18
 23  137     62/2.21   61/2.25  106/1.29  106/1.29
 24   87     45/1.93   43/2.02   39/2.23   44/1.98
average SU      1.82      1.89      1.79      1.76


OK.  Now after fudging the 1cpu time for your bogus 5% number:

  2   32     51/0.63   25/1.28   28/1.14   35/0.91
  3  132     51/2.59   45/2.93   58/2.28   74/1.78
  4  146    106/1.38   84/1.74   83/1.76   84/1.74
  5  166    112/1.48  105/1.58  114/1.46  109/1.52
  6  137     69/1.99   70/1.96   74/1.85   70/1.96
  7  104     65/1.60   71/1.46  112/0.93  111/0.94
  8  109     60/1.82   66/1.65   58/1.88   60/1.82
  9  162    101/1.60  104/1.56  101/1.60   94/1.72
 10   90     45/2.00   43/2.09   38/2.37   41/2.20
 11   92     35/2.63   55/1.67   52/1.77   56/1.64
 12  139    100/1.39  113/1.23  107/1.30  114/1.22
 13  145    108/1.34   98/1.48   79/1.84   83/1.75
 14  130     75/1.73   88/1.48   81/1.60   87/1.49
 15   81     42/1.93   42/1.93   41/1.98   42/1.93
 16  133     78/1.71   78/1.71   78/1.71   77/1.73
 17   36     25/1.44   21/1.71   23/1.57   21/1.71
 18  146     95/1.54   60/2.43   91/1.60   72/2.03
 19  121     67/1.81   57/2.12   65/1.86   58/2.09
 20   91     66/1.38   63/1.44   66/1.38   66/1.38
 21  116     70/1.66   70/1.66   67/1.73   74/1.57
 22   93     46/2.02   48/1.94   45/2.07   45/2.07
 23  130     62/2.10   61/2.13  106/1.23  106/1.23
 24   82     45/1.82   43/1.91   39/2.10   44/1.86
average SU      1.72      1.79      1.70      1.66


Do you like those numbers better??

They are still right in line with my formula, and this is for the BK test data.
I don't have the raw times in a useful form for the CB positions, where the
speedup was actually a fair bit better.
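
For reference, the entries in the tables above can be reproduced with a few
lines of Python. This is a sketch, not the script used for this post: the
function names are made up for illustration, and the truncation of the reduced
1cpu time to a whole number is inferred from the table values (for example,
145 becomes 137), not stated anywhere.

```python
def speedup(t1, tn):
    """Speedup as shown in the tables: 1cpu time divided by N-cpu time."""
    return round(t1 / tn, 2)


def reduce_baseline(t1, percent=5):
    """Cut the 1cpu time by `percent`, truncating to a whole number.

    Truncation is an assumption inferred from the table (145 -> 137).
    Integer arithmetic avoids floating-point surprises.
    """
    return t1 * (100 - percent) // 100


# Two rows copied from the 2cpu table: position -> (1cpu time, four 2cpu times)
rows = {
    2: (34, [51, 25, 28, 35]),
    10: (95, [45, 43, 38, 41]),
}

for pos, (t1, runs) in rows.items():
    normal = [speedup(t1, tn) for tn in runs]
    fudged = [speedup(reduce_baseline(t1), tn) for tn in runs]
    print(pos, t1, normal, reduce_baseline(t1), fudged)
```

Each speedup falls by roughly 5% of its own value, so rows with larger speedups
lose more in absolute terms.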

Here is 4cpu with normal data (same layout as the 2cpu tables, four 4cpu runs):

  2   34     26/1.31   27/1.26   18/1.89   18/1.89
  3  139     54/2.57   29/4.79   75/1.85   75/1.85
  4  154     49/3.14   46/3.35   52/2.96   52/2.96
  5  175     71/2.46   53/3.30   58/3.02   58/3.02
  6  145     34/4.26   33/4.39   51/2.84   51/2.84
  7  110     61/1.80   73/1.51   43/2.56   43/2.56
  8  115     37/3.11   39/2.95   35/3.29   35/3.29
  9  171     67/2.55   37/4.62   41/4.17   41/4.17
 10   95     42/2.26   28/3.39   40/2.38   40/2.38
 11   97     30/3.23   27/3.59   32/3.03   32/3.03
 12  147     77/1.91   55/2.67   63/2.33   63/2.33
 13  153     55/2.78   56/2.73   40/3.83   40/3.83
 14  137     47/2.91   42/3.26   39/3.51   39/3.51
 15   86     26/3.31   26/3.31   25/3.44   25/3.44
 16  141     51/2.76   50/2.82   47/3.00   47/3.00
 17   38     12/3.17   13/2.92   13/2.92   13/2.92
 18  154     50/3.08   50/3.08   79/1.95   79/1.95
 19  128     38/3.37   38/3.37   30/4.27   30/4.27
 20   96     30/3.20   36/2.67   25/3.84   25/3.84
 21  123     42/2.93   44/2.80   43/2.86   43/2.86
 22   98     24/4.08   24/4.08   25/3.92   25/3.92
 23  137     76/1.80   61/2.25   46/2.98   46/2.98
 24   87     31/2.81   32/2.72   33/2.64   33/2.64
average SU      2.82      3.12      3.02      3.02


Here is the same table with the 1cpu time reduced by your mythical 5%.

  2   32     26/1.23   27/1.19   18/1.78   18/1.78
  3  132     54/2.44   29/4.55   75/1.76   75/1.76
  4  146     49/2.98   46/3.17   52/2.81   52/2.81
  5  166     71/2.34   53/3.13   58/2.86   58/2.86
  6  137     34/4.03   33/4.15   51/2.69   51/2.69
  7  104     61/1.70   73/1.42   43/2.42   43/2.42
  8  109     37/2.95   39/2.79   35/3.11   35/3.11
  9  162     67/2.42   37/4.38   41/3.95   41/3.95
 10   90     42/2.14   28/3.21   40/2.25   40/2.25
 11   92     30/3.07   27/3.41   32/2.88   32/2.88
 12  139     77/1.81   55/2.53   63/2.21   63/2.21
 13  145     55/2.64   56/2.59   40/3.62   40/3.62
 14  130     47/2.77   42/3.10   39/3.33   39/3.33
 15   81     26/3.12   26/3.12   25/3.24   25/3.24
 16  133     51/2.61   50/2.66   47/2.83   47/2.83
 17   36     12/3.00   13/2.77   13/2.77   13/2.77
 18  146     50/2.92   50/2.92   79/1.85   79/1.85
 19  121     38/3.18   38/3.18   30/4.03   30/4.03
 20   91     30/3.03   36/2.53   25/3.64   25/3.64
 21  116     42/2.76   44/2.64   43/2.70   43/2.70
 22   93     24/3.88   24/3.88   25/3.72   25/3.72
 23  130     76/1.71   61/2.13   46/2.83   46/2.83
 24   82     31/2.65   32/2.56   33/2.48   33/2.48
average SU      2.67      2.96      2.86      2.86

2.7, 3.0, 2.9, 2.9.  Do you like those numbers better?  Do they prove your
"point" whatever that might be?  The numbers drop by just over .1 on each test.
The CB positions would average 3.0 rather than 3.1 as calculated by Martin.

Or, perhaps, this is all nonsense and I should reduce them by 20%.  I.e., pick
a number to get the speedup down to where you want it.  Or use the real number,
which used to be about 3% on Intel and is probably less on the Opteron with its
extra registers.  And then just deal with a number you want to pretend I can't
possibly reach...
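
Because speedup is the 1cpu time divided by the N-cpu time, cutting the 1cpu
time by some fraction scales every speedup, and therefore the average, by the
same fraction. A rough sketch applying the percentages mentioned above to the
reported 4cpu averages follows. The function name is hypothetical, and the
results are plain arithmetic on the posted averages, not measurements. They can
differ from the table by about 0.01 because the table appears to truncate the
reduced 1cpu times.

```python
def scale_average(avg_speedup, reduction_percent):
    """Scale an average speedup for a 1cpu time reduced by reduction_percent."""
    return round(avg_speedup * (100 - reduction_percent) / 100, 2)


# Average speedups reported for the four 4cpu runs (normal data)
reported_4cpu = [2.82, 3.12, 3.02, 3.02]

for pct in (3, 5, 20):
    print(f"{pct}% reduction:", [scale_average(a, pct) for a in reported_4cpu])
```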

Your choice.

Real data?  Or your imaginary stuff?

I prefer "real".



>
>However this is not fair.
>
>In diep i just compare the single cpu version versus the parallel version of
>diep.

What are your speedup numbers?  Where is the data?


>
>Other good examples of unfair compares are what the Chrilly donninger is posting
>about hydra.
>
>Hydra does not use hashtables last 6 plies. 3 ply not in hardware and 3 ply not
>in software.
>
>He compares 1 cpu not doing last 6 plies in hardware versus 16 cpu's not doing
>last 6 ply in hardware.
>
>That is not fair however, the *only* reason to not use the hashtable the last 3
>ply in software is because that would not run parallel well.
>
>However, single cpu it does run well using hashtable there.
>


So?  Suppose there was something that could be done in the parallel version but
not in the serial version.  Is it fair to compare?  Or should the serial version
be modified the same way to look worse?

You are wasting time worrying about some mythical speedup number.  The rest of
us just care about "how much does another processor help?"





>This is a very common trick in computerchess and some are very bad in this. Like
>cilkchess was slowed down 40 times in speed. Reduced from like 200k nps to 5k
>nps in order to run parallel better.
>
>Then it shows up with 500 processors somewhere or even in 1995 it showed up at
>like 1800 processors.
>
>But it is losing somewhere a factor 40 to start with.
>
>Is it fair to compare a slowed down program versus n processors?
>
>I do not think so. I find it very bad compare.

At that extreme, perhaps.  But you always suppose fraud.  With no evidence.  Do
you _know_ what they did in slowing it down?  Do you really know if they slowed
it down that much?  It doesn't sound reasonable.  It smells of speculation and
guessing.



>
>I also can get a much better speedup with diep when slowing it down first.

Just show _any_ numbers.  Anything is better than what you have shown so far...
Even if it is bogus...


