Computer Chess Club Archives



Subject: Re: Differences in speedup

Author: Eugene Nalimov

Date: 22:45:57 05/08/04


Bob, please relax. Just ignore Vincent as I do.

Thanks,
Eugene

On May 09, 2004 at 00:05:47, Robert Hyatt wrote:

>On May 07, 2004 at 19:27:00, Vincent Diepeveen wrote:
>
>>On May 07, 2004 at 11:53:29, Andreas Guettinger wrote:
>>
>>>On May 07, 2004 at 04:38:00, Vincent Diepeveen wrote:
>>>
>>>>On May 06, 2004 at 19:03:48, martin fierz wrote:
>>>>
>>>>>aloha!
>>>>>
>>>>>bob posted some crafty logfiles running a 24-position test set on his ftp site
>>>>>(for anyone else crazy enough to repeat what i did:
>>>>>ftp.cis.uab.edu/pub/hyatt/smpdata)
>>>>>
>>>>>these are logfiles of crafty running as single CPU, dual, or quad; on opterons.
>>>>>i took the last completed ply on the single CPU set for each position (marked by
>>>>>-> in the logfile, i hope...), wrote down the time to complete this ply, and did
>>>>>this for all logfiles. there are 9 of these, 4 repeats for 2 and 4 CPUs. i
>>>>>computed the speedup for time-to-finish-ply-X for each of the multi-CPU runs
>>>>>with the following results:
>>>>>
>>>>>2 CPUs:
>>>>>1.961 +- 0.093
>>>>>1.888 +- 0.074
>>>>>1.846 +- 0.078
>>>>>1.763 +- 0.084
>>>>>
>>>>>4 CPUs:
>>>>>3.15 +- 0.15
>>>>>3.29 +- 0.20
>>>>>3.06 +- 0.12
>>>>>3.19 +- 0.13
>>>>>
>>>>>now, is there any meaning to this, and if yes, what?
>>>>>
>>>>>point #1 to make is that the numbers here are mutually consistent with each
>>>>>other, given the error margins quoted. which should show those skeptical of this
>>>>>statistical approach that it makes sense to do it this way, rather than to just
>>>>>write "i measured speedup 3.1".
>>>>>
>>>>>point #2 is that the speedup on 4 CPUs on average is 3.17 in this test, which
>>>>>might be one point for bob in the duel with vincent; although i suspect that the
>>>>>speedup depends on the hardware architecture - i will leave this question to the
>>>>>parallel computing experts though...
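The consistency claim in point #1 can be checked directly from the figures quoted above. The sketch below is only an illustration: it assumes each "+-" value is a one-sigma error and that the runs are independent (the post states neither), and the names `z_score`, `max_pairwise_z` and `mean_speedup` are made up for this example. Two runs are treated as consistent when their difference is small compared with their combined error.

```python
from itertools import combinations
from math import sqrt

# (speedup, error) pairs copied from the post above.
two_cpu = [(1.961, 0.093), (1.888, 0.074), (1.846, 0.078), (1.763, 0.084)]
four_cpu = [(3.15, 0.15), (3.29, 0.20), (3.06, 0.12), (3.19, 0.13)]

def z_score(a, b):
    """Difference between two measurements in units of their combined error.

    Assumes the errors are one-sigma and independent (an assumption,
    not something the post states).
    """
    (va, ea), (vb, eb) = a, b
    return abs(va - vb) / sqrt(ea ** 2 + eb ** 2)

def max_pairwise_z(runs):
    """Largest z-score over all pairs of runs."""
    return max(z_score(a, b) for a, b in combinations(runs, 2))

def mean_speedup(runs):
    """Plain average of the speedup values, ignoring the error bars."""
    return sum(v for v, _ in runs) / len(runs)

if __name__ == "__main__":
    for label, runs in (("2 CPUs", two_cpu), ("4 CPUs", four_cpu)):
        print(label,
              "worst pair: %.2f sigma" % max_pairwise_z(runs),
              "mean: %.2f" % mean_speedup(runs))
```

Under those assumptions no pair of runs is more than two combined errors apart, and the plain average of the four 4-CPU values comes out at 3.17, the figure given in point #2.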
>>>>
>>>>Bob has tested the SMP version on 1 cpu versus the SMP version on 2 or 4
>>>>cpus. A true single cpu version of crafty hardly exists, because of a stupid
>>>>thread pointer which is a constant there. With that optimized away, crafty is
>>>>for sure 5% faster in time on a single cpu on the opteron.
>>>
>>>I don't understand that. What does that mean?
>>>
>>>regards
>>>Andy
>>
>>In very simple words: to run in parallel you first slow down your program.
>>Then the parallel runs, when compared to that slowed down program, get the
>>speedups that Bob reports.
>
>That is crap, but that is yet another bit of disinformation that we can work
>around.  Assume that I could speed Crafty up by 5% without the tree pointer,
>something that is not possible.  But it is an assumption.  Here is the BK data I
>posted last night for two processors.  First as I computed last night, then
>computed after reducing the 1cpu times by 5%, just to see what would happen, not
>because I believe 5% is a reasonable number on the opteron.
>
>2cpu normal data:
>
>  2   34     51/0.67   25/1.36   28/1.21   35/0.97
>  3  139     51/2.73   45/3.09   58/2.40   74/1.88
>  4  154    106/1.45   84/1.83   83/1.86   84/1.83
>  5  175    112/1.56  105/1.67  114/1.54  109/1.61
>  6  145     69/2.10   70/2.07   74/1.96   70/2.07
>  7  110     65/1.69   71/1.55  112/0.98  111/0.99
>  8  115     60/1.92   66/1.74   58/1.98   60/1.92
>  9  171    101/1.69  104/1.64  101/1.69   94/1.82
> 10   95     45/2.11   43/2.21   38/2.50   41/2.32
> 11   97     35/2.77   55/1.76   52/1.87   56/1.73
> 12  147    100/1.47  113/1.30  107/1.37  114/1.29
> 13  153    108/1.42   98/1.56   79/1.94   83/1.84
> 14  137     75/1.83   88/1.56   81/1.69   87/1.57
> 15   86     42/2.05   42/2.05   41/2.10   42/2.05
> 16  141     78/1.81   78/1.81   78/1.81   77/1.83
> 17   38     25/1.52   21/1.81   23/1.65   21/1.81
> 18  154     95/1.62   60/2.57   91/1.69   72/2.14
> 19  128     67/1.91   57/2.25   65/1.97   58/2.21
> 20   96     66/1.45   63/1.52   66/1.45   66/1.45
> 21  123     70/1.76   70/1.76   67/1.84   74/1.66
> 22   98     46/2.13   48/2.04   45/2.18   45/2.18
> 23  137     62/2.21   61/2.25  106/1.29  106/1.29
> 24   87     45/1.93   43/2.02   39/2.23   44/1.98
>average SU      1.82      1.89      1.79      1.76
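For anyone checking the table: each entry reads time/speedup, where the speedup is the 1cpu time in the second column divided by the multi-cpu time. The sketch below recomputes the first 2cpu run from the figures in the table. It is an illustration only, and the function names are invented here, not taken from Crafty or from any script Bob used.

```python
# 1cpu times (second column) and the first 2cpu run (third column),
# copied from the table above for positions 2 through 24.
ONE_CPU = [34, 139, 154, 175, 145, 110, 115, 171, 95, 97, 147, 153,
           137, 86, 141, 38, 154, 128, 96, 123, 98, 137, 87]
TWO_CPU_RUN1 = [51, 51, 106, 112, 69, 65, 60, 101, 45, 35, 100, 108,
                75, 42, 78, 25, 95, 67, 66, 70, 46, 62, 45]

def speedups(base_times, parallel_times):
    """Per-position speedup: baseline time divided by parallel time."""
    return [b / p for b, p in zip(base_times, parallel_times)]

def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

if __name__ == "__main__":
    su = speedups(ONE_CPU, TWO_CPU_RUN1)
    print(["%.2f" % s for s in su[:3]])
    print("average SU %.2f" % average(su))
```

The first three values come out as 0.67, 2.73 and 1.45, and the plain mean over all 23 positions as 1.82, matching the first speedup column and its "average SU" entry.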
>
>
>OK.  Now after fudging the 1cpu time for your bogus 5% number:
>
>  2   32     51/0.63   25/1.28   28/1.14   35/0.91
>  3  132     51/2.59   45/2.93   58/2.28   74/1.78
>  4  146    106/1.38   84/1.74   83/1.76   84/1.74
>  5  166    112/1.48  105/1.58  114/1.46  109/1.52
>  6  137     69/1.99   70/1.96   74/1.85   70/1.96
>  7  104     65/1.60   71/1.46  112/0.93  111/0.94
>  8  109     60/1.82   66/1.65   58/1.88   60/1.82
>  9  162    101/1.60  104/1.56  101/1.60   94/1.72
> 10   90     45/2.00   43/2.09   38/2.37   41/2.20
> 11   92     35/2.63   55/1.67   52/1.77   56/1.64
> 12  139    100/1.39  113/1.23  107/1.30  114/1.22
> 13  145    108/1.34   98/1.48   79/1.84   83/1.75
> 14  130     75/1.73   88/1.48   81/1.60   87/1.49
> 15   81     42/1.93   42/1.93   41/1.98   42/1.93
> 16  133     78/1.71   78/1.71   78/1.71   77/1.73
> 17   36     25/1.44   21/1.71   23/1.57   21/1.71
> 18  146     95/1.54   60/2.43   91/1.60   72/2.03
> 19  121     67/1.81   57/2.12   65/1.86   58/2.09
> 20   91     66/1.38   63/1.44   66/1.38   66/1.38
> 21  116     70/1.66   70/1.66   67/1.73   74/1.57
> 22   93     46/2.02   48/1.94   45/2.07   45/2.07
> 23  130     62/2.10   61/2.13  106/1.23  106/1.23
> 24   82     45/1.82   43/1.91   39/2.10   44/1.86
>average SU      1.72      1.79      1.70      1.66
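The adjusted table can be reproduced by scaling only the 1cpu baseline. In the sketch below the adjusted times are truncated to whole numbers, which is inferred from the table (145 becomes 137, not 138) rather than stated in the post. `adjusted_baseline` and `speedup` are names made up for this illustration.

```python
def adjusted_baseline(one_cpu_time, percent_faster=5):
    """Reduce a 1cpu time by the given percentage, truncating to an integer.

    Truncation is an inference from the table above, not a documented step.
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    return one_cpu_time * (100 - percent_faster) // 100

def speedup(base_time, parallel_time):
    """Baseline time divided by parallel time."""
    return base_time / parallel_time

if __name__ == "__main__":
    # (position, 1cpu time, first 2cpu run) for three rows of the table.
    rows = [(2, 34, 51), (3, 139, 51), (6, 145, 69)]
    for pos, t1, t2 in rows:
        adj = adjusted_baseline(t1)
        print("%2d %4d %6d/%.2f" % (pos, adj, t2, speedup(adj, t2)))
```

Since only the numerator changes, every speedup shrinks by about the same 5%, which is why the first average falls from 1.82 to 1.72 rather than collapsing.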
>
>
>Do you like those numbers better??
>
>They are still right in line with my formula, and this is for the BK test data.
>I don't have the raw times in a useful form for the CB positions, where the
>speedup was actually a fair bit better.
>
>Here is 4cpu with normal data:
>
>  2   34     26/1.31   27/1.26   18/1.89   18/1.89
>  3  139     54/2.57   29/4.79   75/1.85   75/1.85
>  4  154     49/3.14   46/3.35   52/2.96   52/2.96
>  5  175     71/2.46   53/3.30   58/3.02   58/3.02
>  6  145     34/4.26   33/4.39   51/2.84   51/2.84
>  7  110     61/1.80   73/1.51   43/2.56   43/2.56
>  8  115     37/3.11   39/2.95   35/3.29   35/3.29
>  9  171     67/2.55   37/4.62   41/4.17   41/4.17
> 10   95     42/2.26   28/3.39   40/2.38   40/2.38
> 11   97     30/3.23   27/3.59   32/3.03   32/3.03
> 12  147     77/1.91   55/2.67   63/2.33   63/2.33
> 13  153     55/2.78   56/2.73   40/3.83   40/3.83
> 14  137     47/2.91   42/3.26   39/3.51   39/3.51
> 15   86     26/3.31   26/3.31   25/3.44   25/3.44
> 16  141     51/2.76   50/2.82   47/3.00   47/3.00
> 17   38     12/3.17   13/2.92   13/2.92   13/2.92
> 18  154     50/3.08   50/3.08   79/1.95   79/1.95
> 19  128     38/3.37   38/3.37   30/4.27   30/4.27
> 20   96     30/3.20   36/2.67   25/3.84   25/3.84
> 21  123     42/2.93   44/2.80   43/2.86   43/2.86
> 22   98     24/4.08   24/4.08   25/3.92   25/3.92
> 23  137     76/1.80   61/2.25   46/2.98   46/2.98
> 24   87     31/2.81   32/2.72   33/2.64   33/2.64
>average SU      2.82      3.12      3.02      3.02
>
>
>Here is the same table with the 1cpu time reduced by your mythical 5%.
>
>  2   32     26/1.23   27/1.19   18/1.78   18/1.78
>  3  132     54/2.44   29/4.55   75/1.76   75/1.76
>  4  146     49/2.98   46/3.17   52/2.81   52/2.81
>  5  166     71/2.34   53/3.13   58/2.86   58/2.86
>  6  137     34/4.03   33/4.15   51/2.69   51/2.69
>  7  104     61/1.70   73/1.42   43/2.42   43/2.42
>  8  109     37/2.95   39/2.79   35/3.11   35/3.11
>  9  162     67/2.42   37/4.38   41/3.95   41/3.95
> 10   90     42/2.14   28/3.21   40/2.25   40/2.25
> 11   92     30/3.07   27/3.41   32/2.88   32/2.88
> 12  139     77/1.81   55/2.53   63/2.21   63/2.21
> 13  145     55/2.64   56/2.59   40/3.62   40/3.62
> 14  130     47/2.77   42/3.10   39/3.33   39/3.33
> 15   81     26/3.12   26/3.12   25/3.24   25/3.24
> 16  133     51/2.61   50/2.66   47/2.83   47/2.83
> 17   36     12/3.00   13/2.77   13/2.77   13/2.77
> 18  146     50/2.92   50/2.92   79/1.85   79/1.85
> 19  121     38/3.18   38/3.18   30/4.03   30/4.03
> 20   91     30/3.03   36/2.53   25/3.64   25/3.64
> 21  116     42/2.76   44/2.64   43/2.70   43/2.70
> 22   93     24/3.88   24/3.88   25/3.72   25/3.72
> 23  130     76/1.71   61/2.13   46/2.83   46/2.83
> 24   82     31/2.65   32/2.56   33/2.48   33/2.48
>average SU      2.67      2.96      2.86      2.86
>
>2.7, 3.0, 2.9, 2.9.  Do you like those numbers better?  Do they prove your
>"point" whatever that might be?  The numbers drop by just over .1 on each test.
>The CB positions would average 3.0 rather than 3.1 as calculated by Martin.
>
>Or, perhaps, this is all nonsense and I should reduce them by 20%.  IE pick a
>number to get the speedup down to where you want it.  Or use the real number
>which used to be about 3% on Intel, probably less on opteron with more
>registers.  And just deal with a number you want to pretend I can't possibly
>reach...
>
>Your choice.
>
>Real data?  Or your imaginary stuff?
>
>I prefer "real".
>
>
>
>>
>>However this is not fair.
>>
>>In diep i just compare the single cpu version versus the parallel version of
>>diep.
>
>What are your speedup numbers?  Where is the data?
>
>
>>
>>Other good examples of unfair comparisons are what Chrilly Donninger is
>>posting about hydra.
>>
>>Hydra does not use hashtables last 6 plies. 3 ply not in hardware and 3 ply not
>>in software.
>>
>>He compares 1 cpu not doing last 6 plies in hardware versus 16 cpu's not doing
>>last 6 ply in hardware.
>>
>>That is not fair, however: the *only* reason not to use the hashtable in the
>>last 3 plies in software is that it would not run well in parallel.
>>
>>However, on a single cpu it does run well using the hashtable there.
>>
>
>
>So?  Suppose there was something that could be done in the parallel version but
>not in the serial version.  Is it fair to compare?  Or should the serial version
>be modified the same way to look worse?
>
>You are wasting time worrying about some mythical speedup number.  The rest of
>us just care "how much does another processor help?"
>
>
>
>
>
>>This is a very common trick in computerchess and some are very bad about this.
>>For example, cilkchess was slowed down by a factor of 40, from like 200k nps
>>to 5k nps, in order to run better in parallel.
>>
>>Then it shows up with 500 processors somewhere or even in 1995 it showed up at
>>like 1800 processors.
>>
>>But it is losing somewhere a factor 40 to start with.
>>
>>Is it fair to compare a slowed down program versus n processors?
>>
>>I do not think so. I find it a very bad comparison.
>
>At that extreme, perhaps.  But you always suppose fraud.  With no evidence.  Do
>you _know_ what they did in slowing it down?  Do you really know if they slowed
>it down that much?  It doesn't sound reasonable.  It smells of speculation and
>guessing.
>
>
>
>>
>>I also can get a much better speedup with diep when slowing it down first.
>
>Just show _any_ numbers.  Anything is better than what you have shown so far...
>Even if it is bogus...



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.