Author: Robert Hyatt
Date: 21:05:47 05/08/04
On May 07, 2004 at 19:27:00, Vincent Diepeveen wrote:
>On May 07, 2004 at 11:53:29, Andreas Guettinger wrote:
>
>>On May 07, 2004 at 04:38:00, Vincent Diepeveen wrote:
>>
>>>On May 06, 2004 at 19:03:48, martin fierz wrote:
>>>
>>>>aloha!
>>>>
>>>>bob posted some crafty logfiles running a 24-position test set on his ftp site
>>>>(for anyone else crazy enough to repeat what i did:
>>>>ftp.cis.uab.edu/pub/hyatt/smpdata)
>>>>
>>>>these are logfiles of crafty running as single CPU, dual, or quad; on opterons.
>>>>i took the last completed ply on the single CPU set for each position (marked by
>>>>-> in the logfile, i hope...), wrote down the time to complete this ply, and did
>>>>this for all logfiles. there are 9 of these, 4 repeats for 2 and 4 CPUs. i
>>>>computed the speedup for time-to-finish-ply-X for each of the multi-CPU runs
>>>>with the following results:
>>>>
>>>>2 CPUs:
>>>>1.961 +- 0.093
>>>>1.888 +- 0.074
>>>>1.846 +- 0.078
>>>>1.763 +- 0.084
>>>>
>>>>4 CPUs:
>>>>3.15 +- 0.15
>>>>3.29 +- 0.20
>>>>3.06 +- 0.12
>>>>3.19 +- 0.13
>>>>
>>>>now, is there any meaning to this, and if yes, what?
>>>>
>>>>point #1 to make is that the numbers here are mutually consistent with each
>>>>other, given the error margins quoted. which should show those skeptical of this
>>>>statistical approach that it makes sense to do it this way, rather than to just
>>>>write "i measured speedup 3.1".
>>>>
>>>>point #2 is that the speedup on 4 CPUs on average is 3.17 in this test, which
>>>>might be one point for bob in the duel with vincent; although i suspect that the
>>>>speedup depends on the hardware architecture - i will leave this question to the
>>>>parallel computing experts though...
>>>
>>>Bob has tested the SMP version 1 cpu versus SMP version 2 or 4 cpus. The single
>>>cpu version of crafty is just hardly existing because of a stupid thread pointer
>>>which is a constant. Optimizing that crafty is 5% faster for sure in time single
>>>cpu at opteron.
>>
>>I don't understand that. What does that mean?
>>
>>regards
>>Andy
>
>In very simple words, to run parallel you first slow down your program.
>Then the slowed down program gets when compared to the slowed down program the
>speedups that Bob reports.

That is crap, but that is yet another bit of disinformation that we can work around. Assume that I could speed Crafty up by 5% without the tree pointer, something that is not possible. But it is an assumption. Here is the BK data I posted last night for two processors. First as I computed last night, then computed after reducing the 1cpu times by 5%, just to see what would happen, not because I believe 5% is a reasonable number on the opteron.

2cpu normal data:

Columns: position, 1cpu time, then 2cpu time/speedup for each of the four runs.

pos  1cpu     run 1     run 2     run 3     run 4
  2    34   51/0.67   25/1.36   28/1.21   35/0.97
  3   139   51/2.73   45/3.09   58/2.40   74/1.88
  4   154  106/1.45   84/1.83   83/1.86   84/1.83
  5   175  112/1.56  105/1.67  114/1.54  109/1.61
  6   145   69/2.10   70/2.07   74/1.96   70/2.07
  7   110   65/1.69   71/1.55  112/0.98  111/0.99
  8   115   60/1.92   66/1.74   58/1.98   60/1.92
  9   171  101/1.69  104/1.64  101/1.69   94/1.82
 10    95   45/2.11   43/2.21   38/2.50   41/2.32
 11    97   35/2.77   55/1.76   52/1.87   56/1.73
 12   147  100/1.47  113/1.30  107/1.37  114/1.29
 13   153  108/1.42   98/1.56   79/1.94   83/1.84
 14   137   75/1.83   88/1.56   81/1.69   87/1.57
 15    86   42/2.05   42/2.05   41/2.10   42/2.05
 16   141   78/1.81   78/1.81   78/1.81   77/1.83
 17    38   25/1.52   21/1.81   23/1.65   21/1.81
 18   154   95/1.62   60/2.57   91/1.69   72/2.14
 19   128   67/1.91   57/2.25   65/1.97   58/2.21
 20    96   66/1.45   63/1.52   66/1.45   66/1.45
 21   123   70/1.76   70/1.76   67/1.84   74/1.66
 22    98   46/2.13   48/2.04   45/2.18   45/2.18
 23   137   62/2.21   61/2.25  106/1.29  106/1.29
 24    87   45/1.93   43/2.02   39/2.23   44/1.98
average SU     1.82      1.89      1.79      1.76

OK.
Now after fudging the 1cpu time for your bogus 5% number:

Columns: position, reduced 1cpu time, then 2cpu time/speedup for each of the four runs.

pos  1cpu     run 1     run 2     run 3     run 4
  2    32   51/0.63   25/1.28   28/1.14   35/0.91
  3   132   51/2.59   45/2.93   58/2.28   74/1.78
  4   146  106/1.38   84/1.74   83/1.76   84/1.74
  5   166  112/1.48  105/1.58  114/1.46  109/1.52
  6   137   69/1.99   70/1.96   74/1.85   70/1.96
  7   104   65/1.60   71/1.46  112/0.93  111/0.94
  8   109   60/1.82   66/1.65   58/1.88   60/1.82
  9   162  101/1.60  104/1.56  101/1.60   94/1.72
 10    90   45/2.00   43/2.09   38/2.37   41/2.20
 11    92   35/2.63   55/1.67   52/1.77   56/1.64
 12   139  100/1.39  113/1.23  107/1.30  114/1.22
 13   145  108/1.34   98/1.48   79/1.84   83/1.75
 14   130   75/1.73   88/1.48   81/1.60   87/1.49
 15    81   42/1.93   42/1.93   41/1.98   42/1.93
 16   133   78/1.71   78/1.71   78/1.71   77/1.73
 17    36   25/1.44   21/1.71   23/1.57   21/1.71
 18   146   95/1.54   60/2.43   91/1.60   72/2.03
 19   121   67/1.81   57/2.12   65/1.86   58/2.09
 20    91   66/1.38   63/1.44   66/1.38   66/1.38
 21   116   70/1.66   70/1.66   67/1.73   74/1.57
 22    93   46/2.02   48/1.94   45/2.07   45/2.07
 23   130   62/2.10   61/2.13  106/1.23  106/1.23
 24    82   45/1.82   43/1.91   39/2.10   44/1.86
average SU     1.72      1.79      1.70      1.66

Do you like those numbers better?? They are still right in line with my formula, and this is for the BK test data. I don't have the raw times in a useful form for the CB positions, where the speedup was actually a fair bit better.
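For anyone who wants to redo this adjustment, here is a minimal Python sketch of the arithmetic behind the two 2cpu tables above. The function names are made up for illustration, and truncating the reduced 1cpu time to a whole number is an assumption inferred from the posted figures (34 becomes 32, 139 becomes 132), not something stated in the post. It uses only the first three positions of the first 2cpu column.

```python
def speedup(t_serial, t_parallel):
    """Speedup = 1cpu time / N-cpu time, rounded to two places as in the tables."""
    return round(t_serial / t_parallel, 2)


def reduce_time(t_serial, percent=5):
    """Shrink the 1cpu time by `percent`, truncating to a whole number.

    Truncation is an assumption inferred from the posted tables.
    Integer arithmetic avoids floating-point surprises.
    """
    return t_serial * (100 - percent) // 100


# (position, 1cpu time, 2cpu time) from the first 2cpu column of the tables
rows = [(2, 34, 51), (3, 139, 51), (4, 154, 106)]

normal = [speedup(t1, t2) for _, t1, t2 in rows]
fudged = [speedup(reduce_time(t1), t2) for _, t1, t2 in rows]

# Print each position in the same time/speedup layout the post uses
for (pos, t1, t2), su, su5 in zip(rows, normal, fudged):
    print(pos, t1, f"{t2}/{su:.2f}", reduce_time(t1), f"{t2}/{su5:.2f}")
```

Because the multi-cpu time is untouched, every speedup simply scales by about 0.95: a speedup near 1.8 loses roughly 0.09, and one near 3.0 loses roughly 0.15.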
Here is 4cpu with normal data:

Columns: position, 1cpu time, then 4cpu time/speedup for each of the four runs.

pos  1cpu     run 1     run 2     run 3     run 4
  2    34   26/1.31   27/1.26   18/1.89   18/1.89
  3   139   54/2.57   29/4.79   75/1.85   75/1.85
  4   154   49/3.14   46/3.35   52/2.96   52/2.96
  5   175   71/2.46   53/3.30   58/3.02   58/3.02
  6   145   34/4.26   33/4.39   51/2.84   51/2.84
  7   110   61/1.80   73/1.51   43/2.56   43/2.56
  8   115   37/3.11   39/2.95   35/3.29   35/3.29
  9   171   67/2.55   37/4.62   41/4.17   41/4.17
 10    95   42/2.26   28/3.39   40/2.38   40/2.38
 11    97   30/3.23   27/3.59   32/3.03   32/3.03
 12   147   77/1.91   55/2.67   63/2.33   63/2.33
 13   153   55/2.78   56/2.73   40/3.83   40/3.83
 14   137   47/2.91   42/3.26   39/3.51   39/3.51
 15    86   26/3.31   26/3.31   25/3.44   25/3.44
 16   141   51/2.76   50/2.82   47/3.00   47/3.00
 17    38   12/3.17   13/2.92   13/2.92   13/2.92
 18   154   50/3.08   50/3.08   79/1.95   79/1.95
 19   128   38/3.37   38/3.37   30/4.27   30/4.27
 20    96   30/3.20   36/2.67   25/3.84   25/3.84
 21   123   42/2.93   44/2.80   43/2.86   43/2.86
 22    98   24/4.08   24/4.08   25/3.92   25/3.92
 23   137   76/1.80   61/2.25   46/2.98   46/2.98
 24    87   31/2.81   32/2.72   33/2.64   33/2.64
average SU     2.82      3.12      3.02      3.02

Here is the same table with the 1cpu time reduced by your mythical 5%.

pos  1cpu     run 1     run 2     run 3     run 4
  2    32   26/1.23   27/1.19   18/1.78   18/1.78
  3   132   54/2.44   29/4.55   75/1.76   75/1.76
  4   146   49/2.98   46/3.17   52/2.81   52/2.81
  5   166   71/2.34   53/3.13   58/2.86   58/2.86
  6   137   34/4.03   33/4.15   51/2.69   51/2.69
  7   104   61/1.70   73/1.42   43/2.42   43/2.42
  8   109   37/2.95   39/2.79   35/3.11   35/3.11
  9   162   67/2.42   37/4.38   41/3.95   41/3.95
 10    90   42/2.14   28/3.21   40/2.25   40/2.25
 11    92   30/3.07   27/3.41   32/2.88   32/2.88
 12   139   77/1.81   55/2.53   63/2.21   63/2.21
 13   145   55/2.64   56/2.59   40/3.62   40/3.62
 14   130   47/2.77   42/3.10   39/3.33   39/3.33
 15    81   26/3.12   26/3.12   25/3.24   25/3.24
 16   133   51/2.61   50/2.66   47/2.83   47/2.83
 17    36   12/3.00   13/2.77   13/2.77   13/2.77
 18   146   50/2.92   50/2.92   79/1.85   79/1.85
 19   121   38/3.18   38/3.18   30/4.03   30/4.03
 20    91   30/3.03   36/2.53   25/3.64   25/3.64
 21   116   42/2.76   44/2.64   43/2.70   43/2.70
 22    93   24/3.88   24/3.88   25/3.72   25/3.72
 23   130   76/1.71   61/2.13   46/2.83   46/2.83
 24    82   31/2.65   32/2.56   33/2.48   33/2.48
average SU     2.67      2.96      2.86      2.86

2.7, 3.0, 2.9, 2.9. Do you like those numbers better?
Do they prove your "point," whatever that might be? The numbers drop by just over .1 on each test. The CB positions would average 3.0 rather than 3.1 as calculated by Martin. Or, perhaps, this is all nonsense and I should reduce them by 20%. I.e., pick a number to get the speedup down to where you want it. Or use the real number, which used to be about 3% on Intel, probably less on opteron with more registers. And just deal with a number you want to pretend I can't possibly reach... Your choice. Real data? Or your imaginary stuff? I prefer "real".

>
>However this is not fair.
>
>In diep i just compare the single cpu version versus the parallel version of
>diep.

What are your speedup numbers? Where is the data?

>
>Other good examples of unfair compares are what the Chrilly donninger is posting
>about hydra.
>
>Hydra does not use hashtables last 6 plies. 3 ply not in hardware and 3 ply not
>in software.
>
>He compares 1 cpu not doing last 6 plies in hardware versus 16 cpu's not doing
>last 6 ply in hardware.
>
>That is not fair however, the *only* reason to not use the hashtable the last 3
>ply in software is because that would not run parallel well.
>
>However, single cpu it does run well using hashtable there.
>

So? Suppose there was something that could be done in the parallel version but not in the serial version. Is it fair to compare? Or should the serial version be modified the same way to look worse? You are wasting time worrying about some mythical speedup number. The rest of us just care "how much does another processor help?"

>This is a very common trick in computerchess and some are very bad in this. Like
>cilkchess was slowed down 40 times in speed. Reduced from like 200k nps to 5k
>nps in order to run parallel better.
>
>Then it shows up with 500 processors somewhere or even in 1995 it showed up at
>like 1800 processors.
>
>But it is losing somewhere a factor 40 to start with.
>
>Is it fair to compare a slowed down program versus n processors?
>
>I do not think so. I find it very bad compare.

At that extreme, perhaps. But you always suppose fraud. With no evidence. Do you _know_ what they did in slowing it down? Do you really know if they slowed it down that much? It doesn't sound reasonable. It smells of speculation and guessing.

>
>I also can get a much better speedup with diep when slowing it down first.

Just show _any_ numbers. Anything is better than what you have shown so far... Even if it is bogus...
Last modified: Thu, 15 Apr 21 08:11:13 -0700