Author: Eugene Nalimov
Date: 22:45:57 05/08/04
Bob, please relax. Just ignore Vincent as I do.

Thanks,
Eugene

On May 09, 2004 at 00:05:47, Robert Hyatt wrote:

>On May 07, 2004 at 19:27:00, Vincent Diepeveen wrote:
>
>>On May 07, 2004 at 11:53:29, Andreas Guettinger wrote:
>>
>>>On May 07, 2004 at 04:38:00, Vincent Diepeveen wrote:
>>>
>>>>On May 06, 2004 at 19:03:48, martin fierz wrote:
>>>>
>>>>>aloha!
>>>>>
>>>>>bob posted some crafty logfiles running a 24-position test set on his ftp site
>>>>>(for anyone else crazy enough to repeat what i did:
>>>>>ftp.cis.uab.edu/pub/hyatt/smpdata)
>>>>>
>>>>>these are logfiles of crafty running as single CPU, dual, or quad; on opterons.
>>>>>i took the last completed ply on the single CPU set for each position (marked by
>>>>>-> in the logfile, i hope...), wrote down the time to complete this ply, and did
>>>>>this for all logfiles. there are 9 of these, 4 repeats for 2 and 4 CPUs. i
>>>>>computed the speedup for time-to-finish-ply-X for each of the multi-CPU runs
>>>>>with the following results:
>>>>>
>>>>>2 CPUs:
>>>>>1.961 +- 0.093
>>>>>1.888 +- 0.074
>>>>>1.846 +- 0.078
>>>>>1.763 +- 0.084
>>>>>
>>>>>4 CPUs:
>>>>>3.15 +- 0.15
>>>>>3.29 +- 0.20
>>>>>3.06 +- 0.12
>>>>>3.19 +- 0.13
>>>>>
>>>>>now, is there any meaning to this, and if yes, what?
>>>>>
>>>>>point #1 to make is that the numbers here are mutually consistent with each
>>>>>other, given the error margins quoted. which should show those skeptical of this
>>>>>statistical approach that it makes sense to do it this way, rather than to just
>>>>>write "i measured speedup 3.1".
>>>>>
>>>>>point #2 is that the speedup on 4 CPUs on average is 3.17 in this test, which
>>>>>might be one point for bob in the duel with vincent; although i suspect that the
>>>>>speedup depends on the hardware architecture - i will leave this question to the
>>>>>parallel computing experts though...
>>>>
>>>>Bob has tested the SMP version 1 cpu versus SMP version 2 or 4 cpus. The single
>>>>cpu version of crafty is just hardly existing because of a stupid thread pointer
>>>>which is a constant. Optimizing that crafty is 5% faster for sure in time single
>>>>cpu at opteron.
>>>
>>>I don't understand that. What does that mean?
>>>
>>>regards
>>>Andy
>>
>>In very simple words, to run parallel you first slow down your program.
>>Then the slowed down program gets when compared to the slowed down program the
>>speedups that Bob reports.
>
>That is crap, but that is yet another bit of disinformation that we can work
>around. Assume that I could speed Crafty up by 5% without the tree pointer,
>something that is not possible. But it is an assumption. Here is the BK data I
>posted last night for two processors. First as I computed last night, then
>computed after reducing the 1cpu times by 5%, just to see what would happen, not
>because I believe 5% is a reasonable number on the opteron.
>
>2cpu normal data:
>
>  2    34   51/0.67   25/1.36   28/1.21   35/0.97
>  3   139   51/2.73   45/3.09   58/2.40   74/1.88
>  4   154  106/1.45   84/1.83   83/1.86   84/1.83
>  5   175  112/1.56  105/1.67  114/1.54  109/1.61
>  6   145   69/2.10   70/2.07   74/1.96   70/2.07
>  7   110   65/1.69   71/1.55  112/0.98  111/0.99
>  8   115   60/1.92   66/1.74   58/1.98   60/1.92
>  9   171  101/1.69  104/1.64  101/1.69   94/1.82
> 10    95   45/2.11   43/2.21   38/2.50   41/2.32
> 11    97   35/2.77   55/1.76   52/1.87   56/1.73
> 12   147  100/1.47  113/1.30  107/1.37  114/1.29
> 13   153  108/1.42   98/1.56   79/1.94   83/1.84
> 14   137   75/1.83   88/1.56   81/1.69   87/1.57
> 15    86   42/2.05   42/2.05   41/2.10   42/2.05
> 16   141   78/1.81   78/1.81   78/1.81   77/1.83
> 17    38   25/1.52   21/1.81   23/1.65   21/1.81
> 18   154   95/1.62   60/2.57   91/1.69   72/2.14
> 19   128   67/1.91   57/2.25   65/1.97   58/2.21
> 20    96   66/1.45   63/1.52   66/1.45   66/1.45
> 21   123   70/1.76   70/1.76   67/1.84   74/1.66
> 22    98   46/2.13   48/2.04   45/2.18   45/2.18
> 23   137   62/2.21   61/2.25  106/1.29  106/1.29
> 24    87   45/1.93   43/2.02   39/2.23   44/1.98
>average SU     1.82      1.89      1.79      1.76
>
>
>OK. Now after fudging the 1cpu time for your bogus 5% number:
>
>  2    32   51/0.63   25/1.28   28/1.14   35/0.91
>  3   132   51/2.59   45/2.93   58/2.28   74/1.78
>  4   146  106/1.38   84/1.74   83/1.76   84/1.74
>  5   166  112/1.48  105/1.58  114/1.46  109/1.52
>  6   137   69/1.99   70/1.96   74/1.85   70/1.96
>  7   104   65/1.60   71/1.46  112/0.93  111/0.94
>  8   109   60/1.82   66/1.65   58/1.88   60/1.82
>  9   162  101/1.60  104/1.56  101/1.60   94/1.72
> 10    90   45/2.00   43/2.09   38/2.37   41/2.20
> 11    92   35/2.63   55/1.67   52/1.77   56/1.64
> 12   139  100/1.39  113/1.23  107/1.30  114/1.22
> 13   145  108/1.34   98/1.48   79/1.84   83/1.75
> 14   130   75/1.73   88/1.48   81/1.60   87/1.49
> 15    81   42/1.93   42/1.93   41/1.98   42/1.93
> 16   133   78/1.71   78/1.71   78/1.71   77/1.73
> 17    36   25/1.44   21/1.71   23/1.57   21/1.71
> 18   146   95/1.54   60/2.43   91/1.60   72/2.03
> 19   121   67/1.81   57/2.12   65/1.86   58/2.09
> 20    91   66/1.38   63/1.44   66/1.38   66/1.38
> 21   116   70/1.66   70/1.66   67/1.73   74/1.57
> 22    93   46/2.02   48/1.94   45/2.07   45/2.07
> 23   130   62/2.10   61/2.13  106/1.23  106/1.23
> 24    82   45/1.82   43/1.91   39/2.10   44/1.86
>average SU     1.72      1.79      1.70      1.66
>
>
>Do you like those numbers better??
>
>They are still right in line with my formula, and this is for the BK test data.
>I don't have the raw times in a useful form for the CB positions, where the
>speedup was actually a fair bit better.
>
>Here is 4cpu with normal data:
>
>  2    34   26/1.31   27/1.26   18/1.89   18/1.89
>  3   139   54/2.57   29/4.79   75/1.85   75/1.85
>  4   154   49/3.14   46/3.35   52/2.96   52/2.96
>  5   175   71/2.46   53/3.30   58/3.02   58/3.02
>  6   145   34/4.26   33/4.39   51/2.84   51/2.84
>  7   110   61/1.80   73/1.51   43/2.56   43/2.56
>  8   115   37/3.11   39/2.95   35/3.29   35/3.29
>  9   171   67/2.55   37/4.62   41/4.17   41/4.17
> 10    95   42/2.26   28/3.39   40/2.38   40/2.38
> 11    97   30/3.23   27/3.59   32/3.03   32/3.03
> 12   147   77/1.91   55/2.67   63/2.33   63/2.33
> 13   153   55/2.78   56/2.73   40/3.83   40/3.83
> 14   137   47/2.91   42/3.26   39/3.51   39/3.51
> 15    86   26/3.31   26/3.31   25/3.44   25/3.44
> 16   141   51/2.76   50/2.82   47/3.00   47/3.00
> 17    38   12/3.17   13/2.92   13/2.92   13/2.92
> 18   154   50/3.08   50/3.08   79/1.95   79/1.95
> 19   128   38/3.37   38/3.37   30/4.27   30/4.27
> 20    96   30/3.20   36/2.67   25/3.84   25/3.84
> 21   123   42/2.93   44/2.80   43/2.86   43/2.86
> 22    98   24/4.08   24/4.08   25/3.92   25/3.92
> 23   137   76/1.80   61/2.25   46/2.98   46/2.98
> 24    87   31/2.81   32/2.72   33/2.64   33/2.64
>average SU     2.82      3.12      3.02      3.02
>
>
>Here is the same table with the 1cpu time reduced by your mythical 5%.
>
>  2    32   26/1.23   27/1.19   18/1.78   18/1.78
>  3   132   54/2.44   29/4.55   75/1.76   75/1.76
>  4   146   49/2.98   46/3.17   52/2.81   52/2.81
>  5   166   71/2.34   53/3.13   58/2.86   58/2.86
>  6   137   34/4.03   33/4.15   51/2.69   51/2.69
>  7   104   61/1.70   73/1.42   43/2.42   43/2.42
>  8   109   37/2.95   39/2.79   35/3.11   35/3.11
>  9   162   67/2.42   37/4.38   41/3.95   41/3.95
> 10    90   42/2.14   28/3.21   40/2.25   40/2.25
> 11    92   30/3.07   27/3.41   32/2.88   32/2.88
> 12   139   77/1.81   55/2.53   63/2.21   63/2.21
> 13   145   55/2.64   56/2.59   40/3.62   40/3.62
> 14   130   47/2.77   42/3.10   39/3.33   39/3.33
> 15    81   26/3.12   26/3.12   25/3.24   25/3.24
> 16   133   51/2.61   50/2.66   47/2.83   47/2.83
> 17    36   12/3.00   13/2.77   13/2.77   13/2.77
> 18   146   50/2.92   50/2.92   79/1.85   79/1.85
> 19   121   38/3.18   38/3.18   30/4.03   30/4.03
> 20    91   30/3.03   36/2.53   25/3.64   25/3.64
> 21   116   42/2.76   44/2.64   43/2.70   43/2.70
> 22    93   24/3.88   24/3.88   25/3.72   25/3.72
> 23   130   76/1.71   61/2.13   46/2.83   46/2.83
> 24    82   31/2.65   32/2.56   33/2.48   33/2.48
>average SU     2.67      2.96      2.86      2.86
>
>2.7, 3.0, 2.9, 2.9. Do you like those numbers better? Do they prove your
>"point" whatever that might be? The numbers drop by just over .1 on each test.
>The CB positions would average 3.0 rather than 3.1 as calculated by Martin.
>
>Or, perhaps, this is all nonsense and I should reduce them by 20%. IE pick a
>number to get the speedup down to where you want it. Or use the real number
>which used to be about 3% on Intel, probably less on opteron with more
>registers. And just deal with a number you want to pretend I can't possibly
>reach...
>
>Your choice.
>
>Real data? Or your imaginary stuff?
>
>I prefer "real".
>
>>
>>However this is not fair.
>>
>>In diep i just compare the single cpu version versus the parallel version of
>>diep.
>
>What are your speedup numbers? Where is the data?
>
>>
>>Other good examples of unfair compares are what the Chrilly donninger is posting
>>about hydra.
>>
>>Hydra does not use hashtables last 6 plies. 3 ply not in hardware and 3 ply not
>>in software.
>>
>>He compares 1 cpu not doing last 6 plies in hardware versus 16 cpu's not doing
>>last 6 ply in hardware.
>>
>>That is not fair however, the *only* reason to not use the hashtable the last 3
>>ply in software is because that would not run parallel well.
>>
>>However, single cpu it does run well using hashtable there.
>>
>
>So? Suppose there was something that could be done in the parallel version but
>not in the serial version. Is it fair to compare? Or should the serial version
>be modified the same way to look worse?
>
>You are wasting time worrying about some mythical speedup number. The rest of
>us just care "how much does another processor help?"
>
>>This is a very common trick in computerchess and some are very bad in this. Like
>>cilkchess was slowed down 40 times in speed. Reduced from like 200k nps to 5k
>>nps in order to run parallel better.
>>
>>Then it shows up with 500 processors somewhere or even in 1995 it showed up at
>>like 1800 processors.
>>
>>But it is losing somewhere a factor 40 to start with.
>>
>>Is it fair to compare a slowed down program versus n processors?
>>
>>I do not think so. I find it very bad compare.
>
>At that extreme, perhaps. But you always suppose fraud. With no evidence. Do
>you _know_ what they did in slowing it down? Do you really know if they slowed
>it down that much? It doesn't sound reasonable. It smells of speculation and
>guessing.
>
>>
>>I also can get a much better speedup with diep when slowing it down first.
>
>Just show _any_ numbers. Anything is better than what you have shown so far...
>Even if it is bogus...
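
For anyone who wants to redo the arithmetic in the quoted tables, here is a minimal Python sketch. It assumes each table cell is "N-cpu time/speedup" with speedup = 1cpu time divided by N-cpu time, which is how the posted figures work out. The helper names `speedup` and `reduce_t1` are made up for illustration, and the truncation to a whole number is inferred from the posted figures (110 becomes 104), not taken from any script in the thread.

```python
# Sketch only: helper names and the truncation rule are assumptions,
# inferred from the posted tables, not taken from Crafty or any posted script.

def speedup(t1, tn):
    """Speedup = 1cpu time divided by N-cpu time."""
    return t1 / tn

def reduce_t1(t1, percent=5):
    """Shrink the 1cpu time by `percent`, truncating to a whole number
    (assumed; this reproduces posted values such as 34 -> 32, 110 -> 104)."""
    return t1 * (100 - percent) // 100

# (position, 1cpu time, 2cpu times of the four runs), copied from the quoted table
rows = [
    (2, 34, [51, 25, 28, 35]),
    (3, 139, [51, 45, 58, 74]),
    (7, 110, [65, 71, 112, 111]),
]

for pos, t1, runs in rows:
    normal = [round(speedup(t1, tn), 2) for tn in runs]
    adjusted = [round(speedup(reduce_t1(t1), tn), 2) for tn in runs]
    print(pos, normal, adjusted)
```

Since every speedup in a row shares the same numerator, cutting the 1cpu time by 5% just scales each speedup by roughly 0.95, which is why the averages drop by about 0.1 for 2 cpus and about 0.15 for 4.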
Last modified: Thu, 15 Apr 21 08:11:13 -0700