Author: Dieter Buerssner
Date: 14:54:50 07/06/03
On July 06, 2003 at 15:35:34, Gerd Isenberg wrote:

>But I guess for some strange reasons this speedup only occurs in this loop.
>Maybe due to some pipelength or microcode alignment reason some internal
>hyperthreading like unrolling occurs. Using all pipes perfectly, processing two
>loop bodies simultaneously with "different" register sets?

This gave me the following idea. Try a loop of rand(), and add a variable number of no-ops (I used xor eax, eax, 1 byte. One could try other variations). The following source is rather boring, but the results look interesting:

MSVC -Ox2 -Ob2 -G6 -Gr -GF

randnoop0   13.208
randnoop1   11.978
randnoop2   12.197
randnoop3   12.508
randnoop4   13.009
randnoop5   12.428
randnoop6   12.458
randnoop7   11.376
randnoop8   11.737
randnoop9   11.737
randnoop10   8.792
randnoop11   9.554
randnoop12   9.464
randnoop13   9.083
randnoop14   9.474
randnoop15  10.024
randnoop16   8.342
randnoop17   8.272

randnoop17 (calling rand and doing 17 xor eax,eax) is about 1.6 times as fast as just calling rand (8.272 vs. 13.208 seconds). (Still not the 7.x seconds that was used by the "overhead" of omid_abs and calculating the sum.) I had a quick look at the assembly, and everything looks normal and comparable.

The MSVC library rand is more or less the following (a linear congruential pseudo-random number generator with a power-of-2 modulus, which typically makes this sort of PRNG rather bad; the right shift makes up for the deficiency a bit):

    return ((state = state * CONST_1 + CONST_2) >> 16) & 0x7fff;

state is a static variable. CONST_1 is 214013, so the multiplication cannot be optimized into fast shift/lea tricks.

Regards,
Dieter
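For readers who want to experiment with the generator itself, here is a minimal, portable sketch of the one-line rand described above. The names sketch_rand and sketch_srand are made up for illustration. CONST_1 is the value given in the post; the value used for CONST_2 (2531011) and the default seed of 1 are not stated in the post and are assumed here from what is commonly documented for the MSVC runtime. The state is kept in an unsigned 32-bit integer so that the wrap-around of the power-of-2 modulus is well defined in C.

```c
#include <stdint.h>

/* Sketch of the LCG from the post, not the actual library source. */
#define CONST_1 214013u   /* multiplier, as given in the post */
#define CONST_2 2531011u  /* increment: ASSUMED, not stated in the post */

static uint32_t state = 1u;  /* ASSUMED default seed */

/* Hypothetical helper: reset the generator state. */
void sketch_srand(uint32_t seed)
{
    state = seed;
}

/* One step: state = state * CONST_1 + CONST_2 (mod 2^32),
   then return bits 16..30 of the new state (a 15-bit value). */
int sketch_rand(void)
{
    state = state * CONST_1 + CONST_2;
    return (int)((state >> 16) & 0x7fffu);
}
```

Under these assumptions, seeding with 1 gives 41 and then 18467 as the first two values. The low 16 bits of the state are thrown away by the shift, which is the "made up a bit" part: in a power-of-2 modulus LCG the low bits have very short periods.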
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.