Author: Dieter Buerssner
Date: 14:54:50 07/06/03
On July 06, 2003 at 15:35:34, Gerd Isenberg wrote:
>But I guess for some strange reason this speedup only occurs in this loop.
>Maybe due to some pipeline length or microcode alignment reason some internal
>hyperthreading-like unrolling occurs. Using all pipes perfectly, processing two
>loop bodies simultaneously with "different" register sets?
This gave me the following idea: try a loop of rand(), and add a variable
number of no-ops (I used xor eax, eax, 1 byte; one could try other variations).
The source is rather boring, but the results look interesting:
MSVC -Ox2 -Ob2 -G6 -Gr -GF
randnoop0 13.208
randnoop1 11.978
randnoop2 12.197
randnoop3 12.508
randnoop4 13.009
randnoop5 12.428
randnoop6 12.458
randnoop7 11.376
randnoop8 11.737
randnoop9 11.737
randnoop10 8.792
randnoop11 9.554
randnoop12 9.464
randnoop13 9.083
randnoop14 9.474
randnoop15 10.024
randnoop16 8.342
randnoop17 8.272
randnoop17 (calling rand and doing 17 xor eax,eax) is about 1.6 times as fast as
just calling rand (8.272 s vs. 13.208 s). (Still not the 7.x seconds that were
used by the "overhead" of omid_abs and calculating the sum.) I quickly checked
the assembly, and everything looks normal and comparable.
The MSVC library rand is more or less the following (a linear congruential
pseudo-random number generator with a power-of-2 modulus, which typically makes
this sort of PRNG rather bad; the right shift makes up for the deficiency a bit):
return ((state = state * CONST_1 + CONST_2) >> 16) & 0x7fff;
state is a static variable. CONST_1 is 214013, so no fast optimization of the
multiplication by shift/lea tricks is possible.
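
For anyone who wants to play with the generator outside the library, here is a minimal sketch of the one-liner above as standalone C. Only CONST_1 = 214013 comes from this post. CONST_2 = 2531011 is the value commonly cited for the MSVC rand and is an assumption here, as are the start state of 1 and the names sketch_srand and sketch_rand, which are made up for illustration.

```c
#include <stdint.h>

/* CONST_1 is the multiplier quoted in the post.
   CONST_2 is an assumption: the increment commonly cited for MSVC rand. */
#define CONST_1 214013u
#define CONST_2 2531011u

/* Generator state; starting at 1 is an assumption (as if seeded with 1). */
static uint32_t state = 1u;

/* Hypothetical helper: reset the state to a given seed. */
void sketch_srand(uint32_t seed)
{
    state = seed;
}

/* One LCG step. The modulus 2^32 is implicit in the uint32_t wraparound.
   The shift drops the low 16 bits, which have very short periods in a
   power-of-2-modulus LCG; the mask keeps 15 bits, so the result is 0..32767. */
int sketch_rand(void)
{
    state = state * CONST_1 + CONST_2;
    return (int)((state >> 16) & 0x7fffu);
}
```

With these assumed constants and a seed of 1, the first two values are 41 and 18467. Note that each call depends on the state written by the previous one, so the multiply and add form a serial dependency chain through memory.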
Regards,
Dieter
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.