Author: Matt Taylor
Date: 02:16:06 01/18/03
>64-bit shift in something like 3 cycles when count is < 32. Pentium 4 L1 cache
>latency -- 2 clocks. Athlon L1 cache latency -- 3 clocks.

I should have clarified the above. What will happen is that the compiler will do a 32-bit shift with count & 31, then manually move the 32-bit result into the upper half if necessary (that is, when bit 5 of the count is set). Optimal assembly as follows, with the 64-bit result in edx:eax and the count in ecx:

VC-style:

    _asm {
        xor    eax, eax     ; eax = 0
        mov    edx, 1
        shl    edx, cl      ; edx = 1 << (count & 31)
        test   ecx, 32      ; ZF set when count < 32
        mov    ecx, 0       ; mov leaves the flags alone
        cmovz  eax, edx     ; count < 32: bit belongs in the low half
        cmovz  edx, ecx     ; ...and the high half is zero
    }

GCC-style:

    asm("\txorl\t%%eax, %%eax\n"
        "\tmovl\t$1, %%edx\n"
        "\tshll\t%%cl, %%edx\n"
        "\ttestl\t$32, %%ecx\n"
        "\tmovl\t$0, %%ecx\n"
        "\tcmovzl\t%%edx, %%eax\n"
        "\tcmovzl\t%%ecx, %%edx\n"
        : "=A" (index), "+c" (count)   /* count is overwritten with 0 */
        :
        : "cc");

Something like that...haven't tested...use at your own risk...yadda. That goes particularly for the GCC-style code. I'm not overly familiar with AT&T syntax for Intel.

-Matt
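For checking the assembly against something readable, the same idea can be sketched in plain C. This is only an illustration under assumptions: the function name bit64 is made up here and does not come from the post, and a real compiler is free to generate different code for it.

```c
#include <stdint.h>

/* Hypothetical helper (the name is an assumption, not from the post):
   computes 1 << count for count in 0..63 using only 32-bit shifts,
   following the "shift by count & 31, then pick a half" approach. */
static uint64_t bit64(unsigned count)
{
    /* 32-bit shift with the count masked to 5 bits, as x86 SHL does. */
    uint32_t part = (uint32_t)1 << (count & 31);
    uint32_t lo = part;
    uint32_t hi = 0;

    if (count & 32) {   /* bit 5 set: the result belongs in the upper half */
        hi = part;
        lo = 0;
    }
    return ((uint64_t)hi << 32) | lo;
}
```

For example, bit64(5) leaves the bit in the low half, while bit64(37) produces the same 32-bit pattern moved into the high half.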
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.