Author: Matt Taylor
Date: 02:16:06 01/18/03
>64-bit shift in something like 3 cycles when count is < 32. Pentium 4 L1 cache
>latency -- 2 clocks. Athlon L1 cache latency -- 3 clocks.
I should have clarified the above. The compiler does a 32-bit shift by
count & 31, then moves the 32-bit result into the upper half if count is 32 or
more. Optimal assembly for shifting 1 left by count, with count in ecx and the
result in edx:eax, as follows:
VC-style:
_asm
{
xor eax, eax
mov edx, 1
shl edx, cl
test ecx, 32
mov ecx, 0
cmovz eax, edx ; count < 32: the shifted value belongs in the low half
cmovz edx, ecx ; count < 32: so the high half is zero
}
GCC-style:
asm("\txorl\t%%eax, %%eax\n"
"\tmovl\t$1, %%edx\n"
"\tshll\t%%cl, %%edx\n"
"\ttestl\t$32, %%ecx\n"
"\tmovl\t$0, %%ecx\n"
"\tcmovzl\t%%edx, %%eax\n"
"\tcmovzl\t%%ecx, %%edx\n"
: "=A" (index), "+c" (count) : : "cc"); /* ecx is overwritten, so count is 0 afterwards */
Something like that...haven't tested...use at your own risk...yadda. That goes
particularly for the GCC-style code. I'm not overly familiar with AT&T syntax
for Intel.
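For checking either version, here is a rough plain-C sketch of the same idea: shift within 32 bits, then pick the half based on bit 5 of the count. The name shift_one_left is made up for illustration, and the count < 64 precondition is my assumption, not something the assembly enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper, not part of any library.
   Assumes count < 64, matching the range of a 64-bit shift. */
static uint64_t shift_one_left(unsigned count)
{
    assert(count < 64);

    /* A 32-bit shl uses only the low 5 bits of the count. */
    uint32_t part = (uint32_t)1 << (count & 31);

    /* Bit 5 of the count decides which half receives the shifted value. */
    uint32_t lo = (count & 32) ? 0 : part;
    uint32_t hi = (count & 32) ? part : 0;

    return ((uint64_t)hi << 32) | lo;
}
```

Comparing shift_one_left(count) against the result of the inline assembly for every count from 0 to 63 should show quickly whether the conditional moves select the right half.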
-Matt