Computer Chess Club Archives


Subject: Self-Correction

Author: Matt Taylor

Date: 02:16:06 01/18/03


>64-bit shift in something like 3 cycles when count is < 32. Pentium 4 L1 cache
>latency -- 2 clocks. Athlon L1 cache latency -- 3 clocks.

I should have clarified the above. The compiler will do a 32-bit shift by
count & 31, then move the 32-bit result into the upper half if necessary (that
is, when count >= 32). Optimal assembly for 1 << count, with count in ecx and
the 64-bit result in edx:eax, is as follows:

VC-style:
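_asm
{
    xor    edx, edx         ; high half = 0
    mov    eax, 1           ; low half = 1

    shl    eax, cl          ; eax = 1 << (count & 31), the CPU masks cl to 5 bits
    test    ecx, 32         ; ZF clear when count >= 32
    mov    ecx, 0           ; mov does not touch the flags

    cmovnz    edx, eax      ; count >= 32: shifted value goes to the high half
    cmovnz    eax, ecx      ; count >= 32: low half becomes 0
}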

GCC-style:
asm("\txorl\t%%edx, %%edx\n"
    "\tmovl\t$1, %%eax\n"

    "\tshll\t%%cl, %%eax\n"
    "\ttestl\t$32, %%ecx\n"
    "\tmovl\t$0, %%ecx\n"

    "\tcmovnzl\t%%eax, %%edx\n"
    "\tcmovnzl\t%%ecx, %%eax\n"
    /* ecx is overwritten, so count is an in/out operand and ends up 0 */
    : "=A" (index), "+c" (count) : : "cc");

Something like that...haven't tested...use at your own risk...yadda. That goes
particularly for the GCC-style code. I'm not overly familiar with AT&T syntax
for Intel.
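
For checking either version, here is a rough portable C sketch of the same idea: one 32-bit shift by count & 31, then a select on bit 5 of count to decide which half gets the result. The function name bit64 is made up for illustration, and it assumes count is in the range 0..63.

```c
#include <stdint.h>

/* Hypothetical reference helper (name is an assumption, not from any library):
   builds 1 << count for 0 <= count <= 63 using only 32-bit shifts. */
uint64_t bit64(unsigned count)
{
    /* 32-bit shift with the count masked to 5 bits, like shl with cl on x86 */
    uint32_t part = (uint32_t)1 << (count & 31);

    /* bit 5 of count says which half receives the shifted value */
    uint32_t lo = (count & 32) ? 0 : part;
    uint32_t hi = (count & 32) ? part : 0;

    return ((uint64_t)hi << 32) | lo;
}
```

Comparing bit64(count) against the asm output for every count from 0 to 63 should be enough to catch swapped halves.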

-Matt



Last modified: Thu, 15 Apr 21 08:11:13 -0700