Computer Chess Club Archives



Subject: Re: Bit Scan Timings -- possible artifact?

Author: Matt Taylor

Date: 17:10:45 12/22/02



On December 22, 2002 at 14:55:31, Walter Faxon wrote:

>Hi, Matt.
>
>In your latest bitscan.c, the routine BitSearchReset_IntMatt(), there is a
>sequence:
>
>    ...
>    mov    ecx, eax
>    shl    ecx, 16
>    add    eax, ecx
>    sub    al, ah
>    mov    ecx, DWORD PTR [bb]          <=== whaz dis doin' heah?
>    and    eax, 0xFF
>    mov    al, [LSB_64_table+eax-51]
>    jmp    ScanAgain
>    ...
>
>That's all the references to ecx, so the load looks like an artifact.  Haven't
>checked the other code, but my compiler also sometimes does silly things like
>this.  (One might expect it to affect the timings... :)
>
>Another suggestion:  once you've got the bit number, do something with it.  A
>little more realistic.
>
>By the way, thanks again for looking into this problem in so much detail.  If we
>can't scan bits efficiently, it won't be due to your lack of effort!
>
>-- Walter
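Walter's suggestion to actually use each bit number can be illustrated with a minimal C sketch. It assumes a 64-bit `bitboard` type; `lsb_index()` and `sum_bit_indices()` are hypothetical names, and the plain shift loop only stands in for whichever scan is being timed. Summing the indices is just one cheap way to give every scan result a consumer, so an optimizer cannot discard the work.

```c
#include <stdint.h>

typedef uint64_t bitboard;

/* Hypothetical stand-in scan: index of the lowest set bit of a NONZERO bitboard. */
static int lsb_index(bitboard bb)
{
    int index = 0;
    while ((bb & 1) == 0) {
        bb >>= 1;
        index++;
    }
    return index;
}

/* Scans and resets every set bit, feeding each index into a running sum
   so that every scan result is actually used. */
static int sum_bit_indices(bitboard bb)
{
    int total = 0;
    while (bb != 0) {
        total += lsb_index(bb);
        bb &= bb - 1;          /* reset the bit just found */
    }
    return total;
}
```

In a benchmark, the returned sum would be printed or otherwise checked so the loop stays live.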

Very strange. I have no idea how it got there or what I had intended by it. I
removed it in the latest version of the bitscan benchmark program. I might try
to rewrite that routine, or at least to comment it sensibly. I've done too much
P6 assembly, so I habitually rearrange instructions (so that registers retire
before they are needed again), and that makes it harder to explain accurately
what the reordered code is doing.
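As an illustration of what a commented version might look like, here is a C transcription of the quoted excerpt with the stray `ecx` reload left out. It is only a sketch: `fold_index()` is a hypothetical name, and what `eax` holds on entry depends on code the excerpt elides. One thing the transcription makes visible is that `ecx = eax << 16` has zero low 16 bits, so the `add` cannot change AL or AH, and `and eax, 0xFF` then discards the upper bits. Within the lines shown, the result depends only on AL and AH; whether the elided code relies on the shifted add cannot be told from the excerpt.

```c
#include <stdint.h>

/* C transcription of the quoted excerpt, minus the dead "mov ecx, [bb]".
   Returns the value left in EAX just before the table load. */
static unsigned fold_index(uint32_t eax)
{
    uint32_t ecx = eax << 16;        /* mov ecx, eax / shl ecx, 16           */
    eax += ecx;                      /* add eax, ecx: bits 0..15 unchanged   */
    uint8_t al = (uint8_t)eax;
    uint8_t ah = (uint8_t)(eax >> 8);
    al = (uint8_t)(al - ah);         /* sub al, ah (wraps modulo 256)        */
    return al;                       /* and eax, 0xFF                        */
}
/* The excerpt then loads LSB_64_table[fold_index(x) - 51]. */
```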

I've added several tests using your 32-bit routine. It actually works a little
better, I think. (It also helps that the table is 20 bytes smaller -- but that's
mostly irrelevant.) I also added the "naive" method which looks roughly like
this:

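#include <stdint.h>

typedef uint64_t bitboard;

/* Naive scan: shift right until the lowest bit is set.
   Returns 0 when bb == 0, so callers must test for an empty board first. */
int shift_scan(bitboard bb)
{
    int index = 0;
    while (bb != 0 && (bb & 1) == 0)
    {
        index++;
        bb >>= 1;
    }

    return index;
}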
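The idea of driving a 64-bit scan with a 32-bit routine can be sketched as follows. This is not Walter's routine, whose table and folding steps are not shown in this thread; as a stand-in, each 32-bit half is scanned with the well-known de Bruijn multiply (constant `0x077CB531`). `scan32()` and `scan64_via_halves()` are hypothetical names, and the bitboard passed in must be nonzero.

```c
#include <stdint.h>

typedef uint64_t bitboard;

/* Lookup table for the 32-bit de Bruijn sequence 0x077CB531 (stand-in scan). */
static const int debruijn32_index[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* Index of the lowest set bit of a nonzero 32-bit word. */
static int scan32(uint32_t v)
{
    uint32_t lowest = v & (0u - v);   /* isolate the lowest set bit */
    return debruijn32_index[(uint32_t)(lowest * 0x077CB531u) >> 27];
}

/* 64-bit scan built from two 32-bit scans; bb must be nonzero. */
static int scan64_via_halves(bitboard bb)
{
    uint32_t lo = (uint32_t)bb;
    uint32_t hi = (uint32_t)(bb >> 32);
    return lo != 0 ? scan32(lo) : 32 + scan32(hi);
}
```

Only 32-bit arithmetic is needed per scan, which is what makes the halves candidates for side-by-side evaluation.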

One other interesting possibility opens up from using a 32-bit scan routine
like that: parallelism! The MMX routine performs poorly when applied to the
64-bit algorithm because it is difficult to fold at the end. I'm working on a
32-bit version. Initially it was -almost- the fastest bitscan yet, but then I
realized that I had a bug. I can do the two bitscans in parallel, but I can't
actually clear both bits, only the one I return. The current code ends up
throwing a bit away on each call, which halves the number of iterations and so
artificially cuts the time in half. After I fix the bug, we'll see how it
changes things.
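The bug described above can be avoided by scanning both halves independently but clearing only the bit that is returned. The following scalar C sketch shows that logic only; it does not use MMX. The two independent `scan32()` calls merely mark the work an MMX version could do side by side, the de Bruijn 32-bit scan is again a stand-in, and all function names are hypothetical.

```c
#include <stdint.h>

typedef uint64_t bitboard;

/* Lookup table for the 32-bit de Bruijn sequence 0x077CB531 (stand-in scan). */
static const int debruijn32_index[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* Lowest set bit of v; the result is meaningless (0) when v == 0. */
static int scan32(uint32_t v)
{
    uint32_t lowest = v & (0u - v);
    return debruijn32_index[(uint32_t)(lowest * 0x077CB531u) >> 27];
}

/* Pops the lowest set bit of *bb (which must be nonzero). Both halves are
   scanned independently, but only the bit actually returned is cleared. */
static int pop_lsb_parallel_halves(bitboard *bb)
{
    uint32_t lo = (uint32_t)*bb;
    uint32_t hi = (uint32_t)(*bb >> 32);
    int lo_idx = scan32(lo);              /* independent of hi_idx */
    int hi_idx = 32 + scan32(hi);
    int idx = (lo != 0) ? lo_idx : hi_idx;
    *bb &= ~((bitboard)1 << idx);         /* clear only the returned bit */
    return idx;
}
```

The high-half result is simply discarded when the low half is nonempty; it is recomputed on a later call rather than lost.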

You know, it would also be interesting to have a routine return two bits and do
some unrolling. It would make MMX invaluable, as you could theoretically grab 8
bits at a time.
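A routine that returns two bits per call, paired with a consumer loop unrolled by two, might look like this sketch. `pop_two()` and `sum_indices_unrolled()` are hypothetical names, the per-bit scan is the same de Bruijn stand-in used only for illustration, and this is plain C rather than MMX.

```c
#include <stdint.h>

typedef uint64_t bitboard;

/* Lookup table for the 32-bit de Bruijn sequence 0x077CB531 (stand-in scan). */
static const int debruijn32_index[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

static int scan32(uint32_t v)
{
    uint32_t lowest = v & (0u - v);
    return debruijn32_index[(uint32_t)(lowest * 0x077CB531u) >> 27];
}

/* Lowest set bit of a nonzero bitboard. */
static int scan64(bitboard bb)
{
    uint32_t lo = (uint32_t)bb;
    return lo != 0 ? scan32(lo) : 32 + scan32((uint32_t)(bb >> 32));
}

/* Pops up to two of the lowest set bits of *bb, storing their indices in
   out[0] and out[1]; returns how many were popped (0, 1 or 2). */
static int pop_two(bitboard *bb, int out[2])
{
    bitboard b = *bb;
    int n = 0;
    if (b != 0) {
        out[n++] = scan64(b);
        b &= b - 1;                 /* clear the first bit */
        if (b != 0) {
            out[n++] = scan64(b);
            b &= b - 1;             /* clear the second bit */
        }
    }
    *bb = b;
    return n;
}

/* Consumer loop unrolled by two: one pair per iteration, then any
   leftover single bit. */
static int sum_indices_unrolled(bitboard bb)
{
    int out[2];
    int total = 0;
    int n;
    while ((n = pop_two(&bb, out)) == 2)
        total += out[0] + out[1];
    if (n == 1)
        total += out[0];
    return total;
}
```

Returning a count alongside the indices keeps the odd-bit tail explicit, which an unrolled caller needs regardless of how the pair is found.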

-Matt




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.