Author: Matt Taylor
Date: 17:10:45 12/22/02
On December 22, 2002 at 14:55:31, Walter Faxon wrote:
>Hi, Matt.
>
>In your latest bitscan.c, the routine BitSearchReset_IntMatt(), there is a
>sequence:
>
> ...
> mov ecx, eax
> shl ecx, 16
> add eax, ecx
> sub al, ah
> mov ecx, DWORD PTR [bb] <=== whaz dis doin' heah?
> and eax, 0xFF
> mov al, [LSB_64_table+eax-51]
> jmp ScanAgain
> ...
>
>That's all the references to ecx, so the load looks like an artifact. Haven't
>checked the other code, but my compiler also sometimes does silly things like
>this. (One might expect it to affect the timings... :)
>
>Another suggestion: once you've got the bit number, do something with it. A
>little more realistic.
>
>By the way, thanks again for looking into this problem in so much detail. If we
>can't scan bits efficiently, it won't be due to your lack of effort!
>
>-- Walter
Very strange. I have no idea how it got there or what I intended by it. I
removed it in the latest version of the bitscan benchmark program. I might try
to rewrite that routine, or at least comment it sensibly. I have done too much
P6 assembly, so I habitually reorder instructions (so that registers retire
before they are needed again), and that makes it harder to explain accurately
what the out-of-order code is doing.
I've added several tests using your 32-bit routine. It actually works a little
better, I think. (It also helps that the table is 20 bytes smaller -- but that's
mostly irrelevant.) I also added the "naive" method which looks roughly like
this:
int shift_scan(bitboard bb)
{
    int index = 0;

    /* Shift right until the lowest set bit reaches position 0. */
    while (bb != 0 && (bb & 1) == 0)
    {
        index++;
        bb >>= 1;
    }
    return index;   /* also 0 when bb is empty */
}
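Along the lines of Walter's suggestion to do something with the bit number, here is a minimal sketch of how the naive scan might be driven over a whole bitboard. The `bitboard` typedef and the `collect_bits` driver are assumptions made for illustration, not code from the benchmark, and this variant treats an empty board as a caller error instead of returning 0.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bitboard;   /* assumption: a bitboard is an unsigned 64-bit word */

/* Naive scan: index of the lowest set bit. The caller must pass bb != 0. */
static int shift_scan(bitboard bb)
{
    int index = 0;

    assert(bb != 0);
    while ((bb & 1) == 0) {
        index++;
        bb >>= 1;
    }
    return index;
}

/* Hypothetical driver: store the index of every set bit in out[]
   (room for 64 entries) and return how many were found. */
static int collect_bits(bitboard bb, int out[64])
{
    int count = 0;

    while (bb != 0) {
        out[count++] = shift_scan(bb);
        bb &= bb - 1;        /* clear the lowest set bit */
    }
    return count;
}
```

The cost of the scan itself grows with the position of the lowest set bit, which is why it only serves as a baseline for the table-driven routines.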
One other interesting possibility opens up due to using a 32-bit scan routine
like that: parallelism! The MMX routine performs poorly when applied to the
64-bit algorithm because it is difficult to fold at the end. I'm working on a
32-bit version. Initially it was -almost- the fastest bitscan yet, but then I
realized that I had a bug. I can do the bitscans in parallel, but I can't
actually clear both bits -- only the one I return. The current code ends up
throwing a bit away on each call, which cuts the measured time in half. After I
fix the bug, we'll see
how it changes things.
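To make the bug concrete, here is a rough sketch in plain C (no MMX) of scanning the two 32-bit halves independently while clearing only the bit that is actually returned. The names `scan32` and `scan_and_reset` are hypothetical, and `scan32` is a simple shift loop standing in for the table-driven 32-bit routine.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in 32-bit scan (a plain shift loop). Requires w != 0. */
static int scan32(uint32_t w)
{
    int index = 0;

    assert(w != 0);
    while ((w & 1u) == 0) {
        index++;
        w >>= 1;
    }
    return index;
}

/* Hypothetical: return the index of the lowest set bit of *bb and clear it. */
static int scan_and_reset(uint64_t *bb)
{
    uint32_t lo = (uint32_t)*bb;
    uint32_t hi = (uint32_t)(*bb >> 32);
    int lo_index, hi_index, index;

    assert(*bb != 0);

    /* The two scans are independent, so they could run side by side. */
    lo_index = lo ? scan32(lo) : -1;
    hi_index = hi ? 32 + scan32(hi) : -1;

    /* Only one result is returned, so only that bit may be cleared.
       Clearing the found bit in both halves would silently drop one. */
    index = (lo_index >= 0) ? lo_index : hi_index;
    *bb &= ~((uint64_t)1 << index);
    return index;
}
```

With bits 1 and 32 set, the first call returns 1 and leaves bit 32 in place for the next call.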
You know, it would also be interesting to have a routine return two bits
and do some unrolling. It would make MMX invaluable as you could theoretically
grab 8 bits at a time.
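One possible reading of the two-bit idea, sketched in plain C: take one bit from each 32-bit half per call and clear both, so neither scan result is wasted. The names `scan32` and `scan_pair` are hypothetical, `scan32` is a plain shift loop used as a stand-in, and nothing here is the MMX code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in 32-bit scan (a plain shift loop). Requires w != 0. */
static int scan32(uint32_t w)
{
    int index = 0;

    assert(w != 0);
    while ((w & 1u) == 0) {
        index++;
        w >>= 1;
    }
    return index;
}

/* Hypothetical: take the lowest set bit of each 32-bit half of *bb,
   store the indices in out[], clear them, and return the count (0 to 2). */
static int scan_pair(uint64_t *bb, int out[2])
{
    uint32_t lo = (uint32_t)*bb;
    uint32_t hi = (uint32_t)(*bb >> 32);
    int count = 0;

    if (lo != 0)
        out[count++] = scan32(lo);
    if (hi != 0)
        out[count++] = 32 + scan32(hi);

    for (int i = 0; i < count; i++)
        *bb &= ~((uint64_t)1 << out[i]);   /* clear every bit handed back */
    return count;
}
```

Note that the pair is not necessarily the two lowest bits overall, so this only suits callers that do not care about the order in which bits are visited.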
-Matt