Author: Matt Taylor
Date: 17:10:45 12/22/02
On December 22, 2002 at 14:55:31, Walter Faxon wrote:

>Hi, Matt.
>
>In your latest bitscan.c, the routine BitSearchReset_IntMatt(), there is a
>sequence:
>
>    ...
>    mov     ecx, eax
>    shl     ecx, 16
>    add     eax, ecx
>    sub     al, ah
>    mov     ecx, DWORD PTR [bb]     <=== whaz dis doin' heah?
>    and     eax, 0xFF
>    mov     al, [LSB_64_table+eax-51]
>    jmp     ScanAgain
>    ...
>
>That's all the references to ecx, so the load looks like an artifact. Haven't
>checked the other code, but my compiler also sometimes does silly things like
>this. (One might expect it to affect the timings... :)
>
>Another suggestion: once you've got the bit number, do something with it. A
>little more realistic.
>
>By the way, thanks again for looking into this problem in so much detail. If we
>can't scan bits efficiently, it won't be due to your lack of effort!
>
>-- Walter

Very strange. I have no idea how it got there or what I had intended by it. I removed it in the latest version of the bitscan benchmark program.

I might try to rewrite that routine, or at least comment it sensibly. I did too much P6 assembly, so I rearrange instructions (so that registers retire before they are needed again), and that makes it more difficult to explain accurately what the out-of-order code is doing.

I've added several tests using your 32-bit routine. It actually works a little better, I think. (It also helps that the table is 20 bytes smaller -- but that's mostly irrelevant.) I also added the "naive" method, which looks roughly like this:

    int shift_scan(bitboard bb)
    {
        int index = 0;

        /* Shift right until bit 0 is set; an empty board returns 0. */
        while (bb != 0 && (bb & 1) == 0)
        {
            index++;
            bb >>= 1;
        }
        return index;
    }

One other interesting possibility opens up due to using a 32-bit scan routine like that: parallelism! The MMX routine performs poorly when applied to the 64-bit algorithm because it is difficult to fold at the end. I'm working on a 32-bit version. Initially it was -almost- the fastest bitscan yet, but then I realized that I had a bug. I can do the bitscans in parallel, but I can't actually clear both bits -- only the one I return. The current code ends up throwing a bit away, which cuts the measured time in half. After I fix the bug, we'll see how it changes things.

You know, it would also be interesting to have a routine return two bits and do some unrolling. It would make MMX invaluable, as you could theoretically grab 8 bits at a time.

-Matt
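For anyone following along, here is a minimal portable C sketch of the two ideas above: scanning the 64-bit board as two 32-bit halves with the naive shift loop, and then actually doing something with the bit number (clearing that bit so the next call finds the next one). This is not the code from bitscan.c and not Walter's 32-bit routine. The names shift_scan32, scan_low_bit and scan_and_clear are made up for illustration, and the bitboard typedef is assumed to be an unsigned 64-bit integer.

```c
#include <stdint.h>

/* Assumption: a bitboard is an unsigned 64-bit integer. */
typedef uint64_t bitboard;

/* Naive shift scan of one 32-bit half (hypothetical helper).
   The caller must guarantee half != 0, otherwise the loop never ends. */
static int shift_scan32(uint32_t half)
{
    int index = 0;

    while ((half & 1u) == 0)
    {
        index++;
        half >>= 1;
    }
    return index;
}

/* Index (0..63) of the lowest set bit of bb, or -1 if bb is empty.
   The low half is tried first, then the high half. */
static int scan_low_bit(bitboard bb)
{
    uint32_t lo = (uint32_t)bb;
    uint32_t hi = (uint32_t)(bb >> 32);

    if (lo != 0)
        return shift_scan32(lo);
    if (hi != 0)
        return 32 + shift_scan32(hi);
    return -1;
}

/* Find the lowest set bit, clear it in *bb, and return its index.
   Returns -1 and leaves *bb untouched when the board is empty. */
static int scan_and_clear(bitboard *bb)
{
    int index = scan_low_bit(*bb);

    if (index >= 0)
        *bb &= *bb - 1;   /* x & (x - 1) clears the lowest set bit */
    return index;
}
```

Calling scan_and_clear() in a loop until it returns -1 visits every set bit exactly once, lowest first. The two halves are handled one after the other here; the point of the parallel idea is that they are independent and could be scanned at the same time.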
Last modified: Thu, 15 Apr 21 08:11:13 -0700