Author: Wylie Garvin
Date: 17:49:15 12/11/01
Go up one level in this thread
On December 11, 2001 at 20:33:04, Dann Corbit wrote: >On December 11, 2001 at 20:27:37, Wylie Garvin wrote: > >>Hi, >> >> I've noticed some posts where people mention doing things like this: >> >> BSF ecx,dword ptr [foo+4] >> ADD ecx,32 >> BSF ecx,dword ptr [foo] >> >> On the assumption that undefined --> unchanged for PIII, PIV. While I don't >>have any evidence to contradict that assumption, it seems like a bad idea to >>rely on undefined behaviour. > >You must be sure that the function is never called with all zero bits. > Dann, it sounds like you're agreeing with me. But I never used the word "function". My chess program doesn't contain any "functions". (The parts I've written so far don't contain any multiply instructions, either =) >> I point out the conditional CMOVcc instructions which have been supported >>since around the time of the PPro I think: >> >> BSF ecx,dword ptr [foo+4] >> ADD ecx,32 >> BSF eax,dword ptr [foo] >> CMOVZ eax,ecx >> >> This technique will work correctly even if eax happens to be trashed by the >>BSR instruction if no bits were set. >> >> Also I'd like to mention something that I use to speed up my attack >>calculations with BSF/BSR...I don't know if it will help you C/C++ programmers, >>but it's pretty cool in assembly. >> >> Notice that of the 8 rays originating in one half of the board, only 3 of them >>can possibly cross over to the other side of the board. So you can reduce that >>construction above to a single BSR/BSF in the other 5 cases. My assembly code >>first decides which half of the board the source square is in, then individually >>probes the ends of the ray. To ensure that rays are always terminated, before I >>start my attack calculations I build a temporary bitboard from the "all pieces" >>bitboard by ORing in the edges (0xFF818181,818181FF) and then clearing the bit >>for the source square. Now I need about 5 BSF/BSRs for rook/queen attacks and 6 >>for bishop/queen attacks. >> >> Also keep in mind that removing a conditional branch in the fashion shown >>above requires doing the computation for *both* halves of the branch. In this >>case it appears to be a win; but if extra computation is required to produce the >>bitboard, the cost could exceed the cost of a conditional branch. In general, >>if the branch is predictable (e.g. if the rays are almost always short, and so >>almost always stay on the same side of the board) it's likely better not to >>remove it. > >I have found that with the latest Intel compilers, the bit tricks no longer gain >any speed. Not the MMX tricks, not the BSF/BSR -- nothing. The ANSI C is just >as fast. Strike that -- faster. Heh heh. I respectfully submit that you can *always* produce faster assembly than any given C compiler. It's just a matter of how much effort you are willing to expend...if the compiler does a good job on 95% of the code, then writing more than 5% in assembly is only for ppl like me who think it's *fun*. I have the latest Intel compiler, and my only real gripe with it is that it generates fairly large code to get that speed. I like the aesthetic appeal of small assembly code. By writing my whole program in assembly I can make it small. Then I can go back and fix up the parts that VTune says are too slow.. =) wylie
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.