Author: Matt Taylor
Date: 01:03:57 01/07/03
On January 07, 2003 at 01:29:12, Walter Faxon wrote:

>Hi, Matt.
>
>I haven't been able to completely follow your code (yet!), but there does seem
>to be one tiny bug, here marked by "<==".
>
>Also, glad you could use some of my "yabs()" logic -- if that's where you got
>it! :)
>
>-- Walter
>
>
>On January 06, 2003 at 13:12:24, Matt Taylor wrote:
>
><snip>
>
>>    ; Note, instructions dispatching in the same cycle are grouped.
>>    ; Note preservation of ebx/esi.
>>    ; Note: this routine tailored to Crafty's bit ordering!
>>
>>    push    esi
>>    mov     esi, DWORD PTR [bb]
>>    xor     eax, eax
>>
>>    mov     ecx, DWORD PTR [bb+4]
>>    xor     ebx, ebx        ; <== ebx zeroed...
>>    test    esi, esi
>>
>>    ; 2 cycles
>>    cmovz   esi, ecx
>>    push    ebx             ; <== ...prior its preservation
>>    setz    al
>>    mov     ecx, esi
>>    neg     esi
>
><snip remaining code, etc.>
>
>>-Matt

Oops! Good point. I tested to see that it reproduced the same answer, but I didn't check register preservation. A little rearranging should fix that. The code that I'm working with doesn't suffer that deficiency; that was just an error I made when revising it (quickly) for Crafty.

However, it has come to my attention that the table-based version, though far less efficient, is much faster. It is about 6 cycles faster, in fact. It is even faster than the bsf instruction. Since Hammer's bsf can only get slower (from what I read), it seems this is a win over bsf on any AMD chip.

I am a bit annoyed. I thought I had reduced the cycle count to 13, but then I discovered (1) that there is some Athlon stall rule that I am not familiar with and (2) that I was flipping index bit 4 with no means to flip it back. Reading Eugene's code earlier reminded me of the tricks that the carry flag can be used for. It can only be applied to index bit 4 because I can't figure out a way to get the carry flag set on all the other comparisons. Anyway, I have to complement the carry flag to get a correct i5, and that brings the cycle count to 14. Two other stalls that I'll research tomorrow also appear...sigh.

One other comment on the byte registers with setcc -- I had forgotten about the high byte registers until one of your posts reminded me of them. However, using them incurs penalties, and it's not worth trying.

I have not yet tried writing your other code. I think it can be implemented more efficiently using only 32-bit arithmetic and cmovcc.

-Matt
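
For anyone following along in C, here is a rough sketch of the general idea behind the quoted fragment combined with a table lookup. It is not the routine discussed in this thread. The test/cmovz/setz steps are modelled as "pick whichever 32-bit half is nonzero and remember which one", and neg followed by an and (the and is assumed, since the rest of the code was snipped) as "isolate the lowest set bit". The table step uses the well-known 32-bit De Bruijn multiply, which is only one of several possible table schemes and may not be the one meant above. All names are hypothetical, and bits are numbered with the least significant bit as 0, which may not match Crafty's bit ordering.

```c
#include <stdint.h>

/* Hypothetical names throughout; none of this is from the posted routine. */
static uint8_t debruijn_index[32];

/* Build the lookup table: every single-bit value maps to a unique 5-bit key.
   Generating it at startup avoids typing the 32 entries by hand. */
static void init_first_bit_table(void)
{
    for (uint32_t i = 0; i < 32; i++)
        debruijn_index[((UINT32_C(1) << i) * UINT32_C(0x077CB531)) >> 27] = (uint8_t)i;
}

/* Index of the lowest set bit of bb, counting the least significant bit as 0.
   bb must be nonzero, and init_first_bit_table() must have been called. */
static int first_bit(uint64_t bb)
{
    uint32_t lo = (uint32_t)bb;
    uint32_t hi = (uint32_t)(bb >> 32);

    uint32_t use_hi = (lo == 0);           /* like test/setz: 1 if the low half is empty */
    uint32_t half   = use_hi ? hi : lo;    /* like cmovz: take the half that holds a bit */
    uint32_t lowest = half & (0u - half);  /* neg + and: isolate the lowest set bit */

    /* use_hi supplies index bit 5; the table supplies bits 0..4. */
    return (int)(use_hi << 5)
         + debruijn_index[(lowest * UINT32_C(0x077CB531)) >> 27];
}
```

Call init_first_bit_table() once at startup; after that first_bit() is branch-free as long as the compiler turns the ?: into a cmov, which is not guaranteed.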
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.